Paper title
MultiWOZ 2.2: A Dialogue Dataset with Additional Annotation Corrections and State Tracking Baselines
Paper authors
Paper abstract
MultiWOZ is a well-known task-oriented dialogue dataset containing over 10,000 annotated dialogues spanning 8 domains. It is extensively used as a benchmark for dialogue state tracking. However, recent works have reported the presence of substantial noise in the dialogue state annotations. MultiWOZ 2.1 identified and fixed many of these erroneous annotations and user utterances, resulting in an improved version of this dataset. This work introduces MultiWOZ 2.2, yet another improved version of this dataset. Firstly, we identify and fix dialogue state annotation errors across 17.3% of the utterances on top of MultiWOZ 2.1. Secondly, we redefine the ontology by disallowing vocabularies of slots with a large number of possible values (e.g., restaurant name, time of booking). In addition, we introduce slot span annotations for these slots to standardize them across recent models, which previously used custom string matching heuristics to generate them. We also benchmark a few state-of-the-art dialogue state tracking models on the corrected dataset to facilitate comparison for future work. Finally, we discuss best practices for dialogue data collection that can help avoid annotation errors.