Paper Title

CoCo: Controllable Counterfactuals for Evaluating Dialogue State Trackers

Authors

Shiyang Li, Semih Yavuz, Kazuma Hashimoto, Jia Li, Tong Niu, Nazneen Rajani, Xifeng Yan, Yingbo Zhou, Caiming Xiong

Abstract

Dialogue state trackers have made significant progress on benchmark datasets, but their ability to generalize to novel and realistic scenarios beyond the held-out conversations is less well understood. We propose controllable counterfactuals (CoCo) to bridge this gap and evaluate dialogue state tracking (DST) models on novel scenarios, i.e., would the system successfully handle the request if the user responded differently but still consistently with the dialogue flow? CoCo leverages turn-level belief states as counterfactual conditionals to produce novel conversation scenarios in two steps: (i) counterfactual goal generation at the turn level, by dropping and adding slots and then replacing slot values; and (ii) counterfactual conversation generation that is conditioned on (i) and consistent with the dialogue flow. Evaluating state-of-the-art DST models on the MultiWOZ dataset with CoCo-generated counterfactuals results in a significant performance drop of up to 30.8 percentage points (from 49.4% to 18.6%) in absolute joint goal accuracy. In comparison, widely used techniques like paraphrasing affect the accuracy by at most 2%. Human evaluations show that CoCo-generated conversations reflect the underlying user goal with more than 95% accuracy and are as human-like as the original conversations, further strengthening CoCo's reliability and its promise as part of the robustness evaluation of DST models.
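To make step (i) concrete, the following is a minimal sketch of turn-level counterfactual goal generation (drop a slot, add a slot, then replace slot values). It is not the authors' implementation: the function name `generate_counterfactual_goal`, the `addable_slots` and `value_pool` arguments, the one-slot-per-operation policy, and the example slot names and values are all assumptions made for illustration. Step (ii), generating a user utterance conditioned on the new goal, is not covered here.

```python
import random


def generate_counterfactual_goal(turn_belief, addable_slots, value_pool, rng):
    """Return a new turn-level goal built by drop, add, then replace.

    turn_belief:   dict mapping slot name -> value for the current turn
    addable_slots: slot names that may be added (hypothetical input)
    value_pool:    dict mapping slot name -> list of candidate values
    rng:           a random.Random instance, so runs can be seeded
    """
    goal = dict(turn_belief)  # work on a copy; the original stays intact

    # Drop: remove one slot, but only if at least one slot would remain.
    if len(goal) > 1:
        del goal[rng.choice(sorted(goal))]

    # Add: pick one slot that was not in the original turn-level belief.
    candidates = sorted(s for s in addable_slots if s not in turn_belief)
    if candidates:
        new_slot = rng.choice(candidates)
        goal[new_slot] = rng.choice(value_pool[new_slot])

    # Replace: swap each value for a different one from the pool, if any.
    for slot, value in list(goal.items()):
        alternatives = [v for v in value_pool.get(slot, []) if v != value]
        if alternatives:
            goal[slot] = rng.choice(alternatives)

    return goal


# Illustrative inputs only; slot names and values are made up for this sketch.
turn_belief = {"restaurant-food": "italian", "restaurant-area": "centre"}
addable_slots = ["restaurant-food", "restaurant-area", "restaurant-pricerange"]
value_pool = {
    "restaurant-food": ["italian", "chinese"],
    "restaurant-area": ["centre", "north"],
    "restaurant-pricerange": ["cheap", "expensive"],
}

new_goal = generate_counterfactual_goal(
    turn_belief, addable_slots, value_pool, random.Random(0)
)
print(new_goal)
```

Passing a seeded `random.Random` keeps the sketch reproducible. A real pipeline would presumably also need to check that the added slot fits the domain of the current turn, which this sketch leaves out.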
