Paper title
Better Early than Late: Fusing Topics with Word Embeddings for Neural Question Paraphrase Identification
Authors
Abstract
Question paraphrase identification is a key task in Community Question Answering (CQA) to determine if an incoming question has been previously asked. Many current models use word embeddings to identify duplicate questions, but the use of topic models in feature-engineered systems suggests that they can be helpful for this task, too. We therefore propose two ways of merging topics with word embeddings (early vs. late fusion) in a new neural architecture for question paraphrase identification. Our results show that our system outperforms neural baselines on multiple CQA datasets, while an ablation study highlights the importance of topics and especially early topic-embedding fusion in our architecture.
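
The abstract contrasts early and late fusion without spelling out the mechanics. The following is a minimal illustrative sketch of the general distinction, not the authors' architecture: it assumes early fusion concatenates a per-token topic vector to each word embedding before encoding, and late fusion appends a document-level topic vector after encoding. The function names, the dimensions, the random inputs, and the mean-pooling `mean_encoder` are all hypothetical stand-ins chosen for illustration.

```python
import numpy as np

def early_fusion(word_emb, word_topics):
    """Concatenate per-token topic vectors to word embeddings before encoding."""
    # word_emb: (seq_len, emb_dim), word_topics: (seq_len, n_topics)
    return np.concatenate([word_emb, word_topics], axis=-1)

def mean_encoder(token_features):
    """Stand-in encoder (assumption): average token features into one vector."""
    return token_features.mean(axis=0)

def late_fusion(word_emb, doc_topics):
    """Encode word embeddings first, then append a document-level topic vector."""
    return np.concatenate([mean_encoder(word_emb), doc_topics], axis=-1)

# Hypothetical toy inputs: 5 tokens, 8-dim embeddings, 3 topics.
rng = np.random.default_rng(0)
seq_len, emb_dim, n_topics = 5, 8, 3
word_emb = rng.normal(size=(seq_len, emb_dim))
word_topics = rng.dirichlet(np.ones(n_topics), size=seq_len)  # per-token topic distributions
doc_topics = rng.dirichlet(np.ones(n_topics))                 # one distribution per question

early_repr = mean_encoder(early_fusion(word_emb, word_topics))
late_repr = late_fusion(word_emb, doc_topics)
print(early_repr.shape, late_repr.shape)
```

With a linear stand-in such as mean pooling the two variants differ only in which topic vector is used; the distinction becomes meaningful with a nonlinear encoder, where early fusion lets the encoder see topic information for every token.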