Paper title
Extracting Similar Questions From Naturally-occurring Business Conversations
Paper authors
Paper abstract
Pre-trained contextualized embedding models such as BERT are a standard building block in many natural language processing systems. We demonstrate that the sentence-level representations produced by some off-the-shelf contextualized embedding models have a narrow distribution in the embedding space, and thus perform poorly for the task of identifying semantically similar questions in real-world English business conversations. We describe a method that uses appropriately tuned representations and a small set of exemplars to group questions of interest to business users in a visualization that can be used for data exploration or employee coaching.
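The two ideas in the abstract, a narrow embedding distribution and grouping questions around a small set of exemplars, can be illustrated with a minimal sketch. This is not the authors' implementation: the 2-dimensional vectors are hand-made toy data standing in for sentence embeddings, and the function names, labels, and the 0.8 threshold are all hypothetical choices made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pairwise_cosine(vectors):
    """Average cosine similarity over all distinct pairs.

    A value close to 1 suggests the vectors occupy a narrow region of
    the space, so cosine similarity barely separates them.
    """
    pairs = [(i, j) for i in range(len(vectors))
             for j in range(i + 1, len(vectors))]
    return sum(cosine(vectors[i], vectors[j]) for i, j in pairs) / len(pairs)


def assign_to_exemplars(questions, exemplars, threshold):
    """Attach each question to its most similar exemplar.

    questions and exemplars map a text or label to a vector. Questions
    whose best similarity falls below the threshold stay ungrouped.
    """
    groups = {label: [] for label in exemplars}
    for text, vec in questions.items():
        label, score = max(
            ((lab, cosine(vec, ex)) for lab, ex in exemplars.items()),
            key=lambda pair: pair[1],
        )
        if score >= threshold:
            groups[label].append(text)
    return groups


# Toy data (assumed, not taken from the paper).
narrow = [[10.0, 10.1], [10.1, 10.0], [10.0, 10.0]]  # nearly parallel vectors
exemplars = {"pricing": [1.0, 0.0], "support": [0.0, 1.0]}
questions = {
    "How much does the plan cost?": [0.9, 0.1],
    "Can you reset my password?": [0.2, 0.8],
    "What's the weather like?": [-0.7, -0.7],
}

narrow_score = mean_pairwise_cosine(narrow)
spread_score = mean_pairwise_cosine(list(questions.values()))
groups = assign_to_exemplars(questions, exemplars, threshold=0.8)
```

With the narrow toy vectors, every pair has cosine similarity close to 1, so no fixed threshold would separate similar from dissimilar questions. With the more spread-out vectors, the same threshold groups the first two questions under their exemplars and leaves the unrelated one ungrouped.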