Paper Title
TableQnA: Answering List Intent Queries With Web Tables
Authors
Abstract
The web contains a vast corpus of HTML tables. They can be used to provide direct answers to many web queries. We focus on answering two classes of queries with those tables: those seeking lists of entities (e.g., "cities in california") and those seeking superlative entities (e.g., "largest city in california"). The main challenge is to achieve high precision with significant coverage. Existing approaches train machine learning models to select the answer from the candidates; they rely on textual match features between the query and the content of the table, along with features capturing table quality/importance. These features alone are inadequate for achieving the above goals. Our main insight is that we can improve precision by (i) first extracting intent (structured information) from the query for the above query classes and (ii) then performing structure-aware matching (instead of just textual matching) between the extracted intent and the candidates to select the answer. We model (i) as a sequence tagging task. We leverage state-of-the-art deep neural network models with word embeddings. The model requires large-scale training data, which is expensive to obtain via manual labeling; we therefore develop a novel method to automatically generate the training data. For (ii), we develop novel features to compute the structure-aware match and train a machine learning model. Our experiments on real-life web search queries show that (i) our intent extractor for list and superlative intent queries has significantly higher precision and coverage compared with baseline approaches and (ii) our table answer selector significantly outperforms the state-of-the-art baseline approach. This technology has been used in production by Microsoft's Bing search engine since 2016.
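The difference between textual and structure-aware matching can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the tag names (`CONCEPT`, `MODIFIER`, `SUPERLATIVE`), the `Intent` fields, the two toy tables, the convention that the first column is the subject column, and both scoring functions. The per-token tags are written by hand here; in the approach described above they would be the output of the trained sequence tagger.

```python
from dataclasses import dataclass


@dataclass
class Intent:
    """Hypothetical structured intent; field names are illustrative only."""
    concept: str      # type of entity sought, e.g. "city"
    modifier: str     # restricting phrase, e.g. "california"
    superlative: str  # e.g. "largest"; empty for plain list queries


def intent_from_tags(tokens, tags):
    """Collapse per-token tags (a sequence tagger's output) into an Intent."""
    parts = {"CONCEPT": [], "MODIFIER": [], "SUPERLATIVE": []}
    for token, tag in zip(tokens, tags):
        if tag in parts:  # tokens tagged "O" are ignored
            parts[tag].append(token)
    return Intent(
        concept=" ".join(parts["CONCEPT"]),
        modifier=" ".join(parts["MODIFIER"]),
        superlative=" ".join(parts["SUPERLATIVE"]),
    )


def textual_match(query_tokens, table):
    """Baseline-style feature: fraction of query tokens found anywhere in the table."""
    cells = (cell for row in table["rows"] for cell in row)
    text = " ".join([table["caption"], *table["header"], *cells]).lower()
    words = set(text.split())
    return sum(token in words for token in query_tokens) / len(query_tokens)


def structure_aware_match(intent, table):
    """Toy structure-aware feature.

    Assumes the first column is the subject column. The concept must equal
    that column's header and the modifier must appear in the caption.
    Superlative handling is omitted from this sketch.
    """
    concept_hit = intent.concept == table["header"][0].lower()
    modifier_hit = intent.modifier in table["caption"].lower()
    return (concept_hit + modifier_hit) / 2


# Hand-written tags standing in for the tagger's prediction.
tokens = ["largest", "city", "in", "california"]
tags = ["SUPERLATIVE", "CONCEPT", "O", "MODIFIER"]
intent = intent_from_tags(tokens, tags)

# Two toy candidate tables (illustrative data, not from the paper).
table_a = {
    "caption": "Cities in California",
    "header": ["City", "County"],
    "rows": [["Los Angeles", "Los Angeles"], ["San Diego", "San Diego"]],
}
table_b = {
    "caption": "Largest technology companies",
    "header": ["Company", "City", "State"],
    "rows": [["Apple", "Cupertino", "California"]],
}

for name, table in [("table_a", table_a), ("table_b", table_b)]:
    print(name, textual_match(tokens, table), structure_aware_match(intent, table))
```

In this toy setup both tables contain three of the four query tokens, so the textual feature cannot separate them, while the structure-aware feature favors the table whose subject column and caption agree with the extracted intent.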