Paper Title

GLOW: Global Weighted Self-Attention Network for Web Search

Paper Authors

Xuan Shan, Chuanjie Liu, Yiqian Xia, Qi Chen, Yusi Zhang, Kaize Ding, Yaobo Liang, Angen Luo, Yuxiang Luo

Paper Abstract

Deep matching models aim to help search engines retrieve more relevant documents by mapping queries and documents into semantic vectors in first-stage retrieval. When BERT is used as the deep matching model, the attention score between two words is built solely upon local contextualized word embeddings. It lacks prior global knowledge to distinguish the importance of different words, which has been proven to play a critical role in information retrieval tasks. In addition, BERT performs attention only across sub-word tokens, which weakens the whole-word attention representation. We propose a novel Global Weighted Self-Attention (GLOW) network for web document search. GLOW fuses global corpus statistics into the deep matching model. By adding prior weights derived from global information, such as BM25, into attention generation, GLOW learns weighted attention scores jointly with the query matrix $Q$ and key matrix $K$. We also present an efficient whole-word weight sharing solution that brings prior whole-word knowledge into sub-word-level attention, helping the Transformer learn whole-word-level attention. To make our model applicable to complicated web search scenarios, we introduce a combined fields representation to accommodate documents with multiple fields, even with a variable number of instances. We demonstrate that GLOW is more effective at capturing the topical and semantic representations of both queries and documents. Intrinsic evaluation and experiments conducted on public data sets show GLOW to be a general framework for the document retrieval task. It outperforms BERT and other competitive baselines by a large margin while retaining the same model complexity as BERT.
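To make the two core ideas in the abstract concrete, here is a minimal PyTorch sketch of what prior-weighted self-attention with whole-word weight sharing could look like. It is an assumption-laden illustration, not the paper's actual formulation: the abstract only says that prior weights from global statistics like BM25 are added into attention generation jointly with $Q$ and $K$, so the additive logit bias below, the function names, and the IDF-style numbers are all hypothetical.

```python
# Sketch of GLOW-style weighted self-attention, inferred from the abstract.
# Assumption: each key token carries a prior global weight (e.g. BM25/IDF),
# added as a bias to the attention logits before the softmax, and one
# whole-word weight is shared across all of that word's sub-word tokens.
import math
import torch
import torch.nn.functional as F

def weighted_self_attention(q, k, v, token_weights):
    """q, k, v: (batch, seq_len, d_model) projected inputs.
    token_weights: (batch, seq_len) prior global weight per token.
    """
    d = q.size(-1)
    logits = q @ k.transpose(-2, -1) / math.sqrt(d)  # (batch, seq, seq)
    # Bias each key position by its prior weight so that globally
    # important words receive more attention mass.
    logits = logits + token_weights.unsqueeze(1)     # broadcast over queries
    attn = F.softmax(logits, dim=-1)
    return attn @ v

def share_whole_word_weights(word_weights, word_ids):
    """Broadcast one whole-word weight to each of its sub-word tokens.
    word_weights: (num_words,) prior weight per whole word.
    word_ids: (seq_len,) whole-word index of each sub-word token.
    """
    return word_weights[word_ids]

# Hypothetical usage: a query tokenized as ["search", "eng", "##ine"],
# where "eng" and "##ine" are sub-words of the whole word "engine".
word_weights = torch.tensor([1.7, 2.3])   # assumed IDF-style priors
word_ids = torch.tensor([0, 1, 1])        # sub-word -> whole-word map
token_w = share_whole_word_weights(word_weights, word_ids).unsqueeze(0)
q = k = v = torch.randn(1, 3, 8)
out = weighted_self_attention(q, k, v, token_w)
print(out.shape)  # torch.Size([1, 3, 8])
```

Because the bias is added per key position, every sub-word of the same whole word gets an identical boost, which is one plausible reading of how weight sharing lets the Transformer attend at the whole-word level.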
