Paper Title

Generalised Spherical Text Embedding

Authors

Souvik Banerjee, Bamdev Mishra, Pratik Jawanpuria, Manish Shrivastava

Abstract

This paper aims to provide an unsupervised modelling approach that allows for a more flexible representation of text embeddings. It jointly encodes words and paragraphs as individual matrices of arbitrary column dimension with unit Frobenius norm. The representation is also linguistically motivated through the introduction of a novel similarity metric. The proposed modelling and the novel similarity metric exploit the matrix structure of the embeddings. We then show that the same matrices can be reshaped into vectors of unit norm, which transforms our problem into an optimization problem over the spherical manifold. We exploit manifold optimization to train the matrix embeddings efficiently. We also quantitatively verify the quality of our text embeddings by showing improved results on document classification, document clustering, and semantic textual similarity benchmarks.
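
The reshaping step mentioned in the abstract rests on a simple identity: the Frobenius norm of a matrix equals the Euclidean norm of its flattened vector, so a unit-Frobenius-norm matrix corresponds to a point on the unit sphere. The sketch below illustrates that identity together with one generic Riemannian gradient step on the sphere. It is a minimal illustration under stated assumptions, not the authors' training procedure: the helper names (`frobenius_norm`, `sphere_step`), the random gradient, the learning rate, and the choice of renormalisation as the retraction are all hypothetical, and the paper's similarity metric and objective are not reproduced here.

```python
import math
import random


def frobenius_norm(matrix):
    """Square root of the sum of squared entries."""
    return math.sqrt(sum(x * x for row in matrix for x in row))


def normalize_matrix(matrix):
    """Scale a matrix so that its Frobenius norm is 1."""
    norm = frobenius_norm(matrix)
    return [[x / norm for x in row] for row in matrix]


def flatten(matrix):
    """Reshape a matrix into a vector (row-major order)."""
    return [x for row in matrix for x in row]


def euclidean_norm(vector):
    return math.sqrt(sum(x * x for x in vector))


def sphere_step(point, euclid_grad, lr):
    """One gradient step on the unit sphere (hypothetical helper).

    Assumption: the retraction is plain renormalisation, a common
    choice that is not necessarily the one used in the paper.
    """
    inner = sum(p * g for p, g in zip(point, euclid_grad))
    # Project the Euclidean gradient onto the tangent space at `point`.
    riem_grad = [g - inner * p for p, g in zip(point, euclid_grad)]
    moved = [p - lr * r for p, r in zip(point, riem_grad)]
    norm = euclidean_norm(moved)
    # Retract back onto the sphere.
    return [m / norm for m in moved]


random.seed(0)
rows, cols = 4, 3  # illustrative sizes only

# A matrix embedding with unit Frobenius norm.
W = normalize_matrix(
    [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]
)

# Flattening keeps the norm, so `w` lies on the unit sphere.
w = flatten(W)
vec_norm = euclidean_norm(w)

# Stand-in gradient; a real objective would supply this.
grad = [random.gauss(0, 1) for _ in w]
w_next = sphere_step(w, grad, lr=0.1)
next_norm = euclidean_norm(w_next)
```

After the step, `w_next` can be reshaped back into a `rows` by `cols` matrix whose Frobenius norm is again 1, which is why optimisation over the sphere is sufficient to keep the matrix constraint satisfied.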
