Improve Document Embedding for Text Categorization Through Deep Siamese Neural Network

Paper Title

Improve Document Embedding for Text Categorization Through Deep Siamese Neural Network

Authors

Gharavi, Erfaneh; Veisi, Hadi

Abstract

Due to the increasing amount of data on the internet, finding a highly informative, low-dimensional representation for text is one of the main challenges for efficient natural language processing tasks, including text classification. This representation should capture the semantic information of the text while retaining its relevance level for document classification. This approach maps documents with similar topics to nearby regions of the vector space. To obtain representations for large texts, we propose the use of deep Siamese neural networks. To embed the topical relevance of documents in the distributed representation, we use a Siamese neural network to jointly learn document representations. Our Siamese network consists of two multi-layer perceptron sub-networks. We evaluate our representation on the text categorization task using the BBC news dataset. The results show that the proposed representations outperform conventional and state-of-the-art representations in the text classification task on this dataset.
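The Siamese arrangement described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the input features, layer sizes, or training loss, so everything below is an assumption made for illustration: documents are taken to be fixed-length feature vectors, the two sub-networks are assumed to share one set of weights, and a standard contrastive loss stands in for whatever objective the authors used. The names `init_mlp`, `embed`, and `contrastive_loss` are hypothetical.

```python
import numpy as np

def init_mlp(sizes, seed=0):
    """Create (weight, bias) pairs for an MLP with the given layer sizes."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def embed(params, x):
    """Map a batch of document vectors to embeddings (ReLU on hidden layers)."""
    h = x
    for i, (w, b) in enumerate(params):
        h = h @ w + b
        if i < len(params) - 1:
            h = np.maximum(h, 0.0)
    return h

def contrastive_loss(params, x1, x2, same_topic, margin=1.0):
    """Assumed objective: pull same-topic pairs together, push others apart.

    Both inputs pass through the SAME parameters, which is what makes the
    two branches a Siamese pair in this sketch.
    """
    d = np.linalg.norm(embed(params, x1) - embed(params, x2), axis=1)
    pos = same_topic * d ** 2
    neg = (1 - same_topic) * np.maximum(margin - d, 0.0) ** 2
    return float(np.mean(pos + neg))

# Usage with random stand-in data: 4 document pairs, 50 input features.
params = init_mlp([50, 32, 8])
rng = np.random.default_rng(1)
x1, x2 = rng.normal(size=(4, 50)), rng.normal(size=(4, 50))
same_topic = np.array([1.0, 0.0, 1.0, 0.0])
loss = contrastive_loss(params, x1, x2, same_topic)
```

After training with such an objective, `embed` alone would produce the document vectors passed to a downstream classifier; this sketch covers only the forward pass and the loss, not the optimization loop.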
