Paper Title

Learning Disentangled Representations for Natural Language Definitions

Authors

Danilo S. Carvalho, Giangiacomo Mercatali, Yingji Zhang, Andre Freitas

Abstract

Disentangling the encodings of neural models is a fundamental aspect for improving interpretability, semantic control and downstream task performance in Natural Language Processing. Currently, most disentanglement methods are unsupervised or rely on synthetic datasets with known generative factors. We argue that recurrent syntactic and semantic regularities in textual data can be used to provide the models with both structural biases and generative factors. We leverage the semantic structures present in a representative and semantically dense category of sentence types, definitional sentences, for training a Variational Autoencoder to learn disentangled representations. Our experimental results show that the proposed model outperforms unsupervised baselines on several qualitative and quantitative benchmarks for disentanglement, and it also improves the results in the downstream task of definition modeling.
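For readers unfamiliar with the setup, the sketch below illustrates the generic objective such a model optimizes: token reconstruction plus a KL term that regularizes the latent code toward a standard-normal prior, the usual lever for disentanglement in (beta-)VAEs. This is a minimal, self-contained PyTorch illustration under assumed hyperparameters, not the authors' architecture; in particular, it does not reproduce how the paper uses definitional structure as generative factors.

```python
# Minimal sentence-VAE sketch (PyTorch). The GRU encoder/decoder, all
# dimensions, and the beta weight are illustrative assumptions, not the
# paper's model.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SentenceVAE(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=128, hidden_dim=256, latent_dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.to_mu = nn.Linear(hidden_dim, latent_dim)
        self.to_logvar = nn.Linear(hidden_dim, latent_dim)
        self.latent_to_hidden = nn.Linear(latent_dim, hidden_dim)
        self.decoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):
        x = self.embed(tokens)                      # (batch, seq, embed_dim)
        _, h = self.encoder(x)                      # h: (1, batch, hidden_dim)
        mu, logvar = self.to_mu(h[-1]), self.to_logvar(h[-1])
        # Reparameterization trick: z = mu + sigma * eps
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        h0 = torch.tanh(self.latent_to_hidden(z)).unsqueeze(0)
        dec_out, _ = self.decoder(x, h0)            # teacher forcing (target shift omitted for brevity)
        logits = self.out(dec_out)                  # (batch, seq, vocab_size)
        # KL divergence of the diagonal-Gaussian posterior from the N(0, I) prior
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1).mean()
        return logits, kl

# Toy usage: reconstruction loss plus a beta-weighted KL term.
tokens = torch.randint(0, 10000, (8, 20))           # random token ids stand in for sentences
model = SentenceVAE()
logits, kl = model(tokens)
recon = F.cross_entropy(logits.view(-1, 10000), tokens.view(-1))
loss = recon + 1.0 * kl                             # beta = 1.0 is an arbitrary choice here
```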
