Paper Title

Debiased Contrastive Learning of Unsupervised Sentence Representations

Paper Authors

Kun Zhou, Beichen Zhang, Wayne Xin Zhao, Ji-Rong Wen

Paper Abstract

Recently, contrastive learning has been shown to be effective in improving pre-trained language models (PLMs) to derive high-quality sentence representations. It aims to pull close positive examples to enhance the alignment, while pushing apart irrelevant negatives for the uniformity of the whole representation space. However, previous works mostly adopt in-batch negatives or sample negatives from the training data at random. Such a strategy may cause sampling bias, in which improper negatives (e.g., false negatives and anisotropic representations) are used to learn sentence representations, hurting the uniformity of the representation space. To address it, we present a new framework \textbf{DCLR} (\underline{D}ebiased \underline{C}ontrastive \underline{L}earning of unsupervised sentence \underline{R}epresentations) to alleviate the influence of these improper negatives. In DCLR, we design an instance weighting method to punish false negatives and generate noise-based negatives to guarantee the uniformity of the representation space. Experiments on seven semantic textual similarity tasks show that our approach is more effective than competitive baselines. Our code and data are publicly available at the link: \textcolor{blue}{\url{https://github.com/RUCAIBox/DCLR}}.
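
To make the objective described in the abstract concrete, below is a minimal, illustrative PyTorch sketch of a debiased InfoNCE-style loss: suspected false negatives are down-weighted via an instance-weighting rule, and extra noise-based negatives are appended to the denominator. The function name `dclr_style_loss`, the cosine-threshold weighting rule, and the plain Gaussian noise negatives are assumptions made for illustration only, not the authors' actual implementation; see the linked repository for the official code.

```python
import torch
import torch.nn.functional as F


def dclr_style_loss(anchors, positives, negatives, noise_negatives,
                    temperature=0.05, false_neg_threshold=0.9):
    """Illustrative debiased contrastive loss (not the official DCLR code).

    anchors, positives: (B, D) embeddings of two views of the same sentences.
    negatives:          (B, K, D) candidate negatives (e.g. in-batch sentences).
    noise_negatives:    (B, M, D) random noise vectors used as extra negatives.
    """
    anchors = F.normalize(anchors, dim=-1)
    positives = F.normalize(positives, dim=-1)
    negatives = F.normalize(negatives, dim=-1)
    noise_negatives = F.normalize(noise_negatives, dim=-1)

    # Positive similarity: one score per anchor, shape (B, 1).
    pos_sim = (anchors * positives).sum(-1, keepdim=True) / temperature

    # Negative similarities: (B, K) for candidates, (B, M) for noise negatives.
    neg_sim = torch.einsum('bd,bkd->bk', anchors, negatives) / temperature
    noise_sim = torch.einsum('bd,bmd->bm', anchors, noise_negatives) / temperature

    # Instance weighting (assumed rule): zero out negatives whose cosine
    # similarity to the anchor is suspiciously high -- likely false negatives.
    with torch.no_grad():
        cos = torch.einsum('bd,bkd->bk', anchors, negatives)
        weights = (cos < false_neg_threshold).float()

    # Weighted InfoNCE: positive against weighted negatives plus noise negatives.
    numer = pos_sim.exp().squeeze(-1)
    denom = numer + (weights * neg_sim.exp()).sum(-1) + noise_sim.exp().sum(-1)
    return -(numer / denom).log().mean()


if __name__ == "__main__":
    torch.manual_seed(0)
    B, D, K, M = 8, 768, 7, 16
    anchors = torch.randn(B, D)
    positives = anchors + 0.01 * torch.randn(B, D)  # stand-in for a dropout-augmented view
    negatives = torch.randn(B, K, D)                # stand-in for in-batch negatives
    noise = torch.randn(B, M, D)                    # Gaussian noise negatives (not optimized here)
    print(dclr_style_loss(anchors, positives, negatives, noise).item())
```

In this sketch the noise negatives are drawn once and left fixed, whereas the abstract only states that noise-based negatives are generated to encourage uniformity; how they are generated and refined is left to the paper and repository.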
