Paper Title
Examining Large Pre-Trained Language Models for Machine Translation: What You Don't Know About It
Paper Authors
Paper Abstract
Pre-trained language models (PLMs) often take advantage of monolingual and multilingual datasets that are freely available online to acquire general or mixed-domain knowledge before deployment into specific tasks. Extra-large PLMs (xLPLMs) have been proposed very recently to claim supreme performance over smaller-sized PLMs in tasks such as machine translation (MT). These xLPLMs include Meta-AI's wmt21-dense-24-wide-en-X (2021) and NLLB (2022). In this work, we examine whether xLPLMs are absolutely superior to smaller-sized PLMs when fine-tuned toward domain-specific MT. We use two in-domain datasets of different sizes: commercial automotive in-house data and clinical shared-task data from the ClinSpEn2022 challenge at WMT2022. We choose the popular Marian Helsinki as the smaller-sized PLM and two massive-sized Mega-Transformers from Meta-AI as xLPLMs. Our experimental investigation shows that 1) on the smaller-sized in-domain commercial automotive data, the xLPLM wmt21-dense-24-wide-en-X indeed achieves much better evaluation scores on the SacreBLEU and hLEPOR metrics than the smaller-sized Marian, even though its rate of score improvement after fine-tuning is lower than Marian's; 2) when fine-tuned on the relatively larger, well-prepared clinical data, the xLPLM NLLB tends to lose its advantage over the smaller-sized Marian on two sub-tasks (clinical terms and ontology concepts) under the ClinSpEn-offered metrics METEOR, COMET, and ROUGE-L, and loses entirely to Marian on Task-1 (clinical cases) on all official metrics, including SacreBLEU and BLEU; 3) metrics do not always agree with each other on the same task when evaluating the same model outputs; 4) our clinic-Marian ranked No.2 on Task-1 (via SacreBLEU/BLEU) and Task-3 (via METEOR and ROUGE) among all submissions.
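For readers who want to see what the evaluation side of such a study looks like in practice, the minimal sketch below (not from the paper) shows how translations produced by a Marian Helsinki checkpoint can be scored with SacreBLEU, one of the metrics the authors report. The checkpoint name, language pair, and example sentences are illustrative assumptions, not the paper's actual data or configuration.

```python
# A minimal sketch of an MT evaluation step, assuming a public
# Helsinki-NLP Marian checkpoint; the paper's own fine-tuned models
# and in-domain test sets are not public, so everything here is
# illustrative only.
import sacrebleu
from transformers import MarianMTModel, MarianTokenizer

# Hypothetical English->Spanish checkpoint (ClinSpEn2022 covers this pair).
model_name = "Helsinki-NLP/opus-mt-en-es"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# Toy source/reference pair standing in for a clinical test set.
sources = ["The patient presented with acute chest pain."]
references = ["El paciente presentó dolor torácico agudo."]

# Translate the source sentences with the Marian model.
batch = tokenizer(sources, return_tensors="pt", padding=True)
outputs = model.generate(**batch)
hypotheses = tokenizer.batch_decode(outputs, skip_special_tokens=True)

# Corpus-level SacreBLEU against the reference translations; higher is better.
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"SacreBLEU: {bleu.score:.2f}")
```

The same hypothesis and reference lists could then be passed to METEOR, COMET, and ROUGE-L implementations to reproduce the kind of multi-metric comparison, and the metric disagreements, that the paper discusses.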