Paper Title

Using BERT Embeddings to Model Word Importance in Conversational Transcripts for Deaf and Hard of Hearing Users

Authors

Amin, Akhter Al, Hassan, Saad, Alm, Cecilia O., Huenerfauth, Matt

Abstract

Deaf and hard of hearing (DHH) individuals regularly rely on captioning while watching live TV. Live TV captioning is evaluated by regulatory agencies using various caption evaluation metrics. However, caption evaluation metrics are often not informed by the preferences of DHH users or by how meaningful the captions are. There is a need to construct caption evaluation metrics that take the relative importance of words in a transcript into account. We conducted a correlation analysis between two types of word embeddings and human-annotated word-importance scores in an existing corpus. We found that normalized contextualized word embeddings generated using BERT correlated better with manually annotated importance scores than word2vec-based word embeddings. We make available a pairing of word embeddings and their human-annotated importance scores. We also provide proof-of-concept utility by training word importance models, achieving an F1-score of 0.57 in the 6-class word importance classification task.
