Paper Title
IITK at the FinSim Task: Hypernym Detection in Financial Domain via Context-Free and Contextualized Word Embeddings
Paper Authors
Paper Abstract
In this paper, we present our approaches to the FinSim 2020 shared task on "Learning Semantic Representations for the Financial Domain". The goal of the task is to classify financial terms into the most relevant hypernym (or top-level) concept in an external ontology. We leverage both context-dependent and context-independent word embeddings in our analysis. Our systems use Word2vec embeddings trained from scratch on the corpus (financial prospectuses in English) along with pre-trained BERT embeddings. We divide the test dataset into two subsets based on a domain rule. For one subset, we use unsupervised distance measures to classify each term. For the second subset, we apply simple supervised classifiers, such as Naive Bayes, on top of the embeddings to arrive at a final prediction. Finally, we combine the two sets of results. Our system ranks first on both metrics, i.e., mean rank and accuracy.
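As a rough illustration of the unsupervised branch and the two evaluation metrics named in the abstract, the sketch below ranks candidate hypernym labels for each term by cosine similarity and then computes mean rank and accuracy. It is not the authors' implementation. The three-dimensional vectors, the label names, the terms, and the helper names (`cosine`, `rank_hypernyms`, `evaluate`) are all made up for illustration; a real system would use Word2vec or BERT embeddings instead. Cosine similarity is assumed here as one possible distance measure, since the abstract does not say which measures were used.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_hypernyms(term_vec, label_vecs):
    """Return candidate labels sorted from most to least similar to the term."""
    return sorted(
        label_vecs,
        key=lambda label: cosine(term_vec, label_vecs[label]),
        reverse=True,
    )


def evaluate(ranked_lists, gold_labels):
    """Mean rank of the gold label (1 is best) and top-1 accuracy."""
    ranks = [ranked.index(gold) + 1 for ranked, gold in zip(ranked_lists, gold_labels)]
    mean_rank = sum(ranks) / len(ranks)
    accuracy = sum(1 for r in ranks if r == 1) / len(ranks)
    return mean_rank, accuracy


# Toy, hand-written vectors (assumption: stand-ins for real embeddings).
LABEL_VECS = {
    "Bonds": [0.9, 0.1, 0.0],
    "Swap": [0.1, 0.9, 0.1],
    "Funds": [0.0, 0.2, 0.9],
}
TERM_VECS = {
    "convertible bond": [0.8, 0.2, 0.1],
    "interest rate swap": [0.2, 0.8, 0.0],
    "hedge fund": [0.6, 0.1, 0.5],  # deliberately ambiguous toy vector
}
GOLD = {
    "convertible bond": "Bonds",
    "interest rate swap": "Swap",
    "hedge fund": "Funds",
}

if __name__ == "__main__":
    ranked = [rank_hypernyms(TERM_VECS[t], LABEL_VECS) for t in TERM_VECS]
    gold = [GOLD[t] for t in TERM_VECS]
    mean_rank, accuracy = evaluate(ranked, gold)
    print(f"mean rank = {mean_rank:.3f}, accuracy = {accuracy:.3f}")
```

The supervised branch (for example, Naive Bayes on top of the embeddings) and the domain rule that splits the test set are not sketched here, because the abstract does not describe them in enough detail.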