Paper Title
FinBERT-MRC: financial named entity recognition using BERT under the machine reading comprehension paradigm
Paper Authors
Paper Abstract
Financial named entity recognition (FinNER) from literature is a challenging task in the field of financial text information extraction, which aims to extract a large amount of financial knowledge from unstructured texts. Sequence tagging frameworks are widely used to implement FinNER tasks. However, such sequence tagging models cannot fully take advantage of the semantic information in the texts. Instead, we formulate the FinNER task as a machine reading comprehension (MRC) problem and propose a new model termed FinBERT-MRC. This formulation introduces significant prior information by utilizing well-designed queries, and extracts the start and end indices of target entities without decoding modules such as conditional random fields (CRF). We conduct experiments on a publicly available Chinese financial dataset, ChFinAnn, and a real-world business dataset, AdminPunish. The FinBERT-MRC model achieves average F1 scores of 92.78% and 96.80% on the two datasets, respectively, with average F1 gains of +3.94% and +0.89% over sequence tagging models including BiLSTM-CRF, BERT-Tagger, and BERT-CRF. The source code is available at https://github.com/zyz0000/FinBERT-MRC.
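To illustrate the formulation described in the abstract, the following is a minimal sketch (not the authors' implementation) of MRC-style span extraction: given per-token start and end probabilities produced by a BERT encoder over a "[CLS] query [SEP] passage" input, entity spans are recovered directly from the two index predictions, with no CRF decoding layer. The threshold value and the nearest-end matching rule are illustrative assumptions.

```python
def extract_spans(start_probs, end_probs, threshold=0.5):
    """Recover entity spans from per-token start/end probabilities.

    Each index whose start probability exceeds the threshold is paired
    with the nearest index at or after it whose end probability exceeds
    the threshold. Returns a list of (start, end) token index pairs.
    """
    starts = [i for i, p in enumerate(start_probs) if p > threshold]
    ends = [i for i, p in enumerate(end_probs) if p > threshold]
    spans = []
    for s in starts:
        # nearest admissible end position not before the start
        candidates = [e for e in ends if e >= s]
        if candidates:
            spans.append((s, candidates[0]))
    return spans


# Toy example: two predicted entities, tokens 0-1 and tokens 3-4.
start_probs = [0.9, 0.1, 0.2, 0.8, 0.1]
end_probs = [0.1, 0.7, 0.1, 0.1, 0.9]
print(extract_spans(start_probs, end_probs))  # [(0, 1), (3, 4)]
```

Because the query encodes prior knowledge about the entity type (e.g. "find the company names in the passage"), one forward pass is run per entity type, and the extracted index pairs map back to substrings of the passage.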