CliMedBERT: A Pre-trained Language Model for Climate and Health-related Text

Paper Title

CliMedBERT: A Pre-trained Language Model for Climate and Health-related Text

Authors

Fard, B. Jalalzadeh, Hasan, S. A., Bell, J. E.

Abstract

Climate change is threatening human health on an unprecedented scale and in many ways. These threats are expected to grow unless effective, evidence-based policies are developed and acted upon to minimize or eliminate them. Achieving this requires the greatest possible flow of knowledge from science into policy. The multidisciplinary and location-specific nature of the published science, together with its sheer volume, makes it challenging to keep track of novel work in this area and renders traditional knowledge synthesis methods inefficient at infusing science into policy. To this end, we consider developing multiple domain-specific language models (LMs), in different variations, from climate- and health-related information. These models can serve as a foundational step toward capturing available knowledge and enable solving different tasks, such as detecting similarities between climate- and health-related concepts, fact-checking, relation extraction, generating policy text from evidence of health effects, and more. To our knowledge, this is the first work that proposes developing multiple domain-specific language models for the considered domains. We will make the developed models, resources, and codebase available to researchers.
