Paper Title

MedDialog: Two Large-scale Medical Dialogue Datasets

Authors

Xuehai He, Shu Chen, Zeqian Ju, Xiangyu Dong, Hongchao Fang, Sicheng Wang, Yue Yang, Jiaqi Zeng, Ruisi Zhang, Ruoyu Zhang, Meng Zhou, Penghui Zhu, Pengtao Xie

Abstract

Medical dialogue systems show promise in assisting telemedicine to increase access to healthcare services, improve the quality of patient care, and reduce medical costs. To facilitate the research and development of medical dialogue systems, we build two large-scale medical dialogue datasets: MedDialog-EN and MedDialog-CN. MedDialog-EN is an English dataset containing 0.3 million conversations between patients and doctors and 0.5 million utterances. MedDialog-CN is a Chinese dataset containing 1.1 million conversations and 4 million utterances. To the best of our knowledge, MedDialog-(EN,CN) are the largest medical dialogue datasets to date. The datasets are available at https://github.com/UCSD-AI4H/Medical-Dialogue-System.
