Paper Title
MedDialog: Two Large-scale Medical Dialogue Datasets
Paper Authors
Paper Abstract
Medical dialogue systems show promise in assisting telemedicine to increase access to healthcare services, improve the quality of patient care, and reduce medical costs. To facilitate the research and development of medical dialogue systems, we build two large-scale medical dialogue datasets: MedDialog-EN and MedDialog-CN. MedDialog-EN is an English dataset containing 0.3 million conversations between patients and doctors and 0.5 million utterances. MedDialog-CN is a Chinese dataset containing 1.1 million conversations and 4 million utterances. To the best of our knowledge, MedDialog-(EN,CN) are the largest medical dialogue datasets to date. The datasets are available at https://github.com/UCSD-AI4H/Medical-Dialogue-System