Paper Title

OdontoAI: A human-in-the-loop labeled data set and an online platform to boost research on dental panoramic radiographs

Authors

Bernardo Silva, Laís Pinheiro, Brenda Sobrinho, Fernanda Lima, Bruna Sobrinho, Kalyf Abdalla, Matheus Pithon, Patrícia Cury, Luciano Oliveira

Abstract

Deep learning has advanced remarkably in the last few years, supported by large labeled data sets. These data sets are precious yet scarce because the labeling procedures are time-consuming, which discourages researchers from producing them. This scarcity is especially pronounced in dentistry, where deep learning applications are still in an embryonic stage. Motivated by this background, we address in this study the construction of a public data set of dental panoramic radiographs. Our objects of interest are the teeth, which are segmented and numbered, as they are the primary targets for dentists when screening a panoramic radiograph. We benefited from the human-in-the-loop (HITL) concept to expedite the labeling procedure, using predictions from deep neural networks as provisional labels that were later verified by human annotators. All the gathering and labeling procedures of this novel data set are thoroughly analyzed. The results were consistent and behaved as expected: at each HITL iteration, the model predictions improved. Our results demonstrated a 51% reduction in labeling time using HITL, saving us more than 390 continuous working hours. In a novel online platform called OdontoAI, created to work as a task hub for this data set, we released 4,000 images, of which 2,000 have their labels publicly available for model fitting. The labels of the other 2,000 images are private and used for model evaluation on instance segmentation, semantic segmentation, and numbering. To the best of our knowledge, this is the largest-scale publicly available data set for panoramic radiographs, and OdontoAI is the first platform of its kind in dentistry.
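The HITL procedure described in the abstract (network predictions used as provisional labels, then verified by human annotators, with predictions improving at each iteration) can be illustrated with a minimal sketch. The abstract does not give the batch schedule, the network architecture, or the annotation tooling, so everything here is an assumption for illustration: `hitl_label`, `train`, `predict`, and `verify` are hypothetical names, and labeling the first batch from scratch is an assumed starting point rather than the authors' stated protocol.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple

Image = Any
Label = Any
Model = Any


def hitl_label(
    batches: Sequence[Sequence[Image]],
    train: Callable[[List[Tuple[Image, Label]]], Model],
    predict: Callable[[Model, Image], Label],
    verify: Callable[[Image, Optional[Label]], Label],
) -> Tuple[List[Tuple[Image, Label]], Optional[Model]]:
    """Label images batch by batch with a human in the loop (illustrative only).

    train   -- hypothetical: fits a model on all (image, label) pairs so far
    predict -- hypothetical: returns a provisional label for one image
    verify  -- hypothetical: a human accepts or corrects the provisional
               label (None means "label from scratch") and returns the
               final label
    """
    labeled: List[Tuple[Image, Label]] = []
    model: Optional[Model] = None
    for batch in batches:
        if model is None:
            # Assumption: no model exists yet, so the first batch gets
            # no provisional labels and is annotated manually.
            provisional: List[Optional[Label]] = [None] * len(batch)
        else:
            provisional = [predict(model, image) for image in batch]
        # Human annotators verify (and fix where needed) every label.
        verified = [verify(image, label) for image, label in zip(batch, provisional)]
        labeled.extend(zip(batch, verified))
        # Retrain on everything verified so far, so that the next batch
        # starts from better provisional labels.
        model = train(labeled)
    return labeled, model
```

In this sketch the time saving would come from `verify` being faster when the provisional label is already close to correct; how the reported 51% reduction was actually measured is described in the paper, not here.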
