Paper title
Uncertainty estimation for classification and risk prediction on medical tabular data
Authors
Abstract
In a data-scarce field such as healthcare, where models often deliver predictions on patients with rare conditions, the ability to measure the uncertainty of a model's prediction could improve the effectiveness of decision support tools and increase user trust. This work advances the understanding of uncertainty estimation for classification and risk prediction on medical tabular data in two ways. First, we expand and refine the set of heuristics used to select an uncertainty estimation technique, introducing tests for clinically relevant scenarios such as generalization to uncommon pathologies, changes in clinical protocol and simulations of corrupted data. We furthermore differentiate these heuristics depending on the clinical use case. Second, we observe that ensembles and related techniques perform poorly at detecting out-of-domain examples, a critical task that auto-encoders carry out more successfully. These observations are enriched by considerations of the interplay of uncertainty estimation with class imbalance, post-modeling calibration and other modeling procedures. Our findings are supported by an array of experiments on toy and real-world data.
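To make the out-of-domain detection idea concrete, the following is a minimal sketch of scoring tabular rows by reconstruction error. It is not the authors' implementation: a linear auto-encoder (equivalent to PCA) stands in for a trained neural auto-encoder, the data are synthetic, and all names (`fit_linear_autoencoder`, `reconstruction_error`, the 95th-percentile threshold) are assumptions made for illustration.

```python
import numpy as np

def fit_linear_autoencoder(X, n_components):
    """Fit a linear auto-encoder (equivalent to PCA) on in-domain rows."""
    mean = X.mean(axis=0)
    # Rows of vt are principal directions; keep the top n_components.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def reconstruction_error(X, mean, components):
    """Per-row squared reconstruction error, used as a novelty score."""
    centered = X - mean
    reconstructed = centered @ components.T @ components
    return np.sum((centered - reconstructed) ** 2, axis=1)

rng = np.random.default_rng(0)

# Hypothetical in-domain data: 5 features driven by 2 latent factors plus noise.
mixing = rng.normal(size=(2, 5))
X_train = rng.normal(size=(500, 2)) @ mixing + 0.05 * rng.normal(size=(500, 5))
X_in = rng.normal(size=(200, 2)) @ mixing + 0.05 * rng.normal(size=(200, 5))

# Hypothetical out-of-domain data: rows that ignore the latent structure.
X_out = 3.0 * rng.normal(size=(200, 5))

mean, components = fit_linear_autoencoder(X_train, n_components=2)
score_in = reconstruction_error(X_in, mean, components)
score_out = reconstruction_error(X_out, mean, components)

# Assumed decision rule: flag rows above the 95th percentile of training scores.
threshold = np.percentile(reconstruction_error(X_train, mean, components), 95)
flagged_in = float(np.mean(score_in > threshold))
flagged_out = float(np.mean(score_out > threshold))
```

Under these assumptions, rows that do not follow the structure of the training data reconstruct poorly and receive high scores, so a larger share of `X_out` than of `X_in` is flagged. The threshold percentile is a tunable choice and would need to be set according to the clinical use case.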