Title
On the Importance of Diversity in Re-Sampling for Imbalanced Data and Rare Events in Mortality Risk Models
Authors
Abstract
Surgical risk increases significantly when patients present with comorbid conditions. This has led to the creation of numerous risk stratification tools that estimate the associated surgical risk to assist both surgeons and patients in decision-making. The Surgical Outcome Risk Tool (SORT) is one of the tools developed to predict mortality risk throughout the entire perioperative period for major elective in-patient surgeries in the UK. In this study, we enhance the original SORT prediction model (UK SORT) by addressing the class imbalance within the dataset. Our proposed method applies diversity-based selection on top of common re-sampling techniques to improve the classifier's ability to detect minority (mortality) events. Diversity within the training data is essential for ensuring that the re-sampled data keeps an accurate depiction of the minority/majority class regions, thereby addressing the generalization problem of mainstream sampling approaches. We use the Solow-Polasky measure as a drop-in function to evaluate diversity, together with greedy algorithms that identify and discard the subsets sharing the most similarity. Through empirical experiments, we show that the classifier trained on the diversity-based dataset outperforms the original classifier on ten external datasets. Our diversity-based re-sampling method improves the performance of the UK SORT algorithm by 1.4%.
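To make the diversity-based selection step more concrete, here is a minimal sketch of how the Solow-Polasky measure might be combined with a greedy discard loop. It is an illustration under stated assumptions, not the authors' implementation: the Euclidean distance, the similarity kernel exp(-theta * d), the backward-elimination strategy, and the names `solow_polasky`, `greedy_diverse_subset`, and `theta` are all hypothetical choices that the abstract does not specify.

```python
import numpy as np


def solow_polasky(points, theta=1.0):
    """Solow-Polasky diversity: sum of all entries of the inverse similarity matrix.

    Assumption: similarity is exp(-theta * Euclidean distance).
    """
    diffs = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    sim = np.exp(-theta * dist)
    # The pseudo-inverse keeps the computation defined for (near-)duplicate points.
    return float(np.linalg.pinv(sim).sum())


def greedy_diverse_subset(points, keep, theta=1.0):
    """Greedily discard points until `keep` remain.

    At each step, drop the point whose removal leaves the remaining
    set with the highest Solow-Polasky diversity (assumed strategy).
    Returns the indices of the retained points.
    """
    idx = list(range(len(points)))
    while len(idx) > keep:
        best_pos, best_div = None, -np.inf
        for pos in range(len(idx)):
            trial = idx[:pos] + idx[pos + 1:]
            div = solow_polasky(points[trial], theta)
            if div > best_div:
                best_pos, best_div = pos, div
        idx.pop(best_pos)
    return idx


# Toy example: two near-duplicate samples and two well-separated ones.
samples = np.array([[0.0, 0.0], [0.0, 0.01], [5.0, 5.0], [10.0, 0.0]])
retained = greedy_diverse_subset(samples, keep=3)
print(retained, solow_polasky(samples[retained]))
```

In a re-sampling pipeline, a routine like this could be applied to the output of an over- or under-sampler so that highly similar samples are discarded before training. The brute-force loop recomputes the measure for every candidate removal, so it only suits small sets; a practical version would need a cheaper update.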