Paper Title
Semi-supervised Triply Robust Inductive Transfer Learning
Paper Authors
Paper Abstract
In this work, we propose a Semi-supervised Triply Robust Inductive transFer LEarning (STRIFLE) approach, which integrates heterogeneous data from a label-rich source population and a label-scarce target population and simultaneously leverages a large amount of unlabeled data to improve learning accuracy in the target population. Specifically, we consider a high-dimensional covariate shift setting and employ two nuisance models, a density ratio model and an imputation model, to combine transfer learning and surrogate-assisted semi-supervised learning strategies effectively and achieve triple robustness. While the STRIFLE approach assumes that the target and source populations share the same conditional distribution of the outcome Y given both the surrogate features S and the predictors X, it allows the true underlying model of Y|X to differ between the two populations due to potential covariate shift in S and X. In contrast to double robustness, even if both nuisance models are misspecified or the distribution of Y|(S, X) is not the same in the two populations, the triply robust STRIFLE estimator can still partially use the source population when the shifted source population and the target population share enough similarities. Moreover, it is guaranteed to be no worse than the target-only surrogate-assisted semi-supervised estimator, up to an additional error term from transferability detection. These desirable properties of our estimator are established theoretically and verified in finite samples via extensive simulation studies. We utilize the STRIFLE estimator to train a Type II diabetes polygenic risk prediction model for the African American target population by transferring knowledge from electronic health record-linked genomic data observed in a larger European source population.
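To make the role of the two nuisance models more concrete, the following is a minimal sketch of how a density ratio model and an imputation model can be combined under covariate shift. It is not the STRIFLE estimator: it assumes a single scalar covariate, uses no surrogate features or unlabeled-data step, performs no transferability detection, and only estimates the target-population mean of Y. All function names are hypothetical and chosen for illustration.

```python
import math

def fit_logistic_1d(x, z, lr=0.5, n_iter=500):
    """Gradient-descent logistic regression of a 0/1 label z on a scalar x.
    Returns (intercept, slope)."""
    a = b = 0.0
    n = len(x)
    for _ in range(n_iter):
        grad_a = grad_b = 0.0
        for xi, zi in zip(x, z):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            grad_a += p - zi
            grad_b += (p - zi) * xi
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

def density_ratio_weights(x_source, x_target):
    """Estimate w(x) = p_target(x) / p_source(x) at each source point
    by classifying source (label 0) against target (label 1)."""
    x = list(x_source) + list(x_target)
    z = [0.0] * len(x_source) + [1.0] * len(x_target)
    a, b = fit_logistic_1d(x, z)
    # Classifier odds equal the density ratio up to the sample-size ratio.
    scale = len(x_source) / len(x_target)
    raw = [math.exp(a + b * xi) * scale for xi in x_source]
    mean_raw = sum(raw) / len(raw)
    return [r / mean_raw for r in raw]  # normalise weights to mean 1

def fit_ols_1d(x, y):
    """Least-squares line y = c0 + c1 * x; this plays the imputation model."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    c1 = sxy / sxx
    return my - c1 * mx, c1

def augmented_target_mean(x_source, y_source, x_target):
    """Imputed target mean plus a density-ratio-weighted residual correction
    computed on the labeled source sample."""
    c0, c1 = fit_ols_1d(x_source, y_source)
    w = density_ratio_weights(x_source, x_target)
    imputed = sum(c0 + c1 * xt for xt in x_target) / len(x_target)
    correction = sum(
        wi * (yi - (c0 + c1 * xi))
        for wi, xi, yi in zip(w, x_source, y_source)
    ) / len(x_source)
    return imputed + correction
```

In this simplified setting the estimate remains consistent if either the density ratio model or the imputation model is correctly specified, which is the double robustness that the abstract contrasts with STRIFLE's triple robustness.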