Paper Title
Identifying Invariant Factors Across Multiple Environments with KL Regression
Paper Authors
Paper Abstract
Many datasets are collected from multiple environments (e.g., different labs, perturbations, etc.), and it is often advantageous to learn models and relations that are invariant across environments. Invariance can improve robustness to unknown confounders and improve generalization to new domains. We develop a novel framework -- KL regression -- to reliably estimate regression coefficients in a challenging multi-environment setting, where latent confounders affect the data from each environment. KL regression is based on a new objective of simultaneously minimizing the KL-divergence between a parametric model and the observed data from each environment. We prove that KL regression recovers the true invariant factors under a flexible confounding setup. Moreover, it is computationally efficient as we derive an analytic solution for its global optimum. In systematic experiments, we validate the improved performance of KL regression compared to commonly used approaches.
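One plausible formal reading of the objective described above, written as a sketch rather than the paper's exact formulation: assume E environments with empirical (observed) distributions $\hat{P}_1, \dots, \hat{P}_E$ and a parametric regression model $P_\beta$ with coefficients $\beta$. The direction of the KL divergence and the equal weighting across environments are assumptions, not taken from the abstract.

$$
\hat{\beta} \;=\; \arg\min_{\beta} \; \sum_{e=1}^{E} \mathrm{KL}\!\left( \hat{P}_e \,\middle\|\, P_\beta \right)
$$

Per the abstract, the defining feature is that a single $\beta$ is fit jointly against the data from all environments at once, rather than estimated separately in each environment and then combined.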