Paper title
InfoOT: Information Maximizing Optimal Transport
Paper authors
Abstract
Optimal transport aligns samples across distributions by minimizing the transportation cost between them, e.g., the geometric distances. Yet, it ignores coherence structure in the data such as clusters, does not handle outliers well, and cannot integrate new data points. To address these drawbacks, we propose InfoOT, an information-theoretic extension of optimal transport that maximizes the mutual information between domains while minimizing geometric distances. The resulting objective can still be formulated as a (generalized) optimal transport problem, and can be efficiently solved by projected gradient descent. This formulation yields a new projection method that is robust to outliers and generalizes to unseen samples. Empirically, InfoOT improves the quality of alignments across benchmarks in domain adaptation, cross-domain retrieval, and single-cell alignment.
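To make the trade-off in the abstract concrete, here is a minimal sketch of how a transport plan could be scored by geometric cost minus a mutual-information bonus. The abstract does not say how mutual information is estimated, so the Gaussian kernel density estimator below is an assumption. The function names, the bandwidth, and the weight `lam` are all hypothetical. The projected gradient descent solver is omitted; the sketch only evaluates the objective for two fixed plans.

```python
import numpy as np


def gaussian_kernel(points, bandwidth):
    """Pairwise Gaussian kernel matrix for an (n, d) array of points."""
    sq_dists = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def kde_mutual_information(plan, kernel_x, kernel_y):
    """Assumed estimator: kernel-density mutual information of a coupling.

    plan[i, j] is the mass moved from source sample i to target sample j.
    Kernel normalisation constants cancel in the density ratio.
    """
    # Joint density at (x_i, y_j): sum_kl plan[k, l] * Kx[i, k] * Ky[j, l]
    joint = kernel_x @ plan @ kernel_y.T
    # Marginal densities implied by the plan's row and column sums
    density_x = kernel_x @ plan.sum(axis=1)
    density_y = kernel_y @ plan.sum(axis=0)
    ratio = joint / np.outer(density_x, density_y)
    mask = plan > 0  # skip zero-mass pairs
    return float(np.sum(plan[mask] * np.log(ratio[mask])))


def infoot_style_objective(plan, cost, kernel_x, kernel_y, lam):
    """Transport cost minus lam times the mutual-information estimate."""
    transport_cost = float(np.sum(plan * cost))
    return transport_cost - lam * kde_mutual_information(plan, kernel_x, kernel_y)


# Toy data: two well-separated clusters in each one-dimensional domain.
xs = np.array([[0.0], [0.1], [5.0], [5.1]])
ys = np.array([[0.2], [0.3], [5.2], [5.3]])
kx = gaussian_kernel(xs, bandwidth=0.5)
ky = gaussian_kernel(ys, bandwidth=0.5)
cost = (xs - ys.T) ** 2  # squared distances, shape (4, 4)

aligned = np.eye(4) / 4.0              # matches each point to its counterpart
independent = np.full((4, 4), 1 / 16)  # product coupling, ignores structure

mi_aligned = kde_mutual_information(aligned, kx, ky)
mi_independent = kde_mutual_information(independent, kx, ky)
obj_aligned = infoot_style_objective(aligned, cost, kx, ky, lam=1.0)
obj_independent = infoot_style_objective(independent, cost, kx, ky, lam=1.0)
```

Under this estimator the product coupling carries essentially zero mutual information, because its joint density factorises into the two marginals. The cluster-preserving plan gets a positive estimate and also a lower transport cost, so it scores better on the combined objective.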