Paper Title

Predictive Data Calibration for Linear Correlation Significance Testing

Authors

Patil, Kaustubh R., Eickhoff, Simon B., Langner, Robert

Abstract

Inferring linear relationships lies at the heart of many empirical investigations. A measure of linear dependence should correctly evaluate the strength of the relationship as well as qualify whether it is meaningful for the population. Pearson's correlation coefficient (PCC), the \textit{de-facto} measure for bivariate relationships, is known to lack in both regards. The estimated strength $r$ may be wrong due to limited sample size and non-normality of the data. In the context of statistical significance testing, erroneous interpretation of a $p$-value as a posterior probability leads to Type I errors -- a general issue with significance testing that extends to PCC. Such errors are exacerbated when testing multiple hypotheses simultaneously. To tackle these issues, we propose a machine-learning-based predictive data calibration method which essentially conditions the data samples on the expected linear relationship. Calculating PCC using calibrated data yields a calibrated $p$-value that can be interpreted as a posterior probability together with a calibrated $r$ estimate, a desired outcome not provided by other methods. Furthermore, the ensuing independent interpretation of each test might eliminate the need for multiple testing correction. We provide empirical evidence favouring the proposed method using several simulations and application to real-world data.
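
The abstract contrasts the proposed calibration with the conventional Pearson significance test. As a point of reference only, the minimal sketch below (assuming NumPy and SciPy are available; the sample size, random seed, and effect size are illustrative, not taken from the paper) shows the standard computation of $r$ and its $p$-value and flags, in comments, the $p$-value-as-posterior misreading the abstract warns about. It is not an implementation of the authors' predictive data calibration method.

```python
# Minimal sketch of the conventional PCC significance test (the baseline
# the abstract critiques), NOT the authors' calibration method.
# Assumes NumPy and SciPy; sample size and effect size are arbitrary.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

n = 30                               # small sample, as discussed in the abstract
x = rng.normal(size=n)
y = 0.3 * x + rng.normal(size=n)     # weak true linear relationship

r, p = stats.pearsonr(x, y)          # estimated strength r and its p-value

# The p-value is the probability of data at least this extreme given that the
# true correlation is zero (a frequentist tail probability under the null).
# It is NOT the posterior probability that the correlation is zero -- the
# misinterpretation the abstract links to Type I errors.
print(f"r = {r:.3f}, p = {p:.4f}")
```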
