Paper Title
Blessing of Dependence: Identifiability and Geometry of Discrete Models with Multiple Binary Latent Variables
Paper Authors
Paper Abstract
Identifiability of discrete statistical models with latent variables is known to be challenging to study, yet crucial to a model's interpretability and reliability. This work presents a general algebraic technique to investigate identifiability of discrete models with latent and graphical components. Specifically, motivated by diagnostic tests collecting multivariate categorical data, we focus on discrete models with multiple binary latent variables. We consider the BLESS model, in which the latent variables can have arbitrary dependencies among themselves while the latent-to-observed measurement graph takes a "star-forest" shape. We establish necessary and sufficient graphical criteria for identifiability, and reveal an interesting and perhaps surprising geometry of blessing-of-dependence: under the minimal conditions for generic identifiability, the parameters are identifiable if and only if the latent variables are not statistically independent. Thanks to this theory, we can perform formal hypothesis tests of identifiability in the boundary case by testing marginal independence of the observed variables. In addition to the BLESS model, we also use the technique to show identifiability and the blessing-of-dependence geometry for a more flexible model, which has a general measurement graph beyond a star forest. Our results give new understanding of statistical properties of graphical models with latent variables. They also entail useful implications for designing diagnostic tests or surveys that measure binary latent traits.
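The idea of checking identifiability by testing marginal independence of observed variables can be illustrated with a small simulation. The sketch below is not the paper's procedure: it assumes a toy star-forest model with two binary latent variables, each with two binary observed children, and applies an ordinary Pearson chi-square test to one pair of children of different latent variables. The function names (`simulate_star_forest`, `chi2_independence_2x2`) and all parameter values are hypothetical choices made for illustration.

```python
import math
import random


def simulate_star_forest(n, p_latent, theta, rng):
    """Draw n rows from a toy star-forest model (illustrative assumption).

    p_latent: dict mapping a latent pattern (a1, a2) to its probability.
    theta: one (parent_index, p_if_parent_0, p_if_parent_1) triple per
           observed binary variable; each variable has exactly one parent.
    """
    patterns = list(p_latent)
    weights = [p_latent[s] for s in patterns]
    data = []
    for _ in range(n):
        a = rng.choices(patterns, weights=weights)[0]
        row = tuple(
            int(rng.random() < (p1 if a[parent] else p0))
            for parent, p0, p1 in theta
        )
        data.append(row)
    return data


def chi2_independence_2x2(data, j, k):
    """Pearson chi-square test of independence between binary columns j and k.

    Returns (statistic, p_value). With one degree of freedom the p-value is
    P(Z^2 > x) = erfc(sqrt(x / 2)). Assumes no empty row or column margin.
    """
    n = len(data)
    counts = [[0, 0], [0, 0]]
    for row in data:
        counts[row[j]][row[k]] += 1
    row_tot = [sum(counts[0]), sum(counts[1])]
    col_tot = [counts[0][0] + counts[1][0], counts[0][1] + counts[1][1]]
    stat = 0.0
    for a in range(2):
        for b in range(2):
            expected = row_tot[a] * col_tot[b] / n
            stat += (counts[a][b] - expected) ** 2 / expected
    return stat, math.erfc(math.sqrt(stat / 2))


if __name__ == "__main__":
    rng = random.Random(0)
    # Observed variables 0 and 1 are children of latent 0; 2 and 3 of latent 1.
    theta = [(0, 0.2, 0.8), (0, 0.2, 0.8), (1, 0.2, 0.8), (1, 0.2, 0.8)]
    dependent = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
    independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
    for name, p_latent in [("dependent", dependent), ("independent", independent)]:
        data = simulate_star_forest(20000, p_latent, theta, rng)
        stat, p = chi2_independence_2x2(data, 0, 2)
        print(f"{name} latents: chi2 = {stat:.2f}, p = {p:.3g}")
```

In this toy setting, children of different latent variables are marginally dependent only when the latent variables themselves are dependent, so rejecting independence between columns 0 and 2 is evidence for the dependent (identifiable) regime. A real analysis would need to test all relevant pairs and account for multiple comparisons.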