Paper title

Multivariate generalized linear mixed models for underdispersed count data

Authors

da Silva, Guilherme Parreira; Laureano, Henrique Aparecido; Petterle, Ricardo Rasmussen; Júnior, Paulo Justiniano Ribeiro; Bonat, Wagner Hugo

Abstract

Researchers are often interested in understanding the relationship between a set of covariates and a set of response variables. Regression analysis, with either linear or generalized linear models, is widely used for this purpose. However, such models allow only one response variable to be modeled at a time, and they provide no direct measure of correlation between the response variables. In this article, we employed the Multivariate Generalized Linear Mixed Models framework, which allows a set of response variables to be specified jointly and captures the correlation between them through a random-effects structure that follows a multivariate normal distribution. All model parameters were estimated by maximum likelihood, using the Laplace approximation to integrate out the random effects. Derivatives were provided by automatic differentiation. The outer maximization was carried out with general-purpose algorithms such as \texttt{PORT} and \texttt{BFGS}. We restricted the study to count response variables with the following distributions: Poisson, negative binomial (NB), and COM-Poisson. The models were implemented in \texttt{R} with the \texttt{TMB} package. Besides the full specification, models with simpler covariance structures were considered (fixed and common variance, fixed dispersion, $ρ$ set to 0). These models were applied to a dataset from the National Health and Nutrition Examination Survey, in which three underdispersed response variables were measured on 1281 subjects. The fully specified COM-Poisson model outperformed the other two competitors according to three goodness-of-fit indexes. The proposed model can therefore handle multivariate count responses and measure the correlation between them while taking the effects of the covariates into account.
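
To illustrate why the COM-Poisson distribution suits underdispersed counts, the following is a minimal Python sketch, not the authors' \texttt{R}/\texttt{TMB} implementation. It assumes the standard rate/dispersion parameterization, P(Y = y) proportional to lam^y / (y!)^nu, which may differ from the parameterization used in the paper. The function names and the truncation point `max_count` are hypothetical choices made for this example.

```python
import math


def com_poisson_pmf(lam, nu, max_count=200):
    """Return P(Y = 0), ..., P(Y = max_count) for a COM-Poisson(lam, nu).

    Assumed parameterization: P(Y = y) is proportional to lam**y / (y!)**nu.
    The infinite normalizing sum is truncated at max_count (an approximation).
    """
    # Unnormalized log-weights, computed in log space for numerical stability.
    log_w = [j * math.log(lam) - nu * math.lgamma(j + 1)
             for j in range(max_count + 1)]
    shift = max(log_w)
    weights = [math.exp(w - shift) for w in log_w]
    total = sum(weights)
    return [w / total for w in weights]


def mean_and_variance(pmf):
    """Mean and variance of a distribution on 0, 1, 2, ... given as a list."""
    mean = sum(j * p for j, p in enumerate(pmf))
    var = sum((j - mean) ** 2 * p for j, p in enumerate(pmf))
    return mean, var


# nu = 1 recovers the Poisson (variance equals mean);
# nu > 1 gives underdispersion (variance below mean);
# nu < 1 gives overdispersion (variance above mean).
for nu in (0.5, 1.0, 2.0):
    m, v = mean_and_variance(com_poisson_pmf(4.0, nu))
    print(f"nu={nu}: mean={m:.3f}, variance={v:.3f}")
```

Under this parameterization the Poisson is the special case nu = 1, so a single extra dispersion parameter lets the model move between over- and underdispersion, which neither the Poisson nor the negative binomial can do on the underdispersed side.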
