Paper Title
A Bernoulli Mixture Model to Understand and Predict Children Longitudinal Wheezing Patterns
Paper Authors
Paper Abstract
In this research, we estimate that around $27.99(\pm2.15)\%$ of the population in the United Kingdom has experienced wheezing before turning 1. Furthermore, based on a cohort of $N=1184$, the Bernoulli Mixture Model classification is found to work best with $K=4$ clusters, which best balances the separability of the clusters against their interpretability. The probability that parents in the $j$th cluster report that their child wheezed during the $i$th age is assumed to follow $P_{ij} \sim \text{Beta}(1/2, 1/2)$; the probabilities of assignment to each cluster are $R \sim \text{Dirichlet}_K(\alpha)$; the assignment of the $n$th patient to a cluster is $Z_n\ |\ R \sim \text{Categorical}(R)$; and the indicator that the $n$th patient wheezed during the $i$th age is $X_{in}\ |\ P_{ij}, Z_n \sim \text{Bernoulli}(P_{i,Z_n})$, where $i\in\{1,\dots,6\}$, $j\in\{1,\dots,K\}$, and $n\in\{1,\dots, N\}$. The classification is then performed with the expectation-maximization (EM) algorithm. We found that this clustering method efficiently groups the patients into late-childhood wheezing, persistent wheezing, early-childhood wheezing, and none or sporadic wheezing. Furthermore, we found that this method does not depend on the particular dataset and can be applied to datasets with missing entries.
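The clustering step described in the abstract can be illustrated with a minimal NumPy sketch of EM for a Bernoulli mixture. This is not the authors' implementation, and it rests on several assumptions: the function name `fit_bernoulli_mixture` and its parameters are hypothetical; the sketch performs plain maximum-likelihood EM and ignores the Beta and Dirichlet priors; missing entries are assumed to be encoded as `NaN` and are simply left out of the likelihood. The array `P` is stored as clusters by ages, the transpose of the abstract's $P_{ij}$ indexing.

```python
import numpy as np


def fit_bernoulli_mixture(X, K=4, n_iter=200, seed=0, eps=1e-6):
    """Maximum-likelihood EM for a Bernoulli mixture (illustrative sketch).

    X : (N, I) array of 0/1 answers, with np.nan marking missing entries.
    Returns (R, P, resp):
      R    : (K,)   mixing weights
      P    : (K, I) per-cluster probability of wheezing at each age
      resp : (N, K) responsibilities from the last E-step
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    N, I = X.shape

    observed = ~np.isnan(X)
    ones = np.where(observed, X, 0.0)          # observed "wheezed"
    zeros = np.where(observed, 1.0 - X, 0.0)   # observed "did not wheeze"

    R = np.full(K, 1.0 / K)
    P = rng.uniform(0.25, 0.75, size=(K, I))   # random start breaks symmetry

    for _ in range(n_iter):
        # E-step: posterior probability of each cluster for each patient.
        # Missing entries contribute zero to the log-likelihood.
        log_post = np.log(R) + ones @ np.log(P).T + zeros @ np.log1p(-P).T
        log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: responsibility-weighted frequencies over observed entries.
        R = np.clip(resp.mean(axis=0), eps, None)
        R /= R.sum()
        num = resp.T @ ones
        den = resp.T @ observed.astype(float)
        P = np.clip(num / np.maximum(den, eps), eps, 1.0 - eps)

    return R, P, resp
```

A hard classification can then be read off as `resp.argmax(axis=1)`. Because EM only finds a local optimum, in practice one would likely rerun the fit from several seeds and keep the solution with the highest likelihood.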