Paper title
Extrapolating the profile of a finite population
Authors
Abstract
We study a prototypical problem in empirical Bayes. Namely, consider a population of $k$ individuals, each belonging to one of $k$ types (some types may be empty). Without any structural restrictions, it is impossible to learn the composition of the full population from only a small (random) subsample of size $m = o(k)$. Nevertheless, we show that in the sublinear regime $m = \omega(k/\log k)$, it is possible to consistently estimate, in total variation, the \emph{profile} of the population, defined as the empirical distribution of the sizes of the types, which determines many symmetric properties of the population. We also prove that in the linear regime $m = ck$, for any constant $c$, the optimal rate is $\Theta(1/\log k)$. Our estimator is based on Wolfowitz's minimum distance method, which entails solving a linear program (LP) of size $k$. We show that there is a single infinite-dimensional LP whose value simultaneously characterizes the risk of the minimum distance estimator and certifies its minimax optimality. The sharp convergence rate is obtained by evaluating this LP using complex-analytic techniques.
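To make the objects in the abstract concrete, here is a minimal sketch of the profile, the sampling model, and a minimum distance estimate. It assumes the subsample is drawn uniformly without replacement, so each type's sample count is hypergeometric, and it measures distance in total variation. It replaces the LP of size $k$ with a brute-force search over every integer profile (every partition of $k$), which is only feasible for tiny $k$. All function names are hypothetical, and this is an illustration of the idea, not the authors' implementation.

```python
import random
from collections import Counter
from fractions import Fraction
from math import comb


def partitions(n, max_part=None):
    """Yield all partitions of n as non-increasing tuples of positive ints."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


def profile(type_sizes):
    """Profile: empirical distribution of the sizes of the k types (empty types included)."""
    k = len(type_sizes)
    return {s: Fraction(c, k) for s, c in Counter(type_sizes).items()}


def sample_profile(population, k, m, rng):
    """Draw m individuals without replacement; return the profile of the k sample counts."""
    drawn = Counter(rng.sample(population, m))
    return profile([drawn.get(t, 0) for t in range(k)])


def expected_sample_profile(pop_profile, k, m):
    """E[fraction of types seen a times] = sum_j pi_j * Hypergeom(a; k, j, m).

    math.comb returns 0 when the lower index exceeds the upper one,
    so impossible counts get probability zero automatically.
    """
    total = comb(k, m)
    return {
        a: sum(p * Fraction(comb(j, a) * comb(k - j, m - a), total)
               for j, p in pop_profile.items())
        for a in range(m + 1)
    }


def tv(p, q):
    """Total variation distance between two finitely supported distributions."""
    keys = set(p) | set(q)
    return sum(abs(p.get(x, 0) - q.get(x, 0)) for x in keys) / 2


def min_distance_estimate(observed, k, m):
    """Minimum distance estimate by brute force over all integer profiles.

    Assumption: exhaustive search stands in for the LP; cost grows with the
    number of partitions of k, so use tiny k only.
    """
    best = None
    for parts in partitions(k):
        sizes = list(parts) + [0] * (k - len(parts))  # k types, sizes summing to k
        cand = profile(sizes)
        d = tv(expected_sample_profile(cand, k, m), observed)
        if best is None or d < best[0]:
            best = (d, cand)
    return best[1]


if __name__ == "__main__":
    sizes = [3, 1, 1, 1, 0, 0]  # k = 6 individuals spread over k = 6 types
    population = [t for t, s in enumerate(sizes) for _ in range(s)]
    k, m = len(sizes), 4
    obs = sample_profile(population, k, m, random.Random(0))
    print("true profile:", profile(sizes))
    print("estimate:    ", min_distance_estimate(obs, k, m))
```

The search space is restricted to profiles with mean 1, because $k$ type sizes summing to $k$ force $\sum_j j\,\pi_j = 1$. When $m = k$ the sample is the whole population and the estimate recovers the true profile exactly. For $m < k$ the hypergeometric map can send different profiles to the same expected sample profile, which is one way to see why only the profile, not the full composition, is learnable.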