Binary Scoring Rules that Incentivize Precision

Paper Title

Binary Scoring Rules that Incentivize Precision

Authors

Neyman, Eric; Noarov, Georgy; Weinberg, S. Matthew

Abstract

All proper scoring rules incentivize an expert to predict \emph{accurately} (report their true estimate), but not all proper scoring rules equally incentivize \emph{precision}. Rather than treating the expert's belief as exogenously given, we consider a model where a rational expert can endogenously refine their belief by repeatedly paying a fixed cost, and is incentivized to do so by a proper scoring rule. Specifically, our expert aims to predict the probability that a biased coin flipped tomorrow will land heads, and can flip the coin any number of times today at a cost of $c$ per flip. Our first main result defines an \emph{incentivization index} for proper scoring rules, and proves that this index measures the expected error of the expert's estimate (where the number of flips today is chosen adaptively to maximize the predictor's expected payoff). Our second main result finds the unique scoring rule which optimizes the incentivization index over all proper scoring rules. We also consider extensions to minimizing the $\ell^{th}$ moment of error, and again provide an incentivization index and optimal proper scoring rule. In some cases, the resulting scoring rule is differentiable, but not infinitely differentiable. In these cases, we further prove that the optimum can be uniformly approximated by polynomial scoring rules. Finally, we compare common scoring rules via our measure, and include simulations confirming the relevance of our measure even in domains outside where it provably applies.
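The trade-off between flip cost and forecast reward described in the abstract can be illustrated with a small sketch. It is not the paper's model or its incentivization index. It assumes a uniform prior over the coin's bias, the quadratic (Brier-style) scoring rule, and an expert who commits to a fixed number of flips in advance, whereas the paper lets the expert choose adaptively. The function names and the `max_flips` search bound are hypothetical.

```python
def quadratic_score(report: float, outcome: int) -> float:
    """Quadratic (Brier-style) proper scoring rule for a binary outcome."""
    return 1.0 - (outcome - report) ** 2


def expected_score(n_flips: int) -> float:
    """Expected score of an expert who flips n_flips times and then
    truthfully reports the posterior mean.

    Assumption: uniform prior on the bias, so each head count
    0..n_flips is equally likely and the posterior mean after
    `heads` heads is (heads + 1) / (n_flips + 2).
    """
    total = 0.0
    for heads in range(n_flips + 1):
        posterior_mean = (heads + 1) / (n_flips + 2)
        # Tomorrow's flip lands heads with probability posterior_mean.
        score = (posterior_mean * quadratic_score(posterior_mean, 1)
                 + (1 - posterior_mean) * quadratic_score(posterior_mean, 0))
        total += score / (n_flips + 1)
    return total


def best_fixed_flip_count(cost_per_flip: float, max_flips: int = 200) -> int:
    """Fixed (non-adaptive) flip count maximizing expected score minus cost.

    max_flips is an arbitrary search bound chosen for this illustration.
    """
    return max(range(max_flips + 1),
               key=lambda n: expected_score(n) - cost_per_flip * n)


if __name__ == "__main__":
    for cost in (0.01, 0.001, 0.0001):
        print(cost, best_fixed_flip_count(cost))
```

Under these assumptions the expected score rises from 3/4 with no flips toward 5/6 as the number of flips grows, so a lower cost per flip leads the expert to flip more and report a more precise estimate.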
