论文标题

带有XOR查询的二元分类:基本限制和有效算法

Binary Classification with XOR Queries: Fundamental Limits and An Efficient Algorithm

论文作者

Kim, Daesung, Chung, Hye Won

论文摘要

我们考虑一个基于查询的数据采集问题,用于未知标签的二进制分类,该标签在通信,众包,推荐系统和主动学习中具有不同的应用。为了确保尽可能少数量的查询数量的未知标签的可靠恢复,我们考虑了一种有效的查询类型,该查询类型询问所选对象子集的“组属性”。特别是,我们考虑将$ m $二进制标签与XOR查询分类的问题,这些问题询问在所选尺寸$ d $的选定子集中具有给定属性的对象数量是否甚至是奇数。我们称查询学位的子集大小$ d $在查询上可能会有所不同。我们考虑了一个通用噪声模型,在该模型中,查询的答案的准确性取决于工人(数据提供商)和查询程度$ D $。对于此通用模型,我们表征了最佳查询数量的信息理论限制,以可靠地恢复$ m $标签,以$ d $ $ d $查询和噪声参数的给定组合。此外,我们提出了一种有效的推理算法,即使噪声参数未知,也可以达到此限制。

We consider a query-based data acquisition problem for binary classification of unknown labels, which has diverse applications in communications, crowdsourcing, recommender systems and active learning. To ensure reliable recovery of unknown labels with as few number of queries as possible, we consider an effective query type that asks "group attribute" of a chosen subset of objects. In particular, we consider the problem of classifying $m$ binary labels with XOR queries that ask whether the number of objects having a given attribute in the chosen subset of size $d$ is even or odd. The subset size $d$, which we call query degree, can be varying over queries. We consider a general noise model where the accuracy of answers on queries changes depending both on the worker (the data provider) and query degree $d$. For this general model, we characterize the information-theoretic limit on the optimal number of queries to reliably recover $m$ labels in terms of a given combination of degree-$d$ queries and noise parameters. Further, we propose an efficient inference algorithm that achieves this limit even when the noise parameters are unknown.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源