Paper Title
Semi-Supervised Learning: the Case When Unlabeled Data is Equally Useful
Authors
Abstract
Semi-supervised learning algorithms attempt to take advantage of relatively inexpensive unlabeled data to improve learning performance. In this work, we consider statistical models in which the data distributions can be characterized by continuous parameters. We show that, under certain conditions on the distribution, unlabeled data is as useful as labeled data in terms of learning rate. Specifically, let $n$ and $m$ be the numbers of labeled and unlabeled data points, respectively. It is shown that the learning rate of semi-supervised learning scales as $O(1/n)$ if $m \sim n$, and as $O(1/n^{1+\gamma})$ if $m \sim n^{1+\gamma}$ for some $\gamma > 0$, whereas the learning rate of supervised learning scales as $O(1/n)$.