Paper title
Dropout as a Regularizer of Interaction Effects
Paper authors
Paper abstract
We examine Dropout through the perspective of interactions. This view provides a symmetry to explain Dropout: given $N$ variables, there are ${N \choose k}$ possible sets of $k$ variables to form an interaction (i.e. $\mathcal{O}(N^k)$); conversely, the probability an interaction of $k$ variables survives Dropout at rate $p$ is $(1-p)^k$ (decaying with $k$). These rates effectively cancel, and so Dropout regularizes against higher-order interactions. We prove this perspective analytically and empirically. This perspective of Dropout as a regularizer against interaction effects has several practical implications: (1) higher Dropout rates should be used when we need stronger regularization against spurious high-order interactions, (2) caution should be exercised when interpreting Dropout-based explanations and uncertainty measures, and (3) networks trained with Input Dropout are biased estimators. We also compare Dropout to other regularizers and find that it is difficult to obtain the same selective pressure against high-order interactions.
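The two rates named in the abstract can be tabulated side by side. The following is a minimal sketch, not the authors' code: the function names, the choice of $N = 20$ and $p = 0.5$, and the Monte Carlo check are illustrative assumptions. It computes the number of possible $k$-variable sets, ${N \choose k}$, and the probability $(1-p)^k$ that all $k$ variables of one such set are kept when each variable is dropped independently with probability $p$.

```python
from math import comb
import random


def num_interactions(n: int, k: int) -> int:
    """Number of distinct k-variable sets among n variables: C(n, k)."""
    return comb(n, k)


def survival_probability(p: float, k: int) -> float:
    """Probability that all k variables survive Dropout at rate p,
    assuming each variable is dropped independently."""
    return (1.0 - p) ** k


def simulate_survival(p: float, k: int, trials: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same survival probability (illustrative check)."""
    rng = random.Random(seed)
    survived = 0
    for _ in range(trials):
        # A variable is kept when its uniform draw is >= p (probability 1 - p).
        if all(rng.random() >= p for _ in range(k)):
            survived += 1
    return survived / trials


if __name__ == "__main__":
    n, p = 20, 0.5  # hypothetical values chosen only for illustration
    print("k  C(n,k)  (1-p)^k  simulated")
    for k in range(1, 6):
        print(
            k,
            num_interactions(n, k),
            round(survival_probability(p, k), 5),
            round(simulate_survival(p, k), 5),
        )
```

The count of candidate interactions grows with $k$ while the survival probability shrinks geometrically in $k$, which is the opposing pair of rates the abstract describes.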