Title
Characterizing Datapoints via Second-Split Forgetting
Authors
Abstract
Researchers investigating example hardness have increasingly focused on the dynamics by which neural networks learn and forget examples throughout training. Popular metrics derived from these dynamics include (i) the epoch at which examples are first correctly classified; (ii) the number of times their predictions flip during training; and (iii) whether their predictions flip if they are held out. However, these metrics do not distinguish among examples that are hard for distinct reasons, such as membership in a rare subpopulation, being mislabeled, or belonging to a complex subpopulation. In this paper, we propose second-split forgetting time (SSFT), a complementary metric that tracks the epoch (if any) after which an original training example is forgotten as the network is fine-tuned on a randomly held-out partition of the data. Across multiple benchmark datasets and modalities, we demonstrate that mislabeled examples are forgotten quickly, and seemingly rare examples are forgotten comparatively slowly. By contrast, metrics that consider only first-split learning dynamics struggle to differentiate the two. At large learning rates, SSFT tends to be robust across architectures, optimizers, and random seeds. From a practical standpoint, SSFT can (i) help to identify mislabeled samples, the removal of which improves generalization; and (ii) provide insights about failure modes. Through theoretical analysis of overparameterized linear models, we provide insights into how the observed phenomena may arise. Code for reproducing our experiments can be found here: https://github.com/pratyushmaini/ssft
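To make the metric concrete, the following is a minimal sketch of how SSFT could be computed from per-epoch correctness logs recorded while fine-tuning on the second split. The function name, the input format (a boolean epochs-by-examples matrix), and the convention that "forgotten" means the prediction stays wrong for the remainder of fine-tuning are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def second_split_forgetting_time(correct):
    """Estimate SSFT per example from second-split fine-tuning logs.

    correct: boolean array of shape (n_epochs, n_examples), where
    correct[t, i] records whether first-split example i is still
    classified correctly after fine-tuning epoch t on the second split.

    Returns an array with, for each example, the first epoch after which
    its prediction remains wrong for the rest of fine-tuning, or np.inf
    if the example is never (permanently) forgotten.
    """
    n_epochs, n_examples = correct.shape
    ssft = np.full(n_examples, np.inf)
    for i in range(n_examples):
        correct_epochs = np.flatnonzero(correct[:, i])
        if correct_epochs.size == 0:
            ssft[i] = 0.0  # wrong from the start: forgotten immediately
        elif correct_epochs[-1] < n_epochs - 1:
            # last correct epoch is followed only by wrong predictions
            ssft[i] = correct_epochs[-1] + 1.0
        # otherwise still correct at the final epoch: never forgotten
    return ssft
```

Under this convention, mislabeled examples would show small SSFT values (forgotten in early fine-tuning epochs), while rare-but-correct examples would be forgotten late or not at all.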