Paper Title
Risk-Monotonicity in Statistical Learning
Paper Authors
Paper Abstract
Acquisition of data is a difficult task in many applications of machine learning, and it is only natural that one hopes and expects the population risk to decrease (better performance) monotonically with increasing data points. It turns out, somewhat surprisingly, that this is not the case even for the most standard algorithms that minimize the empirical risk. Non-monotonic behavior of the risk and instability in training have manifested in the popular deep learning paradigm under the description of double descent. These problems highlight the current lack of understanding of learning algorithms and generalization. It is, therefore, crucial to pursue this concern and provide a characterization of such behavior. In this paper, we derive the first consistent and risk-monotonic (in high probability) algorithms for a general statistical learning setting under weak assumptions, consequently answering some of the questions posed by Viering et al. (2019) on how to avoid non-monotonic behavior of risk curves. We further show that risk monotonicity need not come at the price of worse excess risk rates. To achieve this, we derive new empirical Bernstein-like concentration inequalities of independent interest that hold for certain non-i.i.d. processes, such as martingale difference sequences.
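The non-monotonic risk curves the abstract refers to are easy to reproduce. The following minimal sketch (our own illustration, not taken from the paper; the setup, constants, and helper names such as population_risk are assumptions) fits minimum-norm least squares on Gaussian data and shows the population risk peaking near the interpolation threshold n ≈ d, a simple instance of the double-descent behavior mentioned above:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 30                                    # feature dimension (illustrative)
w_star = rng.normal(size=d) / np.sqrt(d)  # ground-truth weights (illustrative)
sigma = 0.5                               # label-noise standard deviation

def population_risk(w):
    # For x ~ N(0, I_d) and y = x @ w_star + noise, the population risk
    # E[(y - x @ w)^2] equals ||w - w_star||^2 + sigma^2 exactly.
    return float(np.sum((w - w_star) ** 2) + sigma ** 2)

def avg_risk(n, reps=20):
    # Average the risk of the least-squares fit over `reps` independent
    # training sets of size n.
    risks = []
    for _ in range(reps):
        X = rng.normal(size=(n, d))
        y = X @ w_star + sigma * rng.normal(size=n)
        # np.linalg.lstsq returns the minimum-norm solution when n < d.
        w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
        risks.append(population_risk(w_hat))
    return np.mean(risks)

for n in [5, 10, 20, 28, 30, 32, 40, 60, 120]:
    print(f"n = {n:4d}   population risk = {avg_risk(n):.3f}")
# The printed risk peaks near n = d = 30 rather than decreasing
# monotonically in n -- exactly the behavior the paper aims to rule out.
```

Running this prints a risk that first decreases, spikes around n = d, and only then decays again; a risk-monotonic algorithm in the paper's sense would, with high probability, never exhibit such a spike as data points are added.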