Paper Title

Classifier uncertainty: evidence, potential impact, and probabilistic treatment

Paper Authors

Niklas Tötsch, Daniel Hoffmann

Paper Abstract

Classifiers are often tested on relatively small data sets, which should lead to uncertain performance metrics. Nevertheless, these metrics are usually taken at face value. We present an approach to quantify the uncertainty of classification performance metrics, based on a probability model of the confusion matrix. Application of our approach to classifiers from the scientific literature and a classification competition shows that uncertainties can be surprisingly large and limit performance evaluation. In fact, some published classifiers are likely to be misleading. The application of our approach is simple and requires only the confusion matrix. It is agnostic of the underlying classifier. Our method can also be used for the estimation of sample sizes that achieve a desired precision of a performance metric.
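The abstract describes a probability model of the confusion matrix that requires only the matrix itself. The sketch below is not the authors' exact model but illustrates one common Bayesian treatment under that assumption: a flat Dirichlet prior over the four confusion-matrix cells, with posterior samples used to derive credible intervals for metrics such as accuracy and MCC. The counts in the example are invented for illustration.

```python
import numpy as np

# Minimal sketch (not the authors' exact model): treat the four confusion-matrix
# cells as multinomial counts with a flat Dirichlet prior, sample the posterior
# over cell probabilities, and summarize derived metrics with credible intervals.

def metric_posterior(tp, fp, fn, tn, n_samples=100_000, prior=1.0, seed=0):
    rng = np.random.default_rng(seed)
    # Posterior over (p_tp, p_fp, p_fn, p_tn): Dirichlet(counts + prior)
    alpha = np.array([tp, fp, fn, tn], dtype=float) + prior
    p_tp, p_fp, p_fn, p_tn = rng.dirichlet(alpha, size=n_samples).T

    accuracy = p_tp + p_tn
    denom = np.sqrt((p_tp + p_fp) * (p_tp + p_fn) * (p_tn + p_fp) * (p_tn + p_fn))
    mcc = (p_tp * p_tn - p_fp * p_fn) / denom  # denom > 0 for Dirichlet samples
    return accuracy, mcc

# Example with invented counts for a small test set of 50 samples
acc, mcc = metric_posterior(tp=20, fp=5, fn=7, tn=18)
for name, samples in [("accuracy", acc), ("MCC", mcc)]:
    lo, hi = np.percentile(samples, [2.5, 97.5])
    print(f"{name}: median={np.median(samples):.3f}, 95% CI=[{lo:.3f}, {hi:.3f}]")
```

With only 50 test samples, the resulting intervals are wide, which is the kind of uncertainty the paper argues is usually ignored when metrics are reported at face value.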
