Title
The MCC-F1 curve: a performance evaluation technique for binary classification
Authors
Abstract
Many fields use the receiver operating characteristic (ROC) curve and the precision-recall (PR) curve as standard evaluations of binary classification methods. Analysis of ROC and PR, however, often gives misleading and inflated performance evaluations, especially with an imbalanced ground truth. Here, we demonstrate the problems with ROC and PR analysis through simulations, and propose the MCC-F1 curve to address these drawbacks. The MCC-F1 curve combines two informative single-threshold metrics, the Matthews correlation coefficient (MCC) and the F1 score. The MCC-F1 curve more clearly differentiates good and bad classifiers, even with imbalanced ground truths. We also introduce the MCC-F1 metric, which provides a single value that integrates many aspects of classifier performance across the whole range of classification thresholds. Finally, we provide an R package that plots MCC-F1 curves and calculates related metrics.
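To make the idea of combining two single-threshold metrics into a curve concrete, here is a minimal Python sketch. It is not the authors' R package. The sketch uses the standard definitions MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)) and F1 = 2TP / (2TP + FP + FN). It sweeps every distinct score as a classification threshold and records one (F1, MCC) point per threshold. Two choices are assumptions made for illustration. First, MCC is rescaled to (MCC + 1) / 2 so that both axes share the range [0, 1]. Second, MCC is set to 0 when its denominator is zero. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def confusion_counts(labels: Sequence[int], scores: Sequence[float],
                     threshold: float) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN), predicting positive when score >= threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif not predicted_positive and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient in [-1, 1]."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumption: MCC is undefined when any marginal is zero; return 0.0 then.
    return (tp * tn - fp * fn) / denom if denom > 0 else 0.0


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 score, the harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0


def mcc_f1_curve(labels: Sequence[int],
                 scores: Sequence[float]) -> List[Tuple[float, float, float]]:
    """Return (threshold, F1, normalized MCC) for each distinct score.

    Assumption: MCC is rescaled to (MCC + 1) / 2 so both values lie in [0, 1].
    """
    points = []
    for t in sorted(set(scores), reverse=True):  # strictest threshold first
        tp, fp, tn, fn = confusion_counts(labels, scores, t)
        points.append((t, f1(tp, fp, fn), (mcc(tp, fp, tn, fn) + 1) / 2))
    return points


if __name__ == "__main__":
    labels = [1, 1, 0, 1, 0, 0, 0, 0]  # imbalanced toy ground truth
    scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
    for threshold, f1_value, mcc_norm in mcc_f1_curve(labels, scores):
        print(f"t={threshold:.1f}  F1={f1_value:.3f}  MCC_norm={mcc_norm:.3f}")
```

Plotting the normalized MCC against F1 for these points traces the curve. A perfect classifier reaches the corner (1, 1) at some threshold. In this toy example, predicting everything as positive still yields an F1 of about 0.545, while MCC falls to its uninformative value. This illustrates why pairing the two metrics can expose weak thresholds on imbalanced data.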