Paper Title
CrossCheck: Rapid, Reproducible, and Interpretable Model Evaluation
Paper Authors
Paper Abstract
Evaluation beyond aggregate performance metrics, e.g., F1-score, is crucial both to establish an appropriate level of trust in machine learning models and to identify future model improvements. In this paper we demonstrate CrossCheck, an interactive visualization tool for rapid cross-model comparison and reproducible error analysis. We describe the tool and discuss design and implementation details. We then present three use cases (named entity recognition, reading comprehension, and clickbait detection) that show the benefits of using the tool for model evaluation. CrossCheck allows data scientists to make informed decisions when choosing between multiple models, identify when and on which examples the models are correct, investigate whether the models make the same mistakes as humans, evaluate the models' generalizability, and highlight the models' limitations, strengths, and weaknesses. Furthermore, CrossCheck is implemented as a Jupyter widget, which allows rapid and convenient integration into data scientists' model development workflows.
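To illustrate the notebook workflow the abstract describes, the sketch below prepares per-example predictions from two models in a pandas DataFrame, the kind of cross-model comparison data such a widget consumes. This is a minimal sketch: the `crosscheck` import and `CrossCheck` constructor in the final, commented-out lines are assumptions for illustration, not the tool's documented API.

```python
import pandas as pd

# Per-example predictions from two models on the same evaluation set
# (toy NER-style labels; in practice these come from model output files).
results = pd.DataFrame({
    "example_id": [0, 1, 2, 3],
    "gold":    ["PER", "ORG", "LOC", "PER"],
    "model_a": ["PER", "ORG", "PER", "PER"],
    "model_b": ["PER", "LOC", "LOC", "ORG"],
})

# Add per-model correctness flags -- the kind of categorical attribute
# a cross-model comparison view can group and filter on.
for model in ("model_a", "model_b"):
    results[f"{model}_correct"] = results[model] == results["gold"]

# Hypothetical widget call (names assumed, not the tool's documented API):
# from crosscheck import CrossCheck
# CrossCheck(results)  # renders the interactive comparison inline
```

Because the widget lives in the notebook alongside the DataFrame, error analysis stays in the same environment where the models were trained and evaluated, which is what makes the workflow reproducible and easy to integrate.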