Variability Matters: Evaluating inter-rater variability in histopathology for robust cell detection

Paper title

Variability Matters: Evaluating inter-rater variability in histopathology for robust cell detection

Authors

Kang, Cholmin; Lee, Chunggi; Song, Heon; Ma, Minuk; Pereira, Sérgio

Abstract

Large annotated datasets have been a key component in the success of deep learning. However, annotating medical images is challenging as it requires expertise and a large budget. In particular, annotating different types of cells in histopathology suffers from high inter- and intra-rater variability due to the ambiguity of the task. Under this setting, the relation between annotators' variability and model performance has received little attention. We present a large-scale study on the variability of cell annotations among 120 board-certified pathologists and how it affects the performance of a deep learning model. We propose a method to measure such variability, and by excluding those annotators with low variability, we verify the trade-off between the amount of data and its quality. We found that naively increasing the data size at the expense of inter-rater variability does not necessarily lead to better-performing models in cell detection. Instead, decreasing the inter-rater variability at the expense of decreasing dataset size increased the model performance. Furthermore, models trained from data annotated with lower inter-labeler variability outperform those from higher inter-labeler variability. These findings suggest that the evaluation of the annotators may help tackle the fundamental budget issues in the histopathology domain.
