Paper title
Artificial Intelligence applied to chest X-Ray images for the automatic detection of COVID-19. A thoughtful evaluation approach
Paper authors
Paper abstract
Current standard protocols used in the clinic for diagnosing COVID-19 include molecular or antigen tests, generally complemented by a plain chest X-Ray. The combined analysis aims to reduce the significant number of false negatives of these tests and to provide complementary evidence about the presence and severity of the disease. However, the procedure is not free of errors, and because of its complexity the interpretation of the chest X-Ray is restricted to radiologists. With the long-term goal of providing new evidence for the diagnosis, this paper presents an evaluation of different methods based on a deep neural network. These are the first steps toward an automatic COVID-19 diagnosis tool based on chest X-Ray images that would also differentiate between control, pneumonia, and COVID-19 groups. The paper describes the process followed to train a Convolutional Neural Network with a dataset of more than 79,500 X-Ray images compiled from different sources, including more than 8,500 COVID-19 examples. To evaluate and compare the models developed, three different experiments were carried out following three preprocessing schemes. The aim is to evaluate how preprocessing the data affects the results and improves their explainability. Likewise, a critical analysis is carried out of different variability issues that might compromise the system and of their effects on performance. With the employed methodology, a 91.5% classification accuracy is obtained, with an 87.4% average recall for the worst but most explainable experiment, which requires prior automatic segmentation of the lung region.
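The abstract reports both a classification accuracy and an average recall for a three-class problem. As a minimal sketch of how these two figures can differ, the snippet below computes them from a confusion matrix. It assumes that "average recall" means the unweighted (macro) mean of per-class recall, which the abstract does not state explicitly. The confusion matrix, the class order, and the function names are hypothetical and are not taken from the paper.

```python
import numpy as np

# Hypothetical confusion matrix: rows are true classes, columns are predictions.
# Class order (assumed): control, pneumonia, COVID-19.
# The counts are invented for illustration and are NOT results from the paper.
confusion = np.array([
    [900, 60, 40],   # 1000 control images
    [40, 420, 40],   # 500 pneumonia images
    [10, 20, 70],    # 100 COVID-19 images
])


def accuracy(cm: np.ndarray) -> float:
    """Fraction of all images that were classified correctly."""
    return float(np.trace(cm) / cm.sum())


def per_class_recall(cm: np.ndarray) -> np.ndarray:
    """Recall of each class: correct predictions divided by true class size."""
    return np.diag(cm) / cm.sum(axis=1)


def average_recall(cm: np.ndarray) -> float:
    """Unweighted (macro) mean of the per-class recalls (an assumption here)."""
    return float(per_class_recall(cm).mean())


if __name__ == "__main__":
    print("accuracy:        ", round(accuracy(confusion), 4))
    print("per-class recall:", per_class_recall(confusion))
    print("average recall:  ", round(average_recall(confusion), 4))
```

With imbalanced classes, as in this made-up example, accuracy is dominated by the largest class while the macro-averaged recall weights every class equally. This is one plausible reason for reporting both figures.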