Time-distance vision transformers in lung cancer diagnosis from longitudinal computed tomography

Paper title

Time-distance vision transformers in lung cancer diagnosis from longitudinal computed tomography

Authors

Li, Thomas Z., Xu, Kaiwen, Gao, Riqiang, Tang, Yucheng, Lasko, Thomas A., Maldonado, Fabien, Sandler, Kim, Landman, Bennett A.

Abstract

Features learned from single radiologic images are unable to provide information about whether and how much a lesion may be changing over time. Time-dependent features computed from repeated images can capture those changes and help identify malignant lesions by their temporal behavior. However, longitudinal medical imaging presents the unique challenge of sparse, irregular time intervals in data acquisition. While self-attention has been shown to be a versatile and efficient learning mechanism for time series and natural images, its potential for interpreting temporal distance between sparse, irregularly sampled spatial features has not been explored. In this work, we propose two interpretations of a time-distance vision transformer (ViT) by using (1) vector embeddings of continuous time and (2) a temporal emphasis model to scale self-attention weights. The two algorithms are evaluated based on benign versus malignant lung cancer discrimination of synthetic pulmonary nodules and lung screening computed tomography studies from the National Lung Screening Trial (NLST). Experiments evaluating the time-distance ViTs on synthetic nodules show a fundamental improvement in classifying irregularly sampled longitudinal images when compared to standard ViTs. In cross-validation on screening chest CTs from the NLST, our methods (0.785 and 0.786 AUC respectively) significantly outperform a cross-sectional approach (0.734 AUC) and match the discriminative performance of the leading longitudinal medical imaging algorithm (0.779 AUC) on benign versus malignant classification. This work represents the first self-attention-based framework for classifying longitudinal medical images. Our code is available at https://github.com/tom1193/time-distance-transformer.
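
The two mechanisms named in the abstract, vector embeddings of continuous time and a temporal emphasis model that scales self-attention weights, can be illustrated with a minimal NumPy sketch. The abstract does not give the functional forms, so everything specific here is an assumption made for illustration: the sinusoidal embedding evaluated at real-valued times, the flipped-sigmoid emphasis function with hypothetical parameters `a` and `c`, the use of pairwise time distances, and the renormalization after scaling. It is not the authors' implementation, which is available at the repository linked above.

```python
import numpy as np

def continuous_time_embedding(times, dim, max_period=10000.0):
    """Sinusoidal embedding evaluated at real-valued times (e.g. days)
    rather than integer sequence positions. Assumed form; dim must be even."""
    t = np.asarray(times, dtype=np.float64)[:, None]        # (T, 1)
    freqs = max_period ** (-np.arange(0, dim, 2) / dim)     # (dim/2,)
    angles = t * freqs[None, :]                             # (T, dim/2)
    emb = np.zeros((t.shape[0], dim))
    emb[:, 0::2] = np.sin(angles)
    emb[:, 1::2] = np.cos(angles)
    return emb

def temporal_emphasis(dist, a=0.01, c=2.0):
    """Hypothetical emphasis function: a flipped sigmoid that is close to 1
    for small time distances and decays toward 0 for large ones.
    In a trained model a and c would be learned; here they are fixed."""
    return 1.0 / (1.0 + np.exp(a * dist - c))

def time_scaled_attention(q, k, v, times, a=0.01, c=2.0):
    """Scaled dot-product attention whose weights are multiplied by the
    temporal emphasis of the pairwise time distance between scans."""
    t = np.asarray(times, dtype=np.float64)
    dist = np.abs(t[:, None] - t[None, :])                  # (T, T)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True) # softmax
    weights = weights * temporal_emphasis(dist, a, c)       # down-weight distant scans
    weights = weights / weights.sum(axis=-1, keepdims=True) # renormalize (assumption)
    return weights @ v, weights

# Three scans acquired at irregular intervals (days since the first scan).
scan_days = [0.0, 30.0, 365.0]
features = np.zeros((3, 8))                                 # placeholder image features
tokens = features + continuous_time_embedding(scan_days, dim=8)
out, attn = time_scaled_attention(features, features, tokens, scan_days)
```

With identical queries and keys, as in the placeholder above, the softmax is uniform, so the resulting attention weights depend only on time distance: each scan attends most to itself and least to the scan furthest away in time.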
