A Deep Learning Pipeline for Patient Diagnosis Prediction Using Electronic Health Records

Paper Title

A Deep Learning Pipeline for Patient Diagnosis Prediction Using Electronic Health Records

Authors

Franz, Leopold; Shrestha, Yash Raj; Paudel, Bibek

Abstract

Augmenting disease diagnosis and healthcare decision-making with machine learning algorithms has gained considerable momentum in recent years. In particular, in the current epidemiological situation caused by the COVID-19 pandemic, swift and accurate prediction of disease diagnoses with machine learning algorithms could facilitate the identification and care of vulnerable population clusters, such as people with multimorbidity. Building a useful disease diagnosis prediction system requires advances in both data representation and machine learning architectures. First, with respect to data collection and representation, we face severe problems due to the multitude of formats and the lack of coherence prevalent in Electronic Health Records (EHRs), which hinder the extraction of the valuable information EHRs contain. Currently, no universal global data standard has been established. As a practical solution, we develop and publish a Python package that transforms a public health dataset into an easy-to-access universal format. This transformation to an international health data format makes it easier for researchers to combine EHR datasets with clinical datasets of diverse formats. Second, machine learning algorithms that predict multiple disease diagnosis categories simultaneously remain underdeveloped. We propose two novel model architectures in this regard: first, DeepObserver, which uses structured numerical data to predict the diagnosis categories, and second, ClinicalBERT_Multi, which incorporates the rich information available in clinical notes via natural language processing methods and also provides interpretable visualizations to medical practitioners. We show that both models can predict multiple diagnoses simultaneously with high accuracy.
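
To illustrate what predicting multiple diagnosis categories simultaneously can mean in practice, here is a minimal sketch of generic multi-label prediction: each category gets its own sigmoid probability, and every category above a threshold is reported. The abstract does not describe the output layers of DeepObserver or ClinicalBERT_Multi, so this is not their implementation; the function names, category names, scores, and the 0.5 threshold are all assumptions made for illustration.

```python
import math


def sigmoid(z: float) -> float:
    """Map a raw score (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_diagnoses(logits: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Return every category whose independent probability meets the threshold.

    Unlike softmax classification, the categories do not compete:
    zero, one, or several of them can be predicted for the same patient.
    """
    return [name for name, z in logits.items() if sigmoid(z) >= threshold]


# Hypothetical per-category scores for a single patient (not real data).
example_logits = {"circulatory": 2.1, "respiratory": 0.3, "endocrine": -1.7}
predicted = predict_diagnoses(example_logits)
print(predicted)
```

Treating each category as an independent yes/no decision is what lets a single model flag several co-occurring conditions, which is the multimorbidity setting the abstract highlights.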
