Paper Title


Learning to Ask Like a Physician

Paper Authors

Lehman, Eric, Lialin, Vladislav, Legaspi, Katelyn Y., Sy, Anne Janelle R., Pile, Patricia Therese S., Alberto, Nicole Rose I., Ragasa, Richard Raymund R., Puyat, Corinna Victoria M., Alberto, Isabelle Rose I., Alfonso, Pia Gabrielle I., Taliño, Marianne, Moukheiber, Dana, Wallace, Byron C., Rumshisky, Anna, Liang, Jenifer J., Raghavan, Preethi, Celi, Leo Anthony, Szolovits, Peter

Paper Abstract

Existing question answering (QA) datasets derived from electronic health records (EHR) are artificially generated and consequently fail to capture realistic physician information needs. We present Discharge Summary Clinical Questions (DiSCQ), a newly curated question dataset composed of 2,000+ questions paired with the snippets of text (triggers) that prompted each question. The questions are generated by medical experts from 100+ MIMIC-III discharge summaries. We analyze this dataset to characterize the types of information sought by medical experts. We also train baseline models for trigger detection and question generation (QG), paired with unsupervised answer retrieval over EHRs. Our baseline model is able to generate high quality questions in over 62% of cases when prompted with human selected triggers. We release this dataset (and all code to reproduce baseline model results) to facilitate further research into realistic clinical QA and QG: https://github.com/elehman16/discq.
