Paper Title

Trial2Vec: Zero-Shot Clinical Trial Document Similarity Search using Self-Supervision

Authors

Zifeng Wang, Jimeng Sun

Abstract

Clinical trials are essential for drug development but are extremely expensive and time-consuming to conduct. When designing a clinical trial, it is beneficial to study similar historical trials. However, lengthy trial documents and the lack of labeled data make trial similarity search difficult. We propose Trial2Vec, a zero-shot clinical trial retrieval method that learns through self-supervision, without requiring annotations of similar clinical trials. Specifically, the meta-structure of trial documents (e.g., title, eligibility criteria, target disease) and clinical knowledge (e.g., the UMLS knowledge base, https://www.nlm.nih.gov/research/umls/index.html) are leveraged to automatically generate contrastive samples. In addition, Trial2Vec encodes trial documents with their meta-structure taken into account, producing compact embeddings that aggregate multi-aspect information from the whole document. We show through visualization that our method yields medically interpretable embeddings, and that it achieves a 15% average improvement over the best baselines in precision/recall for trial retrieval, evaluated on 1,600 trial pairs that we labeled. We also demonstrate that the pre-trained embeddings benefit the downstream trial outcome prediction task across 240k trials. Software is available at https://github.com/RyanWangZf/Trial2Vec.
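The abstract describes retrieval over compact trial embeddings learned from automatically generated contrastive samples. As a rough illustration only, and not the Trial2Vec implementation, the sketch below assumes every trial has already been encoded as a fixed-length vector. It shows (a) a generic InfoNCE-style contrastive loss for one anchor with one positive and several negatives, and (b) top-k cosine-similarity search over an index of trial embeddings. The function names `cosine`, `info_nce_loss`, and `top_k_similar`, the temperature of 0.1, and the toy vectors and IDs are all hypothetical; the actual loss and encoder used by Trial2Vec may differ.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def info_nce_loss(anchor: Vector, positive: Vector,
                  negatives: List[Vector], temperature: float = 0.1) -> float:
    """Generic InfoNCE loss for one anchor: negative log-softmax score of the positive.

    Hypothetical illustration: `positive` and `negatives` stand in for
    automatically generated contrastive samples; this is not the paper's code.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denom - logits[0]


def top_k_similar(query: Vector, index: Dict[str, Vector],
                  k: int = 3) -> List[Tuple[str, float]]:
    """Return the k (trial_id, score) pairs most cosine-similar to `query`."""
    scored = [(tid, cosine(query, emb)) for tid, emb in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy 3-d embeddings with made-up IDs; real ones would come from a trained encoder.
    index = {
        "trial_a": [0.9, 0.1, 0.0],
        "trial_b": [0.8, 0.2, 0.1],
        "trial_c": [0.0, 0.1, 0.9],
    }
    query = [1.0, 0.0, 0.0]
    print(top_k_similar(query, index, k=2))
    print(info_nce_loss(query, index["trial_a"], [index["trial_c"]]))
```

With these toy vectors, the search ranks `trial_a` ahead of `trial_b`, and the contrastive loss is near zero because the positive is far closer to the anchor than the negative. Swapping the positive and the negative makes the loss large. Training an encoder to minimize such a loss is what pushes similar trials together in embedding space, so that plain cosine search works at retrieval time without labeled pairs.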
