Trial2Vec: Zero-Shot Clinical Trial Document Similarity Search using Self-Supervision

Paper Title

Trial2Vec: Zero-Shot Clinical Trial Document Similarity Search using Self-Supervision

Authors

Wang, Zifeng; Sun, Jimeng

Abstract

Clinical trials are essential for drug development but are extremely expensive and time-consuming to conduct. It is beneficial to study similar historical trials when designing a clinical trial. However, lengthy trial documents and a lack of labeled data make trial similarity search difficult. We propose a zero-shot clinical trial retrieval method, Trial2Vec, which learns through self-supervision without requiring annotations of similar clinical trials. Specifically, the meta-structure of trial documents (e.g., title, eligibility criteria, target disease) along with clinical knowledge (e.g., the UMLS knowledge base, https://www.nlm.nih.gov/research/umls/index.html) is leveraged to automatically generate contrastive samples. In addition, Trial2Vec encodes trial documents with their meta-structure in mind, producing compact embeddings that aggregate multi-aspect information from the whole document. We show by visualization that our method yields medically interpretable embeddings, and that it achieves a 15% average improvement over the best baselines on precision/recall for trial retrieval, evaluated on our 1600 labeled trial pairs. We also show that the pre-trained embeddings benefit the downstream trial outcome prediction task on 240k trials. Software is available at https://github.com/RyanWangZf/Trial2Vec.
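
As a rough illustration of the retrieval step the abstract describes, the sketch below ranks trials by cosine similarity between document embeddings. It is not the Trial2Vec software or its API: the function names (`cosine`, `top_k_similar`), the trial identifiers, and the 3-dimensional vectors are all hypothetical, and cosine similarity is assumed here as the ranking score because the abstract does not name one. In practice the embeddings would come from a trained encoder.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def top_k_similar(query_id, embeddings, k=2):
    """Rank every other trial by cosine similarity to the query trial."""
    query = embeddings[query_id]
    scored = [
        (trial_id, cosine(query, vec))
        for trial_id, vec in embeddings.items()
        if trial_id != query_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings; real ones would be produced by an encoder.
toy_embeddings = {
    "trial_A": [0.9, 0.1, 0.0],
    "trial_B": [0.8, 0.2, 0.1],
    "trial_C": [0.0, 0.1, 0.9],
    "trial_D": [0.1, 0.9, 0.1],
}

results = top_k_similar("trial_A", toy_embeddings, k=2)
for trial_id, score in results:
    print(f"{trial_id}: {score:.3f}")
```

Because each document is reduced to a single compact vector, a search like this needs only one similarity computation per candidate trial, regardless of how long the original documents are.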
