PANTHER: Pathway Augmented Nonnegative Tensor factorization for HighER-order feature learning

Paper Title

PANTHER: Pathway Augmented Nonnegative Tensor factorization for HighER-order feature learning

Authors

Luo, Yuan; Mao, Chengsheng

Abstract

Genetic pathways usually encode molecular mechanisms that can inform targeted interventions. It is often challenging for existing machine learning approaches to jointly model genetic pathways (higher-order features) and variants (atomic features) and to present interpretable models to clinicians. To build more accurate and more interpretable machine learning models for genetic medicine, we introduce Pathway Augmented Nonnegative Tensor factorization for HighER-order feature learning (PANTHER). PANTHER selects informative genetic pathways that directly encode molecular mechanisms. We apply genetically motivated constrained tensor factorization to group pathways in a way that reflects molecular mechanism interactions. We then train a softmax classifier for disease types using the identified pathway groups. We evaluated PANTHER against multiple state-of-the-art constrained tensor/matrix factorization models, as well as group-guided and Bayesian hierarchical models. PANTHER significantly outperforms all state-of-the-art comparison models (p<0.05). Our experiments on large-scale Next Generation Sequencing (NGS) and whole-genome genotyping datasets also demonstrated the wide applicability of PANTHER. We performed feature analysis in predicting disease types, which suggested insights into, and benefits of, the identified pathway groups.
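The abstract describes a two-stage pipeline: a constrained nonnegative tensor factorization groups pathways, and a softmax classifier is then trained on the resulting groups. The sketch below illustrates only the generic building blocks, under stated assumptions. It uses a plain nonnegative CP decomposition fitted with multiplicative updates, not PANTHER's genetically motivated constraints (which the abstract does not specify), and a multinomial logistic (softmax) regression trained by gradient descent. The tensor axes (for example patients × pathways × genes) and all function names (`nonneg_cp`, `reconstruct`, `softmax`, `train_softmax`) are hypothetical.

```python
import numpy as np


def nonneg_cp(X, rank, n_iter=500, eps=1e-12, seed=0):
    """Generic nonnegative CP decomposition of a 3-way tensor (hypothetical helper).

    Approximately minimizes ||X - [[A, B, C]]||_F^2 subject to A, B, C >= 0
    using multiplicative updates. This is NOT PANTHER's constrained model.
    """
    if np.any(X < 0):
        raise ValueError("X must be nonnegative")
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    # Nonnegative random initialization; multiplicative updates keep factors >= 0.
    A = rng.random((I, rank))
    B = rng.random((J, rank))
    C = rng.random((K, rank))
    for _ in range(n_iter):
        # Numerator: X contracted with the other two factors (MTTKRP).
        # Denominator: current factor times the Hadamard product of the other Gram matrices.
        A *= np.einsum("ijk,jr,kr->ir", X, B, C) / (A @ ((B.T @ B) * (C.T @ C)) + eps)
        B *= np.einsum("ijk,ir,kr->jr", X, A, C) / (B @ ((A.T @ A) * (C.T @ C)) + eps)
        C *= np.einsum("ijk,ir,jr->kr", X, A, B) / (C @ ((A.T @ A) * (B.T @ B)) + eps)
    return A, B, C


def reconstruct(A, B, C):
    """Rebuild the tensor from its CP factors: sum_r a_r (outer) b_r (outer) c_r."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)


def softmax(Z):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)


def train_softmax(F, y, n_classes, lr=0.5, n_iter=2000, l2=1e-3):
    """Multinomial logistic regression by full-batch gradient descent (hypothetical helper).

    F: (n_samples, n_features) nonnegative features, e.g. a patient-mode factor.
    y: integer class labels in [0, n_classes).
    """
    n, d = F.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot targets
    for _ in range(n_iter):
        P = softmax(F @ W + b)
        G = (P - Y) / n  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * (F.T @ G + l2 * W)  # L2-regularized weight step
        b -= lr * G.sum(axis=0)
    return W, b
```

Under these assumptions, one would factor the tensor, treat the factor for the patient mode as a low-dimensional feature matrix, and pass it with disease labels to `train_softmax`; the pathway-mode factor then shows which pathways load on each component. Multiplicative updates are used here because they keep every factor nonnegative without an explicit projection step.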
