Paper Title

Recursive Tree Grammar Autoencoders

Authors

Benjamin Paassen, Irena Koprinska, Kalina Yacef

Abstract

Machine learning on trees has been mostly focused on trees as input to algorithms. Much less research has investigated trees as output, which has many applications, such as molecule optimization for drug discovery, or hint generation for intelligent tutoring systems. In this work, we propose a novel autoencoder approach, called recursive tree grammar autoencoder (RTG-AE), which encodes trees via a bottom-up parser and decodes trees via a tree grammar, both learned via recursive neural networks that minimize the variational autoencoder loss. The resulting encoder and decoder can then be utilized in subsequent tasks, such as optimization and time series prediction. RTG-AEs are the first model to combine variational autoencoders, grammatical knowledge, and recursive processing. Our key message is that this unique combination of all three elements outperforms models which combine any two of the three. In particular, we perform an ablation study to show that our proposed method improves the autoencoding error, training time, and optimization score on synthetic as well as real datasets compared to four baselines. We further prove that RTG-AEs parse and generate trees in linear time and are expressive enough to handle all regular tree grammars.
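As a rough illustration of the bottom-up encoding and grammar-driven top-down decoding the abstract describes, the following is a minimal sketch assuming PyTorch and a toy regular tree grammar (S -> leaf | node(S, S)). All names here (Tree, RTGSketch, RULES, DIM) are invented for illustration and do not reflect the authors' actual implementation.

```python
import torch
import torch.nn as nn

# Toy regular tree grammar over one nonterminal S:
#   S -> leaf | node(S, S)
RULES = ["leaf", "node"]   # rule 0 emits a leaf, rule 1 an inner node
DIM = 16                   # code dimensionality (arbitrary for this sketch)

class Tree:
    """Plain ordered tree with a label and child subtrees."""
    def __init__(self, label, children=()):
        self.label, self.children = label, list(children)

class RTGSketch(nn.Module):
    def __init__(self):
        super().__init__()
        # Bottom-up encoder: one small network per grammar rule.
        self.enc_leaf = nn.Parameter(torch.randn(DIM))
        self.enc_node = nn.Linear(2 * DIM, DIM)
        # Variational heads mapping the root code to mean and log-variance.
        self.mu = nn.Linear(DIM, DIM)
        self.logvar = nn.Linear(DIM, DIM)
        # Top-down decoder: score grammar rules, then split the code for children.
        self.rule_scores = nn.Linear(DIM, len(RULES))
        self.dec_children = nn.Linear(DIM, 2 * DIM)

    def encode(self, tree):
        # Recurse into the children first, then combine their codes
        # (the bottom-up "parsing" direction described in the abstract).
        if tree.label == "leaf":
            return torch.tanh(self.enc_leaf)
        left, right = (self.encode(c) for c in tree.children)
        return torch.tanh(self.enc_node(torch.cat([left, right])))

    def reparameterize(self, code):
        # Standard VAE reparameterization trick.
        mu, logvar = self.mu(code), self.logvar(code)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return z, mu, logvar

    def decode(self, z, max_depth=5):
        # Pick a grammar rule for the current node, then recurse top-down.
        # Forcing "leaf" at max_depth keeps generation finite in this toy sketch.
        rule = int(self.rule_scores(z).argmax()) if max_depth > 0 else 0
        if RULES[rule] == "leaf":
            return Tree("leaf")
        z_left, z_right = torch.tanh(self.dec_children(z)).chunk(2)
        return Tree("node", [self.decode(z_left, max_depth - 1),
                             self.decode(z_right, max_depth - 1)])

if __name__ == "__main__":
    model = RTGSketch()
    tree = Tree("node", [Tree("leaf"), Tree("leaf")])
    z, mu, logvar = model.reparameterize(model.encode(tree))
    print(model.decode(z).label)  # root label of the (untrained) reconstruction
```

In the paper, both directions are tied to the rules of a regular tree grammar and trained jointly under the variational autoencoder loss (a reconstruction term over rule choices plus a KL term on the latent code); this sketch omits training and uses hard argmax decoding purely for readability.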
