Paper Title

Composed Fine-Tuning: Freezing Pre-Trained Denoising Autoencoders for Improved Generalization

Paper Authors

Sang Michael Xie, Tengyu Ma, Percy Liang

Paper Abstract

We focus on prediction problems with structured outputs that are subject to output validity constraints, e.g. pseudocode-to-code translation where the code must compile. While labeled input-output pairs are expensive to obtain, "unlabeled" outputs, i.e. outputs without corresponding inputs, are freely available (e.g. code on GitHub) and provide information about output validity. We can capture the output structure by pre-training a denoiser to denoise corrupted versions of unlabeled outputs. We first show that standard fine-tuning after pre-training destroys some of this structure. We then propose composed fine-tuning, which fine-tunes a predictor composed with the pre-trained denoiser, which is frozen to preserve output structure. For two-layer ReLU networks, we prove that composed fine-tuning significantly reduces the complexity of the predictor, thus improving generalization. Empirically, we show that composed fine-tuning improves over standard fine-tuning on two pseudocode-to-code translation datasets (3% and 6% relative). The improvement from composed fine-tuning is magnified on out-of-distribution (OOD) examples (4% and 25% relative).
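
The core idea in the abstract can be summarized in a minimal PyTorch sketch: a predictor is composed with a pre-trained denoiser, the denoiser's weights are frozen, and only the predictor is updated during fine-tuning. The names below (ComposedModel, predictor, denoiser, the toy dimensions, and the MSE objective) are illustrative assumptions, not the paper's released implementation; the actual pseudocode-to-code setup uses sequence models and a task-specific loss.

```python
import torch
import torch.nn as nn

class ComposedModel(nn.Module):
    """Predictor composed with a pre-trained denoiser; the denoiser is frozen."""

    def __init__(self, predictor: nn.Module, denoiser: nn.Module):
        super().__init__()
        self.predictor = predictor
        self.denoiser = denoiser
        # Freeze the denoiser so fine-tuning cannot destroy the output
        # structure it learned from unlabeled outputs.
        for p in self.denoiser.parameters():
            p.requires_grad = False

    def forward(self, x):
        # Base prediction (e.g. a draft output), then denoise it toward
        # the space of valid outputs.
        draft = self.predictor(x)
        return self.denoiser(draft)


# Toy two-layer ReLU networks, mirroring the setting analyzed in the theory.
d_in, d_out, hidden = 16, 16, 64
predictor = nn.Sequential(nn.Linear(d_in, hidden), nn.ReLU(), nn.Linear(hidden, d_out))
denoiser = nn.Sequential(nn.Linear(d_out, hidden), nn.ReLU(), nn.Linear(hidden, d_out))

# Stage 1 (assumed done elsewhere): pre-train `denoiser` to map corrupted
# unlabeled outputs back to clean outputs.

# Stage 2: composed fine-tuning on labeled (x, y) pairs; only the predictor updates.
model = ComposedModel(predictor, denoiser)
opt = torch.optim.Adam(model.predictor.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x = torch.randn(32, d_in)   # toy labeled inputs
y = torch.randn(32, d_out)  # toy labeled outputs

for _ in range(100):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
```

Note that gradients still flow through the frozen denoiser to the predictor, but the denoiser's weights never change, which is what preserves the output structure learned from unlabeled outputs.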
