Paper Title
Low Resource Pipeline for Spoken Language Understanding via Weak Supervision
Paper Authors
Paper Abstract
In Weakly Supervised Learning (WSL), a model is trained over noisy labels obtained from semantic rules and task-specific pre-trained models. Rules offer limited generalization across tasks and require significant manual effort, while pre-trained models are available only for a limited set of tasks. In this work, we propose to utilize prompt-based methods as weak sources to obtain noisy labels on unannotated data. We show that task-agnostic prompts are generalizable and can be used to obtain noisy labels for different Spoken Language Understanding (SLU) tasks such as sentiment classification, disfluency detection, and emotion classification. These prompts can additionally be updated with task-specific contexts, thus providing the flexibility to design task-specific prompts. We demonstrate that prompt-based methods generate reliable labels for the above SLU tasks and can therefore be used as a universal weak source to train a weakly supervised model (WSM) in the absence of labeled data. Our proposed WSL pipeline, trained over the prompt-based weak source, outperforms other competitive low-resource benchmarks on zero-shot and few-shot learning by more than 4% Macro-F1 on all three benchmark SLU datasets. The proposed method also outperforms a conventional rule-based WSL pipeline by more than 5% on Macro-F1.
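To make the weak-labeling step concrete, below is a minimal sketch of how a prompt-based model could assign noisy labels to unannotated utterances, which would then serve as training data for the WSM. It uses an off-the-shelf NLI-based zero-shot classifier (facebook/bart-large-mnli via the HuggingFace pipeline) as a stand-in weak source; the model choice, prompt template, and label set here are illustrative assumptions, not the authors' exact setup.

```python
# Sketch: prompt-based weak labeling of unannotated utterances.
# Assumption: an NLI-based zero-shot classifier stands in for the
# paper's prompt-based weak source; the template and labels are
# illustrative, not the authors' exact prompts.
from transformers import pipeline

weak_labeler = pipeline("zero-shot-classification",
                        model="facebook/bart-large-mnli")

unannotated = [
    "I absolutely loved the new update!",
    "This is the worst service I have ever used.",
]
candidate_labels = ["positive", "negative"]

noisy_labels = []
for utterance in unannotated:
    out = weak_labeler(
        utterance,
        candidate_labels=candidate_labels,
        # The hypothesis template plays the role of the task-agnostic
        # prompt; adding task-specific context here ("utterance",
        # "conversation", ...) mirrors the prompt updates described above.
        hypothesis_template="The sentiment of this utterance is {}.",
    )
    noisy_labels.append(out["labels"][0])  # top-scoring label = noisy label

# The (utterance, noisy_label) pairs would then be used to train the
# weakly supervised model (WSM) in place of gold annotations.
print(list(zip(unannotated, noisy_labels)))
```

Swapping the label set and template (e.g., "This utterance does/does not contain a disfluency.") adapts the same weak source to disfluency or emotion classification, which is what makes a single prompt-based labeler usable across the different SLU tasks.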