Title

Human or Machine: Automating Human Likeliness Evaluation of NLG Texts

Authors

Çano, Erion, Bojar, Ondřej

Abstract

Automatic evaluation of various text quality criteria by data-driven intelligent methods is common and useful because it is cheap, fast, and usually yields repeatable results. In this paper, we present an attempt to automate the human likeliness evaluation of output text samples coming from natural language generation methods used to solve several tasks. We propose a human likeliness score that reflects the percentage of a method's output samples that look as if they were written by a human. Instead of having human participants label or rate those samples, we fully automate the process with a discrimination procedure based on large pretrained language models and their probability distributions. As a follow-up, we plan an empirical analysis of human-written and machine-generated texts to find the optimal setup of this evaluation approach. A validation procedure involving human participants will also check how well the automatic evaluation correlates with human judgments.
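The scoring idea in the abstract can be sketched in a few lines. The discriminator below is a deliberately simplified stand-in, not the authors' actual procedure: it flags a sample as machine-generated when its average per-token log-probability under a toy unigram model exceeds a threshold (machine text tends to be assigned higher probability than human text). A real setup would replace the unigram model with a large pretrained language model; the function names and the threshold are illustrative assumptions.

```python
import math
from collections import Counter

def mean_token_logprob(text, counts, total):
    # Toy stand-in for a large pretrained LM: average per-token
    # log-probability under an add-one-smoothed unigram distribution
    # estimated from a reference corpus (counts, total).
    tokens = text.split()
    vocab = len(counts)
    logps = [math.log((counts[t] + 1) / (total + vocab + 1)) for t in tokens]
    return sum(logps) / max(len(tokens), 1)

def human_likeliness_score(samples, looks_human):
    # Percentage of output samples the discriminator judges
    # to look human-written -- the score proposed in the paper.
    judged_human = sum(1 for s in samples if looks_human(s))
    return 100.0 * judged_human / len(samples)

# Hypothetical usage: fit the toy model on a reference corpus and
# call a sample "human" when its probability is below a threshold.
reference = "the cat sat on the mat".split()
counts, total = Counter(reference), len(reference)
threshold = -2.5  # illustrative value; would be tuned empirically
discriminator = lambda s: mean_token_logprob(s, counts, total) < threshold
```

The score itself is independent of how the discriminator is built, so swapping in LM-based probabilities only changes `looks_human`, not the aggregation.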
