Formal Specifications from Natural Language

Paper title

Formal Specifications from Natural Language

Authors

Christopher Hahn, Frederik Schmitt, Julia J. Tillman, Niklas Metzger, Julian Siber, Bernd Finkbeiner

Abstract

We study the generalization abilities of language models when translating natural language into formal specifications with complex semantics. In particular, we fine-tune language models on three datasets consisting of English sentences and their corresponding formal representations: 1) regular expressions (regex), frequently used in programming and search; 2) first-order logic (FOL), commonly used in software verification and theorem proving; and 3) linear-time temporal logic (LTL), which forms the basis for industrial hardware specification languages. Our experiments show that, in these diverse domains, the language models retain the generalization capabilities acquired from pre-training on natural language and generalize, e.g., to new variable names or operator descriptions. Additionally, they achieve competitive performance, and even outperform the state of the art for translating into regular expressions, with the benefits of being easy to access, being efficient to fine-tune, and having no particular need for domain-specific reasoning.
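
To make the three translation targets named in the abstract concrete, the sketch below lists one English/formal pair per formalism. The sentences, the formulas, the `PAIRS` table, and the helper `regex_matches` are all assumptions made for illustration; none of them is taken from the paper's datasets or code. Only the regex pair is checked mechanically, since Python's standard library can evaluate it directly.

```python
import re

# Hypothetical (English sentence, formal representation) pairs.
# Illustrative only: NOT drawn from the paper's datasets.
PAIRS = {
    "regex": (
        "strings that start with a capital letter and end with a digit",
        r"[A-Z].*[0-9]",
    ),
    "fol": (
        "every dog is an animal",
        "forall x. (dog(x) -> animal(x))",
    ),
    "ltl": (
        "every request is eventually followed by a grant",
        "G (request -> F grant)",
    ),
}


def regex_matches(pattern: str, text: str) -> bool:
    """Return True if the entire string `text` matches `pattern`."""
    return re.fullmatch(pattern, text) is not None


if __name__ == "__main__":
    sentence, pattern = PAIRS["regex"]
    print(sentence, "=>", pattern)
    for candidate in ["Hello 7", "hello 7", "Hello"]:
        print(repr(candidate), regex_matches(pattern, candidate))
```

A fine-tuned model in the setting the abstract describes would be given the English side of such a pair and asked to produce the formal side; the FOL and LTL strings here use one common textual notation and would need a dedicated parser or checker to validate.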
