Paper Title
Logic2Text: High-Fidelity Natural Language Generation from Logical Forms
Paper Authors
Paper Abstract
Previous work on Natural Language Generation (NLG) from structured data has primarily focused on surface-level descriptions of record sequences. However, for complex structured data, e.g., multi-row tables, it is often desirable for an NLG system to describe interesting facts drawn from logical inferences across records. If provided only with the table, existing models struggle to produce controllable and high-fidelity logical generations. In this work, we formulate logical-level NLG as generation from logical forms in order to obtain controllable, high-fidelity, and faithful generations. We present a new large-scale dataset, \textsc{Logic2Text}, with 10,753 descriptions involving common logic types, each paired with its underlying logical form. The logical forms exhibit diverse graph structures with free schemas, which poses great challenges to a model's ability to understand the semantics. We experiment with (1) fully-supervised training on the full dataset, and (2) a few-shot setting that provides only hundreds of paired examples. We compare several popular generation models and analyze their performance. We hope our dataset can encourage research towards building advanced NLG systems capable of natural, faithful, and human-like generation. The dataset and code are available at https://github.com/czyssrs/Logic2Text.
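To make the formulation concrete, here is a minimal illustrative sketch (not taken from the dataset): a toy multi-row table, a nested logical form executed over it, and the paired natural-language description an NLG system would be asked to generate. The function names (`filter_eq`, `count`, `argmax`, `hop`), the toy table, and the example description are all assumptions chosen for illustration, not the dataset's exact schema.

```python
# Hypothetical example: executing a logical form over a toy table.
# All names and data below are illustrative assumptions, not the
# actual Logic2Text schema.

table = [
    {"player": "ann",  "team": "red",  "points": 30},
    {"player": "bob",  "team": "blue", "points": 25},
    {"player": "cara", "team": "red",  "points": 28},
]

def filter_eq(rows, col, value):
    """Keep rows whose column equals the given value."""
    return [r for r in rows if r[col] == value]

def count(rows):
    """Number of rows in the (sub)table."""
    return len(rows)

def argmax(rows, col):
    """Row with the maximum value in a numeric column."""
    return max(rows, key=lambda r: r[col])

def hop(row, col):
    """Select a single cell from a row."""
    return row[col]

# Logical form: count { filter_eq { all_rows ; team ; red } }
logic_value = count(filter_eq(table, "team", "red"))  # -> 2

# The paired description a high-fidelity generator should produce,
# faithful to the executed logic rather than the surface table alone:
description = "two of the players are on the red team"

# A second form: hop { argmax { all_rows ; points } ; player }
top_scorer = hop(argmax(table, "points"), "player")  # -> "ann"
```

Conditioning generation on such an executable form, rather than on the raw table alone, is what makes the output controllable: the logical form fixes which fact the description must state.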