Paper Title

Re-evaluating the Need for Multimodal Signals in Unsupervised Grammar Induction

Authors

Boyi Li, Rodolfo Corona, Karttikeya Mangalam, Catherine Chen, Daniel Flaherty, Serge Belongie, Kilian Q. Weinberger, Jitendra Malik, Trevor Darrell, Dan Klein

Abstract

Are multimodal inputs necessary for grammar induction? Recent work has shown that multimodal training inputs can improve grammar induction. However, these improvements are based on comparisons to weak text-only baselines that were trained on relatively little textual data. To determine whether multimodal inputs are needed in regimes with large amounts of textual training data, we design a stronger text-only baseline, which we refer to as LC-PCFG. LC-PCFG is a C-PCFG that incorporates embeddings from text-only large language models (LLMs). We use a fixed grammar family to directly compare LC-PCFG to various multimodal grammar induction methods. We compare performance on four benchmark datasets. LC-PCFG provides an up to 17% relative improvement in Corpus-F1 compared to state-of-the-art multimodal grammar induction methods. LC-PCFG is also more computationally efficient, providing an up to 85% reduction in parameter count and 8.8x reduction in training time compared to multimodal approaches. These results suggest that multimodal inputs may not be necessary for grammar induction, and emphasize the importance of strong vision-free baselines for evaluating the benefit of multimodal approaches.
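The abstract describes LC-PCFG as a compound-PCFG-style model conditioned on embeddings from a text-only LLM rather than on visual features. A minimal sketch of that conditioning pattern, not the authors' implementation: a frozen sentence embedding (here a stand-in random vector) is concatenated with a nonterminal embedding and fed through a small MLP to produce a softmax distribution over binary rules. All names and dimensions (`llm_embed`, `n_nt`, `embed_dim`, the MLP shapes) are illustrative assumptions.

```python
# Hedged sketch, NOT the paper's code: conditioning PCFG rule probabilities
# on a frozen text-only LLM sentence embedding, compound-PCFG style.
import numpy as np

rng = np.random.default_rng(0)

n_nt, embed_dim = 4, 8  # nonterminal count and (toy) LLM embedding size

# In LC-PCFG this vector would come from a text-only LLM; we use a
# random stand-in for the frozen sentence embedding.
llm_embed = rng.normal(size=embed_dim)

# A small MLP maps [nonterminal embedding; LLM embedding] to scores
# over the n_nt * n_nt binary rules A -> B C.
nt_embed = rng.normal(size=(n_nt, embed_dim))
W1 = rng.normal(size=(2 * embed_dim, 16))
W2 = rng.normal(size=(16, n_nt * n_nt))

def rule_probs(A: int) -> np.ndarray:
    """Softmax-normalized distribution over binary rules for nonterminal A."""
    h = np.tanh(np.concatenate([nt_embed[A], llm_embed]) @ W1)
    scores = h @ W2
    e = np.exp(scores - scores.max())  # numerically stable softmax
    return e / e.sum()

probs = rule_probs(0)  # length n_nt*n_nt vector summing to 1
```

In the actual model these distributions would parameterize the inside algorithm for unsupervised training; the sketch only shows how a text-only embedding can replace the visual signal used by multimodal grammar-induction methods.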
