Title
LAMBERT: Layout-Aware (Language) Modeling for information extraction
Authors
Abstract
We introduce a simple new approach to the problem of understanding documents where non-trivial layout influences the local semantics. To this end, we modify the Transformer encoder architecture in a way that allows it to use layout features obtained from an OCR system, without the need to re-learn language semantics from scratch. We only augment the input of the model with the coordinates of token bounding boxes, avoiding, in this way, the use of raw images. This leads to a layout-aware language model which can then be fine-tuned on downstream tasks. The model is evaluated on an end-to-end information extraction task using four publicly available datasets: Kleister NDA, Kleister Charity, SROIE and CORD. We show that our model achieves superior performance on datasets consisting of visually rich documents, while also outperforming the baseline RoBERTa on documents with flat layout (NDA \(F_{1}\) increase from 78.50 to 80.42). Our solution ranked first on the public leaderboard for the Key Information Extraction from the SROIE dataset, improving the SOTA \(F_{1}\)-score from 97.81 to 98.17.
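The abstract describes the mechanism only at a high level: each token's embedding is combined with layout information derived solely from its OCR bounding box, and the result is fed to an otherwise standard Transformer encoder, so no raw image is ever consumed. Below is a minimal sketch of that idea in PyTorch. The class name, dimensions, and the simple linear projection of the box coordinates are illustrative assumptions, not the authors' implementation, which builds on a pretrained language model such as RoBERTa rather than training from scratch.

```python
# Minimal sketch (assumed, not the paper's code): token embeddings augmented
# with a projection of OCR bounding-box coordinates, then a plain Transformer encoder.
import torch
import torch.nn as nn

class LayoutAwareEncoder(nn.Module):
    def __init__(self, vocab_size=50265, hidden=768, n_layers=12, n_heads=12):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, hidden)
        # Each token carries its bounding box as (x1, y1, x2, y2), normalized to [0, 1].
        self.layout_proj = nn.Linear(4, hidden)
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, token_ids, bboxes):
        # token_ids: (batch, seq_len) integer IDs; bboxes: (batch, seq_len, 4) floats from OCR.
        x = self.token_emb(token_ids) + self.layout_proj(bboxes)
        return self.encoder(x)

# Toy usage: in practice the IDs and boxes come from a tokenizer and an OCR system.
model = LayoutAwareEncoder()
ids = torch.randint(0, 50265, (2, 16))
boxes = torch.rand(2, 16, 4)
out = model(ids, boxes)  # shape (2, 16, 768)
```

Because only the input embeddings are extended, the pretrained weights of the underlying language model can be reused and the layout-aware model fine-tuned directly on downstream extraction tasks, as the abstract states.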