Paper title
Table Pre-training: A Survey on Model Architectures, Pre-training Objectives, and Downstream Tasks
Authors
Abstract
Since a vast number of tables can be easily collected from web pages, spreadsheets, PDFs, and various other document types, a flurry of table pre-training frameworks has been proposed following the success of pre-training on text and images, and they have achieved new state-of-the-art results on various tasks such as table question answering, table type recognition, column relation classification, table search, and formula prediction. To fully use the supervision signals in unlabeled tables, a variety of pre-training objectives have been designed and evaluated, for example, denoising cell values, predicting numerical relationships, and implicitly executing SQL queries. To best leverage the characteristics of (semi-)structured tables, various tabular language models, particularly ones with specially designed attention mechanisms, have been explored. Since tables usually appear alongside and interact with free-form text, table pre-training usually takes the form of table-text joint pre-training, which has attracted significant research interest from multiple domains. This survey aims to provide a comprehensive review of different model designs, pre-training objectives, and downstream tasks for table pre-training, and we further share our thoughts and vision on existing challenges and future opportunities.