An Empirical Investigation of Contextualized Number Prediction

Paper Title

An Empirical Investigation of Contextualized Number Prediction

Authors

Daniel Spokoyny, Taylor Berg-Kirkpatrick

Abstract

We conduct a large-scale empirical investigation of contextualized number prediction in running text. Specifically, we consider two tasks: (1) masked number prediction: predicting a missing numerical value within a sentence, and (2) numerical anomaly detection: detecting an errorful numeric value within a sentence. We experiment with novel combinations of contextual encoders and output distributions over the real number line. Specifically, we introduce a suite of output distribution parameterizations that incorporate latent variables to add expressivity and better fit the natural distribution of numeric values in running text, and combine them with both recurrent and transformer-based encoder architectures. We evaluate these models on two numeric datasets in the financial and scientific domains. Our findings show that output distributions that incorporate discrete latent variables and allow for multiple modes outperform simple flow-based counterparts on all datasets, yielding more accurate numerical prediction and anomaly detection. We also show that our models effectively utilize textual context and benefit from general-purpose unsupervised pretraining.

