Paper Title
Partial Sequence Labeling with Structured Gaussian Processes
Paper Authors
Paper Abstract
Existing partial sequence labeling models mainly build on the max-margin framework, which does not provide an uncertainty estimate for its predictions. Further, the unique ground-truth disambiguation strategy employed by these models may introduce wrong label information into parameter learning. In this paper, we propose structured Gaussian Processes for partial sequence labeling (SGPPSL), which encodes uncertainty in the prediction and needs no extra effort for model selection and hyperparameter learning. The model employs a factor-as-piece approximation that divides the linear-chain graph structure into a set of pieces. This preserves the basic Markov Random Field structure and avoids handling the large number of candidate output sequences generated by partially annotated data. A confidence measure is then introduced into the model to account for the different contributions of candidate labels, which enables the ground-truth label information to be used in parameter learning. Based on a derived lower bound of the model's variational lower bound, the variational parameters and confidence measures are estimated by alternating optimization. Moreover, a weighted Viterbi algorithm is proposed to incorporate the confidence measure into sequence prediction. It accounts for the label ambiguity arising from multiple annotations in the training data and thus helps improve performance. SGPPSL is evaluated on several sequence labeling tasks, and the experimental results show the effectiveness of the proposed model.
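The abstract does not spell out how the weighted Viterbi algorithm combines confidence measures with decoding, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes a linear-chain model with log-domain node scores (`unary`) and edge scores (`transition`), and a hypothetical per-position, per-label `confidence` weight in (0, 1] that is folded into the node score as a log-weight. All names and the weighting scheme are illustrative assumptions.

```python
import math


def weighted_viterbi(unary, transition, confidence):
    """Decode the best label path for a linear chain.

    unary[t][k]      : log-score of label k at position t
    transition[j][k] : log-score of moving from label j to label k
    confidence[t][k] : assumed weight in (0, 1] for label k at position t
                       (must be strictly positive, since its log is taken)
    Returns (best_path, best_score).
    """
    n_steps = len(unary)
    n_labels = len(unary[0])

    def node(t, k):
        # Assumption: confidence enters as an additive log-weight on the node score.
        return unary[t][k] + math.log(confidence[t][k])

    # best[k] holds the score of the best partial path ending in label k.
    best = [node(0, k) for k in range(n_labels)]
    back = []  # back[t - 1][k] = best previous label for label k at position t

    for t in range(1, n_steps):
        new_best, pointers = [], []
        for k in range(n_labels):
            prev = max(range(n_labels), key=lambda j: best[j] + transition[j][k])
            new_best.append(best[prev] + transition[prev][k] + node(t, k))
            pointers.append(prev)
        best = new_best
        back.append(pointers)

    # Trace the back-pointers from the best final label.
    last = max(range(n_labels), key=lambda k: best[k])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]


if __name__ == "__main__":
    # Toy example: flat model scores, so the confidence weights alone decide the path.
    flat = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
    no_edge_preference = [[0.0, 0.0], [0.0, 0.0]]
    conf = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
    print(weighted_viterbi(flat, no_edge_preference, conf))
```

With all confidence weights set to 1, the log-weight vanishes and the routine reduces to standard Viterbi decoding, which makes the sketch easy to check against a known baseline.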