Paper Title

Sparse Optimization for Unsupervised Extractive Summarization of Long Documents with the Frank-Wolfe Algorithm

Authors

Alicia Y. Tsai, Laurent El Ghaoui

Abstract

We address the problem of unsupervised extractive document summarization, especially for long documents. We model the unsupervised problem as a sparse auto-regression problem and approximate the resulting combinatorial problem with a convex, norm-constrained problem. We solve it using a dedicated Frank-Wolfe algorithm. To generate a summary with $k$ sentences, the algorithm only needs to execute $\approx k$ iterations, making it very efficient. We explain how to avoid explicit calculation of the full gradient and how to include sentence embedding information. We evaluate our approach against two other unsupervised methods using both lexical (standard) ROUGE scores and semantic (embedding-based) ones. Our method achieves better results on both datasets and works especially well when combined with embeddings for highly paraphrased summaries.
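
To make the idea in the abstract concrete, here is a minimal Python sketch of Frank-Wolfe sentence selection. It is not the authors' exact formulation, which the abstract does not spell out. It assumes a sentence-feature matrix `X` (one row per sentence), the objective $\frac{1}{2}\|X - WX\|_F^2$, and a constraint that bounds the sum of the column-wise $\ell_2$ norms of $W$ by a hypothetical radius `tau`. Under that assumed constraint, each Frank-Wolfe step activates at most one column of $W$, so $k$ iterations select at most $k$ sentences. For clarity the sketch computes the full gradient explicitly, which the paper states can be avoided.

```python
import numpy as np


def frank_wolfe_select(X, k, tau=None):
    """Select at most k rows (sentences) of X with k Frank-Wolfe steps.

    Assumed problem (illustrative, not taken from the paper):
        minimize 0.5 * ||X - W X||_F^2
        subject to sum_j ||W[:, j]||_2 <= tau
    """
    n = X.shape[0]
    if tau is None:
        tau = float(k)  # hypothetical default radius
    W = np.zeros((n, n))
    for _ in range(k):
        R = X - W @ X                      # current residual
        G = -R @ X.T                       # gradient of the objective w.r.t. W
        col_norms = np.linalg.norm(G, axis=0)
        j = int(np.argmax(col_norms))      # linear minimization oracle: best column
        if col_norms[j] == 0.0:
            break                          # stationary point reached
        S = np.zeros_like(W)
        S[:, j] = -tau * G[:, j] / col_norms[j]   # vertex of the constraint set
        DX = (S - W) @ X
        denom = np.sum(DX * DX)
        if denom == 0.0:
            break
        # Exact line search for the quadratic objective, clipped to [0, 1].
        gamma = float(np.clip(np.sum(R * DX) / denom, 0.0, 1.0))
        W = (1.0 - gamma) * W + gamma * S
    # Sentences whose column in W is nonzero form the extractive summary.
    selected = np.flatnonzero(np.linalg.norm(W, axis=0) > 0).tolist()
    return selected, W
```

Calling `frank_wolfe_select(X, k=3)` on a matrix of TF-IDF or embedding vectors returns the indices of at most three sentences together with the reconstruction weights. The choice of features and of `tau` is left open here and would need tuning in practice.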
