Paper title
OCR Graph Features for Manipulation Detection in Documents
Paper authors
Paper abstract
Detecting manipulations in digital documents is becoming increasingly important for information verification. With the proliferation of image editing software, altering key information in documents has become widely accessible. Nearly all methods in this domain are procedural, relying on carefully crafted features and a hand-tuned scoring system rather than a data-driven, generalizable approach. We frame the task as a graph comparison problem over character bounding boxes, and propose a model that leverages graph features derived from OCR (Optical Character Recognition). Our model takes a data-driven approach to detecting alterations by training a random forest classifier on the graph-based OCR features. We evaluate our algorithm's forgery detection performance on a dataset constructed from real business documents with slight forgery imperfections. Our proposed model dramatically outperforms the most closely related document manipulation detection model on this task.
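As a rough illustration of how character bounding boxes can be turned into graph features, the sketch below treats a text line as a chain graph whose nodes are OCR characters and whose edges link each character to its right-hand neighbour. The abstract does not specify the feature set, so everything here is an assumption made for illustration: the `CharBox` type, the three edge measurements (horizontal gap, bottom-edge shift, height ratio), and the mean/standard-deviation summary are hypothetical and are not the authors' actual features.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class CharBox:
    """One OCR character with its axis-aligned bounding box (hypothetical type)."""
    char: str
    x: float  # left edge, in pixels
    y: float  # top edge, in pixels
    w: float  # width
    h: float  # height


def edge_features(a: CharBox, b: CharBox) -> tuple[float, float, float]:
    """Geometric relation between two neighbouring characters (one graph edge)."""
    gap = b.x - (a.x + a.w)                      # horizontal space between boxes
    bottom_shift = (b.y + b.h) - (a.y + a.h)     # difference of bottom edges
    height_ratio = b.h / a.h                     # relative character height
    return gap, bottom_shift, height_ratio


def line_graph_features(boxes: list[CharBox]) -> list[float]:
    """Summarise a text line as [mean, std] of each edge measurement.

    Returns six values: gap mean/std, bottom-shift mean/std, height-ratio mean/std.
    """
    ordered = sorted(boxes, key=lambda c: c.x)   # left-to-right reading order
    edges = [edge_features(a, b) for a, b in zip(ordered, ordered[1:])]
    if not edges:
        return [0.0] * 6                         # fewer than two characters
    features: list[float] = []
    for column in zip(*edges):                   # one column per measurement
        features.append(mean(column))
        features.append(pstdev(column))
    return features


# A cleanly typeset line versus the same line with one character pasted 3 px low.
genuine = [CharBox(c, 12.0 * i, 0.0, 10.0, 20.0) for i, c in enumerate("1234")]
forged = list(genuine)
forged[2] = CharBox("9", 24.0, 3.0, 10.0, 20.0)

genuine_vec = line_graph_features(genuine)
forged_vec = line_graph_features(forged)
```

In this toy case the pasted character leaves the spacing untouched but raises the spread of the bottom-edge shift, which is the kind of slight imperfection a classifier could learn from. Vectors like these, labelled genuine or forged, could then be passed to any random forest implementation, for example scikit-learn's `RandomForestClassifier`; that training step is omitted here.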