Paper Title

End-to-end Handwritten Paragraph Text Recognition Using a Vertical Attention Network

Paper Authors

Denis Coquenet, Clément Chatelain, Thierry Paquet

Paper Abstract

Unconstrained handwritten text recognition remains challenging for computer vision systems. Paragraph text recognition is traditionally achieved by two models: the first for line segmentation and the second for text line recognition. We propose a unified end-to-end model using hybrid attention to tackle this task. The model is designed to iteratively process a paragraph image line by line and can be split into three modules. An encoder generates feature maps from the whole paragraph image. An attention module then recurrently generates a vertical weighted mask that focuses on the features of the current text line, thereby performing a kind of implicit line segmentation. For each line's features, a decoder module recognizes the associated character sequence, leading to the recognition of the whole paragraph. We achieve state-of-the-art character error rates at paragraph level on three popular datasets: 1.91% on RIMES, 4.45% on IAM, and 3.59% on READ 2016. Our code and trained model weights are available at https://github.com/FactoDeepLearning/VerticalAttentionOCR.
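
The abstract describes a three-module pipeline: an encoder producing feature maps, a recurrent vertical attention that weights feature-map rows to isolate one text line at a time, and a decoder that emits character predictions for that line. The sketch below illustrates how such a pipeline could be wired together in PyTorch. It is not the authors' implementation (see the linked repository for that); all layer sizes, the averaged recurrent-state update, the fixed five-line iteration limit, and the vocabulary size are illustrative assumptions.

```python
# Minimal sketch of an encoder / vertical-attention / decoder pipeline.
# Not the paper's implementation; shapes and layers are assumptions.
import torch
import torch.nn as nn


class Encoder(nn.Module):
    """Tiny CNN that turns a paragraph image into 2D feature maps."""
    def __init__(self, channels=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, channels, 3, stride=2, padding=1), nn.ReLU(),
        )

    def forward(self, x):              # x: (B, 1, H, W) grayscale paragraph image
        return self.net(x)             # (B, C, H', W')


class VerticalAttention(nn.Module):
    """Recurrently produces one weight per feature-map row (a vertical mask)."""
    def __init__(self, channels=64, hidden=128):
        super().__init__()
        self.rnn = nn.LSTMCell(channels, hidden)
        self.score = nn.Linear(channels + hidden, 1)

    def forward(self, feats, state):                   # feats: (B, C, H', W')
        h, c = state
        rows = feats.mean(dim=3)                       # (B, C, H'): width-collapsed rows
        h, c = self.rnn(rows.mean(dim=2), (h, c))      # update the recurrent state
        rows_t = rows.permute(0, 2, 1)                 # (B, H', C)
        ctx = h.unsqueeze(1).expand(-1, rows_t.size(1), -1)
        weights = torch.softmax(self.score(torch.cat([rows_t, ctx], dim=2)), dim=1)
        mask = weights.transpose(1, 2).unsqueeze(-1)   # (B, 1, H', 1) vertical mask
        line_feats = (feats * mask).sum(dim=2)         # (B, C, W'): one line's features
        return line_feats, weights, (h, c)


class Decoder(nn.Module):
    """Maps attended line features to per-position character logits (CTC-style)."""
    def __init__(self, channels=64, vocab_size=80):
        super().__init__()
        self.fc = nn.Linear(channels, vocab_size + 1)  # +1 for a CTC blank symbol

    def forward(self, line_feats):                     # (B, C, W')
        return self.fc(line_feats.permute(0, 2, 1))    # (B, W', vocab_size + 1)


# Usage: encode the whole paragraph once, then attend/decode line by line.
encoder, attention, decoder = Encoder(), VerticalAttention(), Decoder()
image = torch.randn(1, 1, 256, 512)                    # dummy paragraph image
features = encoder(image)
state = (torch.zeros(1, 128), torch.zeros(1, 128))
for _ in range(5):                                     # assume at most 5 text lines
    line, vertical_mask, state = attention(features, state)
    logits = decoder(line)                             # would feed a CTC loss in training
```

The key design point mirrored here is that the attention is only vertical: each step collapses the width axis, softmaxes over feature-map rows, and sums the full-width features of the selected rows, so each iteration yields the features of roughly one text line without any explicit line segmentation.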
