Writer Identification and Writer Retrieval Based on NetVLAD with Re-ranking

Paper title

Writer Identification and Writer Retrieval Based on NetVLAD with Re-ranking

Authors

Shervin Rasoulzadeh, Bagher Babaali

Abstract

This paper addresses writer identification and writer retrieval, which are considered challenging problems in the field of document analysis and recognition. A novel pipeline is proposed that employs a unified neural network architecture consisting of ResNet-20 as a feature extractor and an integrated NetVLAD layer, inspired by the vector of locally aggregated descriptors (VLAD), placed at the head of the network. With this architecture defined, the triplet semi-hard loss function is used to directly learn an embedding for individual input image patches. Subsequently, the generalized max-pooling technique is employed to aggregate the embedded descriptors of each handwritten image. In addition, a novel re-ranking strategy based on $k$-reciprocal nearest neighbors is introduced for the identification and retrieval tasks, and it is shown that the pipeline benefits substantially from this step. Experimental evaluation was carried out on three publicly available datasets: ICDAR 2013, CVL, and KHATT. The results indicate that, while the pipeline performs comparably to the state of the art on KHATT, it achieves superior performance on the ICDAR 2013 and CVL datasets in terms of mAP.
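To make the notion of $k$-reciprocal nearest neighbors concrete, the following is a minimal sketch in plain Python. It only illustrates the general definition (item j is a $k$-reciprocal neighbor of item i when each appears among the other's $k$ nearest neighbors); it is not the re-ranking strategy of the paper, whose details the abstract does not give. The function names, the Euclidean distance, and the toy 2-D descriptors are all assumptions made for illustration.

```python
import math


def pairwise_distances(descriptors):
    """Euclidean distance matrix for a list of equal-length vectors (assumed metric)."""
    return [[math.dist(a, b) for b in descriptors] for a in descriptors]


def k_nearest(dist, i, k):
    """Indices of the k nearest neighbors of item i, excluding i itself."""
    candidates = (j for j in range(len(dist)) if j != i)
    # Ties are broken by index so the result is deterministic.
    order = sorted(candidates, key=lambda j: (dist[i][j], j))
    return set(order[:k])


def k_reciprocal(dist, i, k):
    """Items j with j in kNN(i) and i in kNN(j)."""
    return {j for j in k_nearest(dist, i, k) if i in k_nearest(dist, j, k)}


# Hypothetical toy descriptors: three close points and two outliers.
descriptors = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (2.0, 2.0), (5.0, 5.0)]
dist = pairwise_distances(descriptors)

print(k_nearest(dist, 3, 2))     # nearest neighbors of the outlier at (2, 2)
print(k_reciprocal(dist, 0, 2))  # neighbors of item 0 that also rank item 0 highly
print(k_reciprocal(dist, 3, 2))  # the outlier's neighbors do not reciprocate
```

In this toy example the outlier at (2, 2) has nearest neighbors in the tight cluster, but none of them lists it among their own nearest neighbors, so its $k$-reciprocal set is empty. A re-ranking step can use this asymmetry to demote candidates that are close only in one direction.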
