Complex Text Detection Based on Polygon Feature Pooling and Fusion

Xiangnan ZHANG; Xinbo GAO; Chunna TIAN

doi:10.19665/j.issn1001-2400.20230801


Computer Science and Technology & Artificial Intelligence | Updated: 2024-07-23

- Complex text region detection based on polygon feature pooling and the transformer
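- Journal of Xidian University, 2024, Vol. 51, No. 3, pp. 113-123
- Author affiliations:
  
  1. School of Electronic Engineering, Xidian University, Xi'an, Shaanxi 710071, China
  2. Chongqing Key Laboratory of Image Cognition, School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
- About the authors:
  
  Xiangnan ZHANG (born 1991), male, Ph.D. candidate at Xidian University, E-mail: [email protected]
  Xinbo GAO (born 1972), male, professor, E-mail: [email protected]
  Chunna TIAN (born 1980), female, professor, E-mail: [email protected]
- Funding:
  
  National Natural Science Foundation of China (62173265); National Natural Science Foundation of China (62036007)
- DOI: 10.19665/j.issn1001-2400.20230801
  CLC number: TP391
- Print publication date: 2024-06-20
  
  Online publication date: 2023-08-22
  
  Received: 2023-03-13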
Xiangnan ZHANG, Xinbo GAO, Chunna TIAN. Complex text region detection based on polygon feature pooling and the transformer[J]. Journal of Xidian University, 2024, 51(3): 113-123. DOI: 10.19665/j.issn1001-2400.20230801.

Abstract

Text detection plays an important role in image understanding. Deep-learning-based text detection is the current mainstream approach and comprises single-stage and two-stage methods, with the latter usually achieving higher accuracy than the former. Two-stage detection methods usually include a region of interest (RoI) feature pooling operation, which provides fixed-dimension local region features for subsequent detection and recognition tasks. However, for complex text regions such as curved text, existing pooling methods based on rectangular RoIs are no longer applicable, while methods that replace region features with point features lose spatial information. To address this problem, we propose a complex text region detection method based on polygon feature pooling and the Transformer. First, polygon feature pooling is applied to the RoIs in complex text region detection: the shape of the pooled region is extended from a rectangle to a polygon, and the features of the polygon region are pooled into a fixed-dimension feature sequence without fitting any other shape, which avoids fitting errors. Then, the pooled features are treated as a sequence with spatial relationships, and a Transformer fuses the contextual relationships among the visual features, which reduces the training difficulty and improves detection accuracy. Experiments on the ICDAR2015, MLT, Total Text and CTW1500 datasets, which contain complex text such as curved text, show that the proposed two-stage detection algorithm extracts RoI features more effectively and achieves better detection results than existing methods.
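To make the two steps in the abstract more concrete, the sketch below shows one possible way to pool the features inside a polygon into a fixed-length sequence and then mix context across that sequence. It is an illustration only and not the authors' implementation: the abstract does not specify the pooling rule, so the column-wise binning, the function names `polygon_mask`, `polygon_pool` and `self_attention`, and the use of a single attention step with identity projections (in place of a trained Transformer) are all assumptions.

```python
import numpy as np


def polygon_mask(vertices, height, width):
    """Mark pixels whose centres lie inside the polygon (even-odd rule)."""
    ys, xs = np.mgrid[0:height, 0:width]
    px, py = xs + 0.5, ys + 0.5  # pixel centres
    inside = np.zeros((height, width), dtype=bool)
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        if y0 == y1:
            continue  # a horizontal edge never crosses a horizontal ray
        # Does the edge span the pixel's row, and is the crossing to the right?
        crosses = (y0 > py) != (y1 > py)
        x_at_y = x0 + (py - y0) * (x1 - x0) / (y1 - y0)
        inside ^= crosses & (px < x_at_y)
    return inside


def polygon_pool(features, vertices, num_bins):
    """Pool a (C, H, W) feature map inside a polygon into a (num_bins, C) sequence.

    Assumed rule: split the polygon's horizontal extent into num_bins column
    ranges (neighbouring ranges may share a column) and average the features
    of the in-polygon pixels in each range.
    """
    channels, height, width = features.shape
    mask = polygon_mask(vertices, height, width)
    pooled = np.zeros((num_bins, channels), dtype=float)
    cols = np.nonzero(mask.any(axis=0))[0]
    if cols.size == 0:
        return pooled  # polygon covers no pixel centre
    edges = np.linspace(cols.min(), cols.max() + 1, num_bins + 1)
    for b in range(num_bins):
        lo = int(np.floor(edges[b]))
        hi = max(int(np.ceil(edges[b + 1])), lo + 1)
        bin_mask = mask[:, lo:hi]
        if bin_mask.any():
            pooled[b] = features[:, :, lo:hi][:, bin_mask].mean(axis=1)
    return pooled


def self_attention(seq):
    """One scaled dot-product self-attention step with identity projections."""
    scores = seq @ seq.T / np.sqrt(seq.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ seq  # each row becomes a weighted mix of all rows


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feature_map = rng.normal(size=(16, 32, 64))        # hypothetical backbone output
    curved_region = [(4, 20), (30, 6), (60, 20), (60, 28), (30, 14), (4, 28)]
    sequence = polygon_pool(feature_map, curved_region, num_bins=8)
    fused = self_attention(sequence)
    print(sequence.shape, fused.shape)
```

Because the mask is computed directly from the polygon vertices, no rectangle or curve has to be fitted to the region first, which is the property the abstract emphasizes. A real detector would replace `self_attention` with learned multi-head Transformer layers and positional encodings.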

Keywords

text region detection; two-stage methods; polygon; feature pooling; Transformer

