Dual-branch dilated context convolutional for table detection transformer in the document images

Cited by: 1

Authors:

Ni, Ying [1]

Wang, Xiaoli [1]

Peng, Hanghang [1]

Li, Yonzhi [2,3]

Wang, Jinyang [2]

Li, Haoxuan [2]

Huang, Jin [2,3]

Affiliations:

[1] Guotai Asset Management Co Ltd, Shanghai, Peoples R China

[2] Shanghai Jiao Tong Univ, Shanghai, Peoples R China

[3] State Key Lab New Text Mat & Adv Proc Technol, Wuhan, Peoples R China

Source:

VISUAL COMPUTER | 2025 / Volume 41 / Issue 04

Keywords:

Table detection; Transformer; Dilated convolution; Visualization;

DOI:

10.1007/s00371-024-03561-6

Chinese Library Classification (CLC) number:

TP31 [Computer software];

Discipline classification code:

081202 ; 0835 ;

Abstract:

As the processing of document images such as financial reports becomes increasingly automated, table detection has become a critical component of document automation. It requires models to locate tables in document images without losing information. However, existing techniques still fall short in detecting certain small or irregularly shaped tables. To address this issue, we propose a Transformer-based table detection model. To improve both training efficiency and prediction performance, we fine-tune a pretrained Transformer framework to effectively capture underlying features. Additionally, we integrate a dual-branch dilated context convolutional module that processes high-dimensional features to further improve detection accuracy and robustness for tables of various sizes and shapes. Furthermore, we integrate multiple residual convolutional layers to capture and fuse features at different scales, strengthening the network's multi-scale feature representation and thus its detection performance. We use feature maps and heatmaps for visualization to verify the reliability of our method. We evaluate our method on publicly available document datasets, and the results demonstrate that our approach achieves stronger performance on evaluation metrics such as Precision. https://github.com/GT-HZ/TD


Pages: 2709-2720

Page count: 12

