A large-scale dataset for end-to-end table recognition in the wild

Cited by: 2
Authors
Yang, Fan [1]
Hu, Lei [1]
Liu, Xinwu [2]
Huang, Shuangping [1, 3]
Gu, Zhenghui [4]
Affiliations
[1] South China Univ Technol, Sch Elect & Informat Engn, Guangzhou 510641, Peoples R China
[2] Zhuzhou CRRC Times Elect Co Ltd, Zhuzhou 412001, Peoples R China
[3] Pazhou Lab, Guangzhou 510335, Peoples R China
[4] South China Univ Technol, Coll Automat Sci & Engn, Guangzhou 510641, Peoples R China
DOI
10.1038/s41597-023-01985-8
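Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09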
Abstract
Table recognition (TR) is a research hotspot in pattern recognition that aims to extract information from tables in images. Common TR tasks include table detection (TD), table structure recognition (TSR) and table content recognition (TCR): TD locates tables in the image, TCR recognizes the text content, and TSR recognizes the spatial and logical (ontology) structure. End-to-end TR in real scenarios, which accomplishes the three sub-tasks simultaneously, remains an unexplored research area. One major factor holding researchers back is the lack of a benchmark dataset. To this end, we propose a new large-scale dataset named Table Recognition Set (TabRecSet), with diverse table forms sourced from multiple scenarios in the wild and complete annotation dedicated to end-to-end TR research. It is the largest and the first bilingual dataset for end-to-end TR, with 38.1K tables, of which 20.4K are in English and 17.7K are in Chinese. The samples take diverse forms, such as border-complete and border-incomplete tables, and regular and irregular tables (rotated, distorted, etc.). The scenarios range from scanned to camera-taken images, from documents to Excel tables, and from educational test papers to financial invoices. The annotations are complete, consisting of the table body spatial annotation for TD, the cell spatial and logical annotation for TSR, and the text content for TCR. The spatial annotation uses polygons instead of the bounding boxes or quadrilaterals adopted by most datasets, since polygons are better suited to the irregular tables that are common in wild scenarios. Additionally, we propose a visualized, interactive annotation tool named TableMe to improve the efficiency and quality of table annotation.
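The abstract's preference for polygon annotation over bounding boxes can be illustrated with a minimal sketch. The coordinates below are invented for illustration and are not taken from TabRecSet, and the function names are hypothetical rather than part of any released tooling. The sketch compares the area enclosed by a rotated table outline with the area of its axis-aligned bounding box.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_area(points: List[Point]) -> float:
    """Area of a simple polygon, computed with the shoelace formula."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the polygon
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def bounding_box_area(points: List[Point]) -> float:
    """Area of the axis-aligned box that encloses all points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


# Hypothetical table outline: a square rotated by 45 degrees.
rotated_table = [(2.0, 0.0), (4.0, 2.0), (2.0, 4.0), (0.0, 2.0)]

table_area = polygon_area(rotated_table)
box_area = bounding_box_area(rotated_table)
# Fraction of the bounding box actually covered by the table.
coverage = table_area / box_area
```

In this toy case the polygon covers only half of its bounding box, so a box label would include as much background as table. That is the kind of slack a polygon annotation avoids for rotated or distorted tables.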
Pages: 14
Journal: Scientific Data, vol. 10