A hierarchical representation of form documents for identification and retrieval

Cited by: 25
Authors
Pınar Duygulu
Volkan Atalay
Affiliation
Department of Computer Engineering, Middle East Technical University, Ankara, 06531 Turkey; e-mail: {duygulu,volkan}@ceng.metu.edu.tr
Keywords
Form document processing – Logical layout extraction – Retrieval – Data processing
DOI
10.1007/s100320100077
Abstract
In this paper, we present a logical representation for form documents to be used for identification and retrieval. A hierarchical structure is proposed to represent the structure of a form by using lines and the XY-tree approach. The approach is top-down, and no domain knowledge such as the preprinted data or filled-in data is used. Geometrical modifications and slight variations are handled by this representation. Logically identical forms are associated with the same or a similar hierarchical structure. Identification and retrieval of similar forms are performed by computing the edit distances between the generated trees.
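The last step described in the abstract, comparing forms by the edit distance between their trees, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not state the paper's cost model or algorithm, so the sketch uses unit costs for insert, delete, and relabel with the textbook recursive definition of ordered tree edit distance, and the node labels "H" (horizontal cut), "V" (vertical cut), and "B" (block) are hypothetical.

```python
from functools import lru_cache

# A tree is a tuple (label, children), where children is a tuple of trees.
# Nested tuples are hashable, so forests can be memoized directly.

def node(label, *children):
    """Build a tree node; leaves have no children."""
    return (label, tuple(children))

def size(forest):
    """Number of nodes in a forest (a tuple of trees)."""
    return sum(1 + size(children) for _, children in forest)

@lru_cache(maxsize=None)
def forest_distance(f1, f2):
    """Unit-cost edit distance between two ordered forests."""
    if not f1:
        return size(f2)  # insert every node of f2
    if not f2:
        return size(f1)  # delete every node of f1
    (label1, kids1), (label2, kids2) = f1[-1], f2[-1]
    rest1, rest2 = f1[:-1], f2[:-1]
    # Delete the rightmost root of f1; its children take its place.
    delete = forest_distance(rest1 + kids1, f2) + 1
    # Insert the rightmost root of f2.
    insert = forest_distance(f1, rest2 + kids2) + 1
    # Map the two rightmost roots onto each other (relabel if they differ).
    match = (forest_distance(rest1, rest2)
             + forest_distance(kids1, kids2)
             + (label1 != label2))
    return min(delete, insert, match)

def tree_edit_distance(t1, t2):
    return forest_distance((t1,), (t2,))

# Hypothetical form layouts, for illustration only.
form_a = node("H", node("B"), node("V", node("B"), node("B")), node("B"))
form_b = node("H", node("B"), node("V", node("B"), node("B"), node("B")), node("B"))
form_c = node("V", node("B"), node("B"))

def rank_by_similarity(query, stored):
    """Return stored (name, tree) pairs ordered from most to least similar."""
    return sorted(stored, key=lambda item: tree_edit_distance(query, item[1]))
```

Under these assumptions, `form_a` and `form_b` differ by a single inserted block, so they would be treated as near-identical, while `form_c` is farther away. This plain recursion is only practical for small trees; a dedicated algorithm such as Zhang-Shasha would be the usual choice for larger ones.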
Pages: 17-27
Page count: 10