Recognition techniques for extracting information from semi-structured documents

Cited by: 0
Authors
Della Ventura, A [1]
Gagliardi, I [1]
Zonta, B [1]
Affiliation
[1] CNR, ITIM, I-20131 Milan, Italy
Source
Keywords
OCR; automatic indexing; information retrieval; document analysis; image analysis; pattern matching; linguistic analysis;
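DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract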
Archives of optical documents are increasingly widely used, with demand also driven by new regulations that recognize the legal value of digital documents, provided they are stored on physically unalterable media. On the supply side there is now a vast and technologically advanced market, in which optical memories have solved the problem of data durability and permanence at costs comparable to those of magnetic memories. The remaining bottleneck in these systems is indexing. The indexing of documents with a variable structure, while still not completely automated, can be machine-supported to a large degree, with evident advantages both in the organization of the work and in the extraction of information, yielding data that is much more detailed and potentially significant for the user. We present here a system for the automatic registration of correspondence to and from a public office. The system is based on a general methodology for the extraction, indexing, archiving, and retrieval of significant information from semi-structured documents. In our prototype application, this information is distributed among the database fields of sender, addressee, subject, date, and body of the document.
Pages: 130-137
Page count: 8
Related papers
50 in total
  • [41] Header metadata extraction from semi-structured documents using template matching
    Huang, Zewu
    Jin, Hai
    Yuan, Pingpeng
    Han, Zongfen
    ON THE MOVE TO MEANINGFUL INTERNET SYSTEMS 2006: OTM 2006 WORKSHOPS, PT 2, PROCEEDINGS, 2006, 4278 : 1776 - +
  • [42] Semi-structured document image matching and recognition
    Augereau, Olivier
    Journet, Nicholas
    Domenger, Jean-Philippe
    DOCUMENT RECOGNITION AND RETRIEVAL XX, 2013, 8658
  • [43] A document model based on relevance modeling techniques for semi-structured information warehouses
    Pérez, JM
    Berlanga, R
    Aramburu, MJ
    DATABASE AND EXPERT SYSTEMS APPLICATIONS, PROCEEDINGS, 2004, 3180 : 318 - 327
  • [44] Mining Entities and their Values from Semi-Structured Documents in Business Process Outsourcing
    Guggilla, Chinnappa
    Pandey, Ankit G.
    Kummamuru, Krishna
    Shivaram, Madhura
    PROCEEDINGS OF THE ACM INDIA JOINT INTERNATIONAL CONFERENCE ON DATA SCIENCE AND MANAGEMENT OF DATA (CODS-COMAD'18), 2018, : 283 - 288
  • [45] An Automatic Ontology Population with a Machine Learning Technique from Semi-Structured Documents
    Song, Hyun-Je
    Park, Seong-Bae
    Park, Se-Young
    ICIA: 2009 INTERNATIONAL CONFERENCE ON INFORMATION AND AUTOMATION, VOLS 1-3, 2009, : 519 - 524
  • [46] Managing unstructured and semi-structured information in organisations
    Aitken, Ashley M.
    6th IEEE/ACIS International Conference on Computer and Information Science, Proceedings, 2007, : 712 - 717
  • [47] WebDP: Understanding Discourse Structures in Semi-Structured Web Documents
    Liu, Peilin
    Lin, Hongyu
    Liao, Meng
    Xiang, Hao
    Han, Xianpei
    Sun, Le
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023), 2023, : 10235 - 10258
  • [48] OLERA: OnLine extraction rule analysis for semi-structured documents
    Chang, CH
    Kuo, SC
    PROCEEDINGS OF THE IASTED INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND APPLICATIONS, VOLS 1 AND 2, 2004, : 736 - 742
  • [49] Joint Distributed Representation of Text and Structure of Semi-Structured Documents
    Laddha, Abhishek
    Joshi, Salil
    Shaikh, Samiulla
    Mehta, Sameep
    HT'18: PROCEEDINGS OF THE 29TH ACM CONFERENCE ON HYPERTEXT AND SOCIAL MEDIA, 2018, : 25 - 32
  • [50] Clustering method via independent components for semi-structured documents
    Wang, Tong
    Liu, Da-Xin
    Lin, Xuanzuo
    Sun, Wei
    DATA MINING, INTRUSION DETECTION, INFORMATION ASSURANCE, AND DATA NETWORKS SECURITY 2006, 2006, 6241