Machine Learning for Intelligent Processing of Printed Documents

Cited by: 0
Authors
Floriana Esposito
Donato Malerba
Francesca A. Lisi
Affiliations
[1–3] Università degli Studi di Bari, Dipartimento di Informatica
Keywords
learning and knowledge discovery; intelligent information systems; intelligent document processing; decision-tree learning; first-order rule induction
DOI
Not available
Abstract
A paper document processing system is an information system component which transforms information on printed or handwritten documents into a computer-revisable form. In intelligent systems for paper document processing this information capture process is based on knowledge of the specific layout and logical structures of the documents. This article proposes the application of machine learning techniques to acquire the specific knowledge required by an intelligent document processing system, named WISDOM++, that manages printed documents, such as letters and journals. Knowledge is represented by means of decision trees and first-order rules automatically generated from a set of training documents. In particular, an incremental decision tree learning system is applied for the acquisition of decision trees used for the classification of segmented blocks, while a first-order learning system is applied for the induction of rules used for the layout-based classification and understanding of documents. Issues concerning the incremental induction of decision trees and the handling of both numeric and symbolic data in first-order rule learning are discussed, and the validity of the proposed solutions is empirically evaluated by processing a set of real printed documents.
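The abstract states that segmented blocks are classified with decision trees induced incrementally from training documents. The Python below is a minimal sketch of that idea. It assumes hypothetical block features (height, width, black-pixel density) and hypothetical labels ("text", "picture", "hline"), and it grows an entropy-based tree with numeric threshold tests. For incrementality it uses a naive rebuild-on-error strategy, which is not the incremental induction method discussed in the paper. All names (`build`, `classify`, `RebuildOnErrorTree`) are illustrative.

```python
import math
from collections import Counter

# Hypothetical layout features for one segmented block: (height, width, black-pixel density).
# These features and labels are illustrative assumptions, not the ones used in WISDOM++.


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_split(examples):
    """Return (gain, feature_index, threshold) for the best binary numeric split, or None."""
    labels = [y for _, y in examples]
    base = entropy(labels)
    best = None
    for f in range(len(examples[0][0])):
        values = sorted({x[f] for x, _ in examples})
        # Candidate thresholds lie midway between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for x, y in examples if x[f] <= t]
            right = [y for x, y in examples if x[f] > t]
            rest = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
            gain = base - rest
            if best is None or gain > best[0]:
                best = (gain, f, t)
    return best


def build(examples):
    """Grow a tree: a leaf is a label string, an inner node is (feature, threshold, left, right)."""
    labels = [y for _, y in examples]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1:
        return majority
    split = best_split(examples)
    if split is None or split[0] <= 0:
        return majority
    _, f, t = split
    left = [e for e in examples if e[0][f] <= t]
    right = [e for e in examples if e[0][f] > t]
    return (f, t, build(left), build(right))


def classify(tree, x):
    """Follow threshold tests from the root until a leaf label is reached."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree


class RebuildOnErrorTree:
    """Naive incremental learner: keep every example, regrow the tree only on a misclassification."""

    def __init__(self):
        self.examples = []
        self.tree = None
        self.rebuilds = 0

    def update(self, x, label):
        self.examples.append((tuple(x), label))
        if self.tree is None or classify(self.tree, x) != label:
            self.tree = build(self.examples)
            self.rebuilds += 1

    def predict(self, x):
        return classify(self.tree, x)


# Usage: stream hypothetical training blocks one at a time.
stream = [
    ((12, 400, 0.30), "text"),
    ((200, 300, 0.60), "picture"),
    ((2, 500, 0.95), "hline"),
    ((14, 380, 0.35), "text"),
    ((180, 250, 0.70), "picture"),
    ((3, 450, 0.90), "hline"),
    ((11, 420, 0.28), "text"),
]
model = RebuildOnErrorTree()
for features, label in stream:
    model.update(features, label)

print(model.predict((13, 390, 0.32)))  # text
print(model.rebuilds)  # 3
```

A true incremental inducer revises the existing tree instead of regrowing it from all stored examples. The rebuild-on-error wrapper is used here only because it is short and easy to check by hand.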
Pages: 175–198
Page count: 23
Related papers (50 in total)
  • [1] Machine learning for intelligent processing of printed documents
    Esposito, F
    Malerba, D
    Lisi, FA
    [J]. JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2000, 14 (2-3) : 175 - 198
  • [2] Machine learning in intelligent image processing
    Tao, Dacheng
    Wang, Dianhui
    Murtagh, Fionn
    [J]. SIGNAL PROCESSING, 2013, 93 (06) : 1399 - 1400
  • [3] Intelligent document processing based on RPA and machine learning
    Ling, Xufeng
    Gao, Ming
    Wang, Dong
    [J]. 2020 CHINESE AUTOMATION CONGRESS (CAC 2020), 2020, : 1349 - 1353
  • [4] Supervised machine learning in intelligent character recognition of handwritten and printed nameplate
    Kajale, Renuka
    Das, Soubhik
    Medhekar, Paritosh
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATION AND CONTROL (ICAC3), 2017,
  • [5] Autonomous driving through intelligent image processing and machine learning
    Krödel, M
    Kuhnert, KD
    [J]. COMPUTATIONAL INTELLIGENCE: THEORY AND APPLICATIONS, PROCEEDINGS, 2001, 2206 : 712 - 718
  • [6] Dewarping Machine Printed Documents of Gurmukhi Script
    Sharma, Dharam Veer
    Wadhwa, Shilpi
    [J]. INFORMATION SYSTEMS FOR INDIAN LANGUAGES, 2011, 139 : 117 - 123
  • [7] Machine-Learning-Assisted Intelligent Processing and Optimization of Complex Systems
    Luo, Xiong
    Yuan, Manman
    [J]. PROCESSES, 2023, 11 (09)
  • [8] Review on Intelligent Processing Technologies of Legal Documents
    Zhao, Guolong
    Liu, Yuling
    Erdun, E.
    [J]. ARTIFICIAL INTELLIGENCE AND SECURITY, ICAIS 2022, PT I, 2022, 13338 : 684 - 695
  • [9] An intelligent learning machine
    Sayad, S
    Balke, ST
    Sayad, S
    [J]. DATA MINING IV, 2004, 7 : 639 - 649
  • [10] A syntactic approach for processing mathematical expressions in printed documents
    Garain, U
    Chaudhuri, BB
    [J]. 15TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 4, PROCEEDINGS: APPLICATIONS, ROBOTICS SYSTEMS AND ARCHITECTURES, 2000, : 523 - 526