A hybrid machine learning approach for information extraction from free text

Cited by: 1
Authors
Neumann, G [1]
Affiliations
[1] DFKI Saarbrucken, LT Lab, D-66123 Saarbrucken, Germany
DOI
10.1007/3-540-31314-1_47
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present a hybrid machine learning approach for information extraction from unstructured documents that integrates a learned classifier based on Maximum Entropy Modeling (MEM) with a classifier based on our work on Data-Oriented Parsing (DOP). The hybrid behavior is achieved through a voting mechanism applied by an iterative tag-insertion algorithm. We tested the method on a corpus of German newspaper articles about company turnover and achieved an F-measure of 85.2% with the hybrid approach, compared to 79.3% for MEM and 51.9% for DOP when each is run in isolation.
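The abstract does not specify how the voting or the tag insertion works, so the following is only a minimal sketch of one way two classifiers could be combined by voting inside an iterative tag-insertion loop. Everything in it is an assumption made for illustration: the `Proposal` format, the summed-confidence vote, the `threshold` parameter, and the two toy taggers, which are stand-ins and not the MEM or DOP models from the paper.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A proposal is (token_index, tag, confidence). This format is hypothetical.
Proposal = Tuple[int, str, float]
Classifier = Callable[[List[str], Dict[int, str]], List[Proposal]]


def vote(proposals: List[Proposal]) -> Optional[Proposal]:
    """Sum confidences per (index, tag) and return the strongest candidate."""
    scores: Dict[Tuple[int, str], float] = {}
    for index, tag, confidence in proposals:
        scores[(index, tag)] = scores.get((index, tag), 0.0) + confidence
    if not scores:
        return None
    (index, tag), score = max(scores.items(), key=lambda item: item[1])
    return index, tag, score


def insert_tags(tokens: List[str], classifiers: List[Classifier],
                threshold: float = 0.5) -> Dict[int, str]:
    """Insert one tag per round until no candidate reaches the threshold."""
    tags: Dict[int, str] = {}
    while True:
        # Each classifier sees the tags inserted so far; tagged tokens are skipped.
        proposals = [p for clf in classifiers
                     for p in clf(tokens, tags) if p[0] not in tags]
        best = vote(proposals)
        if best is None or best[2] < threshold:
            return tags
        tags[best[0]] = best[1]


def toy_number_tagger(tokens: List[str], tags: Dict[int, str]) -> List[Proposal]:
    """Stand-in classifier: proposes AMOUNT for numeric tokens."""
    return [(i, "AMOUNT", 0.6) for i, t in enumerate(tokens)
            if t.replace(".", "").isdigit()]


def toy_context_tagger(tokens: List[str], tags: Dict[int, str]) -> List[Proposal]:
    """Stand-in classifier: proposes tags from neighboring words."""
    proposals: List[Proposal] = []
    for i in range(len(tokens)):
        if i + 1 < len(tokens) and tokens[i + 1] == "reported":
            proposals.append((i, "ORG", 0.7))
        if i > 0 and tokens[i - 1] == "of":
            proposals.append((i, "AMOUNT", 0.3))
    return proposals


tokens = ["Acme", "reported", "turnover", "of", "12.5", "million"]
result = insert_tags(tokens, [toy_number_tagger, toy_context_tagger])
print(result)
```

In this sketch the two taggers agree on the token `12.5`, so their summed confidence wins the first round, and the `ORG` tag proposed by a single tagger is inserted in the second round. The loop stops once no untagged token has a candidate at or above `threshold`.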
Pages: 390-397
Page count: 8