Incremental information extraction using tree-based context representations

Cited by: 0
Authors
Siefkes, C [1]
Affiliations
[1] Free Univ Berlin, Berlin Brandenburg Grad Sch Distributed Informat, Database & Informat Syst Grp, D-14195 Berlin, Germany
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The purpose of information extraction (IE) is to find desired pieces of information in natural language texts and store them in a form that is suitable for automatic processing. Providing annotated training data to adapt a trainable IE system to a new domain requires a considerable amount of work. To address this, we explore incremental learning. Here training documents are annotated sequentially by a user and immediately incorporated into the extraction model. Thus the system can support the user by proposing extractions based on the current extraction model, reducing the workload of the user over time. We introduce an approach to modeling IE as a token classification task that allows incremental training. To provide sufficient information to the token classifiers, we use rich, tree-based context representations of each token as feature vectors. These representations make use of the heuristically deduced document structure in addition to linguistic and semantic information. We consider the resulting feature vectors as ordered and combine proximate features into more expressive joint features, called "Orthogonal Sparse Bigrams" (OSB). Our results indicate that this setup makes it possible to employ IE in an incremental fashion without a serious performance penalty.
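The abstract's idea of combining proximate features of an ordered feature vector into joint features can be illustrated with a minimal sketch. It follows the general Orthogonal Sparse Bigrams scheme (each feature is paired with each of the few features preceding it, with the gap marked explicitly) and is not taken from the paper itself. The function name `osb_features`, the window size of 5, and the `<skip>` marker are assumptions made for illustration.

```python
from typing import List, Sequence


def osb_features(
    features: Sequence[str], window: int = 5, skip: str = "<skip>"
) -> List[str]:
    """Build OSB-style joint features from an ordered feature sequence.

    Each feature is paired with each of the up to `window - 1` features
    that precede it. Skipped positions are filled with a marker so that
    pairs at different distances stay distinct.
    Illustrative sketch only; names and defaults are assumptions.
    """
    joint: List[str] = []
    for i, current in enumerate(features):
        for distance in range(1, window):
            j = i - distance
            if j < 0:
                break  # no more preceding features to pair with
            gap = [skip] * (distance - 1)
            joint.append(" ".join([features[j], *gap, current]))
    return joint


if __name__ == "__main__":
    print(osb_features(["Dr.", "Smith", "will", "speak"], window=3))
    # ['Dr. Smith', 'Smith will', 'Dr. <skip> will',
    #  'will speak', 'Smith <skip> speak']
```

Because every pair shares the current feature and differs only in distance, the joint features add local ordering information without generating all possible combinations within the window.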
Pages: 510 - 521
Number of pages: 12
Related papers
50 in total
  • [31] Model extraction using context information
    Duarte, Lucio Mauro
    Kramer, Jeff
    Uchitel, Sebastian
    MODEL DRIVEN ENGINEERING LANGUAGES AND SYSTEMS, PROCEEDINGS, 2006, 4199 : 380 - 394
  • [32] A tree-based Mergesort
    Moffat, A
    Petersson, O
    Wormald, NC
    ACTA INFORMATICA, 1998, 35 (09) : 775 - 793
  • [33] The Predictability of Tree-based Machine Learning Algorithms in the Big Data Context
    Qolipour, F.
    Ghasemzadeh, M.
    Mohammad-Karimi, N.
    INTERNATIONAL JOURNAL OF ENGINEERING, 2021, 34 (01) : 82 - 89
  • [34] A Tree-based Mergesort
    Alistair Moffat
    Ola Petersson
    Nicholas C. Wormald
    Acta Informatica, 1998, 35 : 775 - 793
  • [35] STEM: a suffix tree-based method for web data records extraction
    Yixiang Fang
    Xiaoqin Xie
    Xiaofeng Zhang
    Reynold Cheng
    Zhiqiang Zhang
    Knowledge and Information Systems, 2018, 55 : 305 - 331
  • [36] STEM: a suffix tree-based method for web data records extraction
    Fang, Yixiang
    Xie, Xiaoqin
    Zhang, Xiaofeng
    Cheng, Reynold
    Zhang, Zhiqiang
    KNOWLEDGE AND INFORMATION SYSTEMS, 2018, 55 (02) : 305 - 331
  • [37] Tree-Based Algorithms and Incremental Feature Optimization for Fault Detection and Diagnosis in Photovoltaic Systems
    Chahine, Khaled
    ENG, 2025, 6 (01):
  • [38] A decision tree-based classification approach to rule extraction for security analysis
    Ren, N
    Zargham, M
    Rahimi, S
    INTERNATIONAL JOURNAL OF INFORMATION TECHNOLOGY & DECISION MAKING, 2006, 5 (01) : 227 - 240
  • [39] Deforming Garment Classification With Shallow Temporal Extraction and Tree-Based Fusion
    Huang, Li
    Yang, Tong
    Jiang, Rongxin
    Tian, Xiang
    Zhou, Fan
    Chen, Yaowu
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2024, 9 (02) : 1114 - 1121
  • [40] Comparing performance of non–tree-based and tree-based association mapping methods
    Katherine L. Thompson
    David W. Fardo
    BMC Proceedings, 10 (Suppl 7)