Active Learning with Adaptive Density Weighted Sampling for Information Extraction from Scientific Papers

Cited by: 3
Authors
Suvorov, Roman [1]
Shelmanov, Artem [1]
Smirnov, Ivan [1]
Affiliations
[1] Russian Acad Sci, Fed Res Ctr Comp Sci & Control, Moscow, Russia
Funding
Russian Foundation for Basic Research
Keywords
Information extraction; Deep linguistic analysis; Active machine learning; Scientific texts analysis;
DOI
10.1007/978-3-319-71746-3_7
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The paper addresses the task of information extraction from scientific literature with machine learning methods. In particular, it considers the extraction of definitions and results from scientific publications in Russian. We note that annotating scientific texts to create a training dataset is a very labor-intensive and expensive process. To tackle this problem, we propose methods and tools based on active learning. We describe and evaluate a novel adaptive density-weighted sampling (ADWeS) meta-strategy for active learning. The experiments demonstrate that active learning can be a very efficient technique for scientific text mining, and that the proposed meta-strategy can be beneficial for annotating corpora with a strongly skewed class distribution. We also investigate informative task-independent features for information extraction from scientific texts and present an openly available tool for corpus annotation, which is equipped with ADWeS and is compatible with well-known sampling strategies.
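The abstract does not spell out how ADWeS works, so the following is only a minimal sketch of plain (non-adaptive) density-weighted uncertainty sampling, the general family the name points to. It is not the authors' method. The function names, the cosine similarity, the entropy-based uncertainty, the `beta` exponent, and the toy data are all assumptions made for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy of a class-probability vector (used as uncertainty)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def density_weighted_scores(features, probs, beta=1.0):
    """Score = uncertainty * (mean similarity to the rest of the pool) ** beta.

    beta is a hypothetical knob: beta=0 reduces to plain uncertainty sampling.
    """
    n = len(features)
    scores = []
    for i in range(n):
        sims = [cosine(features[i], features[j]) for j in range(n) if j != i]
        density = sum(sims) / len(sims) if sims else 1.0
        scores.append(entropy(probs[i]) * max(density, 0.0) ** beta)
    return scores


def select_batch(features, probs, k=1, beta=1.0):
    """Return indices of the k highest-scoring unlabeled examples."""
    scores = density_weighted_scores(features, probs, beta)
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


# Toy pool (made-up data): items 0 and 1 lie close together, item 2 is an outlier.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
# Hypothetical classifier outputs: the outlier is the most uncertain example.
probs = [[0.6, 0.4], [0.6, 0.4], [0.5, 0.5]]

picked_density = select_batch(features, probs, k=1)
picked_uncertainty = select_batch(features, probs, k=1, beta=0.0)
```

On this toy pool, pure uncertainty sampling (`beta=0.0`) selects the isolated item 2, while the density-weighted score prefers item 1, which is nearly as uncertain but sits in a denser region. How ADWeS adapts this weighting during annotation is described in the paper itself and is not reproduced here.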
Pages: 77-90
Number of pages: 14