Extracting Methodology Components from AI Research Papers: A Data-driven Factored Sequence Labeling Approach

Cited by: 1
Authors
Ghosh, Madhusudan [1]
Ganguly, Debasis [2]
Basuchowdhuri, Partha [1]
Naskar, Sudip Kumar [3]
Affiliations
[1] Indian Assoc Cultivat Sci, Kolkata, India
[2] Univ Glasgow, Glasgow, Scotland
[3] Jadavpur Univ, Kolkata, India
Keywords
Information Extraction; Factored Model; Clustering; Scientific Literature;
DOI
10.1145/3583780.3615258
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Extracting methodology component names from scientific articles is challenging because these entities occur in diverse contexts and exhibit different levels of granularity and containment relationships. We hypothesize that standard sequence labeling approaches may not adequately model the dependence of methodology name mentions on their contexts, because the vocabulary is large, fast-evolving, and domain-specific. As a solution, we propose a factored approach in which the mention-context dependencies are represented in a more fine-grained manner, allowing the model parameters to better adjust to the different characteristic patterns inherent in the data. In particular, we experiment with two variants of this factored approach: one that uses per-entity category information derived from an ontology, and another that uses the topology of the sentence embedding space to infer a category for each entity in the sentence. We demonstrate that both factored variants of SciBERT outperform their non-factored counterpart, a state-of-the-art model for scientific concept extraction.
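The second variant described in the abstract can be illustrated with a minimal sketch. It assumes k-means as the clustering step and uses toy 2-D vectors in place of SciBERT sentence embeddings; this record does not state which clustering method the authors use, and the names `kmeans` and `route_by_factor` are hypothetical helpers, not part of the paper's code.

```python
from math import dist

def kmeans(points, k, iters=20):
    """Plain k-means with deterministic init (first k points as centroids)."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        assign = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Recompute each centroid as the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    # Final assignment against the last centroid update.
    assign = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
    return centroids, assign

def route_by_factor(sentences, assign):
    """Group sentences by inferred category so each group can use its own tagger head."""
    groups = {}
    for sent, a in zip(sentences, assign):
        groups.setdefault(a, []).append(sent)
    return groups

# Toy stand-ins for sentence embeddings (assumption: real ones would come from SciBERT).
sentence_embeddings = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9)]
sentences = ["s0", "s1", "s2", "s3"]

centroids, assign = kmeans(sentence_embeddings, k=2)
groups = route_by_factor(sentences, assign)
```

In a factored tagger, each group in `groups` would then be handled by parameters specific to its inferred category, rather than by a single shared labeling head.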
Pages: 3897-3901
Number of pages: 5