Recognizing software names in biomedical literature using machine learning

Cited by: 7
Authors
Wei, Qiang [1]
Zhang, Yaoyun [1]
Amith, Muhammad [1]
Lin, Rebecca [2]
Lapeyrolerie, Jenay [3]
Tao, Cui [1]
Xu, Hua [1]
Affiliations
[1] Univ Texas Hlth Sci Ctr Houston, Houston, TX 77030 USA
[2] Johns Hopkins Univ, Baltimore, MD 21218 USA
[3] Baylor Univ, Waco, TX 76798 USA
Keywords
biomedical literature; biomedical software; biomedical software index; named entity recognition; natural language processing; BIOINFORMATICS; SERVICES;
DOI
10.1177/1460458219869490
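Chinese Library Classification number
R19 [Health care organizations and services (health services administration)];
Discipline classification code
Abstract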
Software tools are now essential to research and applications in the biomedical domain. However, existing software repositories are built mainly by manual curation, which is time-consuming and does not scale. This study manually annotated software names in 1,120 MEDLINE abstracts and titles and used this corpus to develop and evaluate machine learning-based named entity recognition systems for biomedical software. Specifically, two strategies were proposed for feature engineering: (1) domain knowledge features and (2) unsupervised word representation features of clustered and binarized word embeddings. Our best system achieved an F-measure of 91.79% for recognizing software from titles and an F-measure of 86.35% for recognizing software from both titles and abstracts using inexact matching criteria. We then created a biomedical software catalog with 19,557 entries using the developed system. This study demonstrates the feasibility of using natural language processing methods to automatically build a high-quality software index from biomedical literature.
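The abstract reports F-measures under inexact matching criteria but does not define them. As a minimal sketch, assuming that "inexact" means a predicted span counts as correct when it overlaps a gold span by at least one character, the score could be computed as follows. The function names, the span representation, and the sample offsets are all hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# Assumption: a span is a (start, end) pair of character offsets, end exclusive.
Span = Tuple[int, int]


def overlaps(a: Span, b: Span) -> bool:
    """Return True if the two spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def inexact_f_measure(gold: List[Span], predicted: List[Span]) -> Tuple[float, float, float]:
    """Compute precision, recall, and F-measure under overlap matching.

    Assumed definition (not confirmed by the source): a predicted span is
    correct if it overlaps any gold span, and a gold span is found if any
    predicted span overlaps it.
    """
    matched_pred = sum(1 for p in predicted if any(overlaps(p, g) for g in gold))
    matched_gold = sum(1 for g in gold if any(overlaps(g, p) for p in predicted))
    precision = matched_pred / len(predicted) if predicted else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Made-up offsets for illustration only.
gold_spans = [(0, 6), (20, 28), (40, 45)]
predicted_spans = [(0, 6), (22, 30), (50, 55)]
precision, recall, f1 = inexact_f_measure(gold_spans, predicted_spans)
print(f"P={precision:.3f} R={recall:.3f} F={f1:.3f}")
```

Under exact matching the second prediction, (22, 30), would count as an error because its boundaries differ from the gold span (20, 28); under the overlap criterion assumed here it counts as a hit, which is why inexact scores are typically higher than exact ones.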
Pages: 21-33
Number of pages: 13
Related papers
50 in total
  • [21] Systematic literature review: machine learning for software fault prediction
    Navarro Cedeno, Gabriel Omar
    Cortes Moya, Katherine
    Somarribas Dormond, Ahmed
    Gonzalez-Torres, Antonio
    Rojas-Hernandez, Yenory
    2023 IEEE 41ST CENTRAL AMERICA AND PANAMA CONVENTION, CONCAPAN XLI, 2023, : 134 - 139
  • [22] Software fault prediction using data mining, machine learning and deep learning techniques: A systematic literature review
    Batool, Iqra
    Khan, Tamim Ahmed
    COMPUTERS & ELECTRICAL ENGINEERING, 2022, 100
  • [23] Software quality prediction using machine learning
    Alaswad, Feisal
    Poovammal, E.
    MATERIALS TODAY-PROCEEDINGS, 2022, 62 : 4714 - 4720
  • [24] Software Quality Prediction Using Machine Learning
    Desai, Bhoushika
    Sungkur, Roopesh Kevin
    INTERNATIONAL JOURNAL OF SOFTWARE INNOVATION, 2022, 10 (01)
  • [25] IMPROVING SOFTWARE QUALITY USING MACHINE LEARNING
    Chandra, Kanika
    Kapoor, Gagan
    Kohli, Rashi
    Gupta, Archana
    2016 1ST INTERNATIONAL CONFERENCE ON INNOVATION AND CHALLENGES IN CYBER SECURITY (ICICCS 2016), 2016, : 115 - 118
  • [26] Software Modernization Using Machine Learning Techniques
    Somogyi, Norbert
    Kovesdan, Gabor
    2021 IEEE 19TH WORLD SYMPOSIUM ON APPLIED MACHINE INTELLIGENCE AND INFORMATICS (SAMI 2021), 2021, : 361 - 365
  • [27] On Software Defect Prediction Using Machine Learning
    Ren, Jinsheng
    Qin, Ke
    Ma, Ying
    Luo, Guangchun
    JOURNAL OF APPLIED MATHEMATICS, 2014,
  • [28] Software Quality Prediction Using Machine Learning
    Desai, Bhoushika
    Sungkur, Roopesh Kevin
    6TH INTERNATIONAL CONFERENCE ON SMART CITY APPLICATIONS, 2022, 393 : 401 - 411
  • [29] A survey of mutations in biomedical literature using a machine based approach
    Koyama, Takahiko
    Rhrissorrakrai, Kahn
    Parida, Laxmi
    CANCER RESEARCH, 2017, 77
  • [30] A Systematic Literature Review on Using Machine Learning Algorithms for Software Requirements Identification on Stack Overflow
    Ahmad, Arshad
    Feng, Chong
    Khan, Muzammil
    Khan, Asif
    Ullah, Ayaz
    Nazir, Shah
    Tahir, Adnan
    SECURITY AND COMMUNICATION NETWORKS, 2020, 2020