Document Relevance Filtering by Natural Language Processing and Machine Learning: A Multidisciplinary Case Study of Patents

Author
Bridgelall, Raj [1]
Affiliation
[1] North Dakota State Univ, Coll Business, Dept Transportat & Supply Chain, POB 6050, Fargo, ND 58108 USA
Source
APPLIED SCIENCES-BASEL | 2025 / Vol. 15 / Issue 05
Keywords
document search; supervised machine learning; unsupervised machine learning; natural language processing; latent Dirichlet allocation; non-negative matrix factorization; manifold learning; t-distributed stochastic neighbor embedding; term co-occurrence networks; RETRIEVAL
DOI
10.3390/app15052357
Chinese Library Classification
O6 [Chemistry]
Discipline classification code
0703
Abstract
The exponential growth of patent datasets poses a significant challenge in filtering relevant documents for research and innovation. Traditional keyword-based semantic search methods often fail to capture the complexity and variability of multidisciplinary terminology, leading to inefficiencies. This study addresses the problem by systematically evaluating supervised and unsupervised machine learning (ML) techniques for document relevance filtering across five technology domains: solid-state batteries, electric vehicle chargers, connected vehicles, electric vertical takeoff and landing aircraft, and light detection and ranging (LiDAR) sensors. The contributions include benchmarking the performance of 10 classical models, including extreme gradient boosting, random forest, and support vector machines; a deep artificial neural network; and three natural language processing methods: latent Dirichlet allocation, non-negative matrix factorization, and k-means clustering on a manifold-learned, reduced-dimension feature space. Applying these methods to more than 4200 patents filtered from a database of 9.6 million patents revealed that most supervised ML models outperformed the unsupervised methods. An average of seven supervised ML models achieved significantly higher precision, recall, and F1-scores across all technology domains, while the unsupervised methods showed variability depending on domain characteristics. These results offer a practical framework for optimizing document relevance filtering, enabling researchers and practitioners to manage large datasets efficiently and enhance innovation.
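The abstract compares models by precision, recall, and F1-score. As a minimal sketch of how those three metrics are computed for a binary relevance filter, the following uses only the Python standard library. The function name and the label vectors are hypothetical illustrations, not data or code from the study.

```python
def precision_recall_f1(y_true, y_pred):
    """Return (precision, recall, F1) for binary labels where 1 = relevant."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # relevant, kept
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # irrelevant, kept
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # relevant, dropped

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


# Hypothetical labels for eight patents (assumption: 1 = relevant to the domain).
y_true = [1, 1, 1, 0, 0, 1, 0, 0]  # manual relevance labels
y_pred = [1, 1, 0, 0, 1, 1, 0, 0]  # labels assigned by some filter

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)  # prints 0.75 0.75 0.75
```

Precision penalizes irrelevant patents that pass the filter, recall penalizes relevant patents that are dropped, and F1 is their harmonic mean, which is why the three are usually reported together when comparing filters.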
Pages: 25