Document Relevance Filtering by Natural Language Processing and Machine Learning: A Multidisciplinary Case Study of Patents

Cited by: 0
Author
Bridgelall, Raj [1]
Affiliation
[1] North Dakota State Univ, Coll Business, Dept Transportat & Supply Chain, POB 6050, Fargo, ND 58108 USA
Source
APPLIED SCIENCES-BASEL | 2025 / Vol. 15 / No. 5
Keywords
document search; supervised machine learning; unsupervised machine learning; natural language processing; latent Dirichlet allocation; non-negative matrix factorization; manifold learning; t-distributed stochastic neighbor embedding; term co-occurrence networks; RETRIEVAL;
DOI
10.3390/app15052357
Chinese Library Classification (CLC) number
O6 [Chemistry];
Subject classification code
0703;
Abstract
The exponential growth of patent datasets poses a significant challenge for filtering the documents relevant to research and innovation. Traditional keyword-based search methods often fail to capture the complexity and variability of multidisciplinary terminology, which makes filtering inefficient. This study addresses the problem by systematically evaluating supervised and unsupervised machine learning (ML) techniques for document relevance filtering across five technology domains: solid-state batteries, electric vehicle chargers, connected vehicles, electric vertical takeoff and landing aircraft, and light detection and ranging (LiDAR) sensors. The contributions include benchmarking the performance of 10 classical models, including extreme gradient boosting, random forest, and support vector machines; a deep artificial neural network; and three natural language processing methods: latent Dirichlet allocation, non-negative matrix factorization, and k-means clustering of features whose dimensionality was reduced by manifold learning. Applying these methods to more than 4200 patents filtered from a database of 9.6 million patents revealed that most supervised ML models outperformed the unsupervised methods. On average, seven supervised ML models achieved significantly higher precision, recall, and F1-scores across all technology domains, whereas the performance of the unsupervised methods varied with domain characteristics. These results offer a practical framework for optimizing document relevance filtering, enabling researchers and practitioners to manage large datasets efficiently and to enhance innovation.
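The evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes relevance filtering is treated as a binary task (relevant or not relevant to one technology domain); the toy patent snippets, the `keyword_filter` baseline, and the helper names are hypothetical and are not data or code from the study. It shows how a keyword-only filter can miss relevant documents that use different terminology (lowering recall) and admit irrelevant ones that merely mention the keyword (lowering precision), and how per-model scores could be averaged across several models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scores:
    precision: float
    recall: float
    f1: float


def relevance_scores(y_true: list[bool], y_pred: list[bool]) -> Scores:
    """Precision, recall, and F1 for the 'relevant' (positive) class."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have equal length")
    tp = sum(t and p for t, p in zip(y_true, y_pred))        # relevant, kept
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))  # irrelevant, kept
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))  # relevant, dropped
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return Scores(precision, recall, f1)


def average_scores(per_model: list[Scores]) -> Scores:
    """Unweighted mean of each metric across models (hypothetical helper)."""
    if not per_model:
        raise ValueError("need at least one model's scores")
    n = len(per_model)
    return Scores(
        sum(s.precision for s in per_model) / n,
        sum(s.recall for s in per_model) / n,
        sum(s.f1 for s in per_model) / n,
    )


def keyword_filter(docs: list[str], keywords: set[str]) -> list[bool]:
    """Hypothetical baseline: keep a document if it contains any keyword.

    Keywords are assumed to be lowercase; matching is a plain substring test.
    """
    return [any(k in doc.lower() for k in keywords) for doc in docs]


# Toy, hypothetical snippets for a LiDAR domain (not data from the study).
docs = [
    "Rotating-mirror lidar sensor for vehicle perception",  # relevant
    "Laser ranging unit that outputs 3D point clouds",      # relevant, no keyword
    "Solid electrolyte for an all-solid-state battery",     # not relevant
    "Charging station; cites lidar only as prior art",      # not relevant
]
labels = [True, True, False, False]
preds = keyword_filter(docs, {"lidar"})
print(relevance_scores(labels, preds))  # Scores(precision=0.5, recall=0.5, f1=0.5)
```

In practice, a library routine such as scikit-learn's `precision_recall_fscore_support` would typically replace the hand-written metric function; the standard-library version is used here only to keep the sketch self-contained.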
Pages: 25
Related papers (50 total)
  • [41] Detecting hate crimes through machine learning and natural language processing
    Salazar, Ana Ortiz
    POLICE PRACTICE AND RESEARCH, 2024
  • [43] Distributed peer review enhanced with natural language processing and machine learning
    Kerzendorf, Wolfgang E.
    Patat, Ferdinando
    Bordelon, Dominic
    van de Ven, Glenn
    Pritchard, Tyler A.
    NATURE ASTRONOMY, 2020, 4 (07) : 711 - 717
  • [44] Arabic Natural Language Processing and Machine Learning-Based Systems
    Marie-Sainte, Souad Larabi
    Alalyani, Nada
    Alotaibi, Sihaam
    Ghouzali, Sanaa
    Abunadi, Ibrahim
    IEEE ACCESS, 2019, 7 : 7011 - 7020
  • [45] SmishGuard: Leveraging Machine Learning and Natural Language Processing for Smishing Detection
    Samad, Saleem Raja Abdul
    Ganesan, Pradeepa
    Rajasekaran, Justin
    Radhakrishnan, Madhubala
    Ammaippan, Hariraman
    Ramamurthy, Vinodhini
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2023, 14 (11) : 586 - 593
  • [46] Detecting Phishing Attacks Using Natural Language Processing And Machine Learning
    Banu, Reshma
    Anand, M.
    Kamath, Akshatha C.
    Ashika, S.
    Ujwala, H. S.
    Harshitha, S. N.
    PROCEEDINGS OF THE 2019 INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTING AND CONTROL SYSTEMS (ICCS), 2019 : 1210 - 1214
  • [47] SmartFund: Predicting Research Outcomes with Machine Learning and Natural Language Processing
    Alaphat, Alvin
    Jiang, Meng
    2020 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2020 : 2857 - 2865
  • [48] Subjective Answers Evaluation Using Machine Learning and Natural Language Processing
    Bashir, Muhammad Farrukh
    Arshad, Hamza
    Javed, Abdul Rehman
    Kryvinska, Natalia
    Band, Shahab S.
    IEEE ACCESS, 2021, 9 : 158972 - 158983
  • [49] Applying machine learning and natural language processing to detect phishing email
    Alhogail, Areej
    Alsabih, Afrah
    COMPUTERS & SECURITY, 2021, 110
  • [50] Machine Learning Techniques for Biomedical Natural Language Processing: A Comprehensive Review
    Houssein, Essam H.
    Mohamed, Rehab E.
    Ali, Abdelmgeid A.
    IEEE ACCESS, 2021, 9 : 140628 - 140653