Document Relevance Filtering by Natural Language Processing and Machine Learning: A Multidisciplinary Case Study of Patents

Cited by: 0
Authors
Bridgelall, Raj [1]
Affiliation
[1] North Dakota State Univ, Coll Business, Dept Transportat & Supply Chain, POB 6050, Fargo, ND 58108 USA
Source
APPLIED SCIENCES-BASEL | 2025 / Vol. 15 / Issue 05
Keywords
document search; supervised machine learning; unsupervised machine learning; natural language processing; latent Dirichlet allocation; non-negative matrix factorization; manifold learning; t-distributed stochastic neighbor embedding; term co-occurrence networks; RETRIEVAL
DOI
10.3390/app15052357
Chinese Library Classification
O6 [Chemistry];
Discipline classification code
0703;
Abstract
The exponential growth of patent datasets poses a significant challenge in filtering relevant documents for research and innovation. Traditional semantic search methods based on keywords often fail to capture the complexity and variability in multidisciplinary terminology, leading to inefficiencies. This study addresses the problem by systematically evaluating supervised and unsupervised machine learning (ML) techniques for document relevance filtering across five technology domains: solid-state batteries, electric vehicle chargers, connected vehicles, electric vertical takeoff and landing aircraft, and light detection and ranging (LiDAR) sensors. The contributions include benchmarking the performance of 10 classical models, such as extreme gradient boosting, random forest, and support vector machines; a deep artificial neural network; and three natural language processing methods: latent Dirichlet allocation, non-negative matrix factorization, and k-means clustering of a manifold-learned reduced feature dimension. Applying these methods to more than 4200 patents filtered from a database of 9.6 million patents revealed that most supervised ML models outperform the unsupervised methods. An average of seven supervised ML models achieved significantly higher precision, recall, and F1-scores across all technology domains, while unsupervised methods showed variability depending on domain characteristics. These results offer a practical framework for optimizing document relevance filtering, enabling researchers and practitioners to efficiently manage large datasets and enhance innovation.
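The abstract compares relevance filters by precision, recall, and F1-score. As a minimal sketch of how these three metrics are computed for a binary relevant/irrelevant labeling, the following assumes a hypothetical helper `relevance_metrics` and made-up labels for eight patents; neither comes from the study.

```python
from typing import Sequence, Tuple


def relevance_metrics(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for binary labels where 1 = relevant."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Guard against division by zero when a filter flags or finds nothing.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical labels (illustrative only, not data from the study):
# y_true marks which patents are actually relevant, y_pred what a filter flagged.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
precision, recall, f1 = relevance_metrics(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

With these labels the filter flags four patents, three of them correctly, and misses one relevant patent, so all three metrics come out at 0.75. Precision penalizes irrelevant patents that pass the filter, recall penalizes relevant patents that are missed, and F1 is their harmonic mean.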
Pages: 25