Document Relevance Filtering by Natural Language Processing and Machine Learning: A Multidisciplinary Case Study of Patents

Cited by: 0

Authors:

Bridgelall, Raj [1]

Affiliations:

[1] North Dakota State Univ, Coll Business, Dept Transportat & Supply Chain, POB 6050, Fargo, ND 58108 USA

Source:

APPLIED SCIENCES-BASEL | 2025 / Vol. 15 / Issue 05

Keywords:

document search; supervised machine learning; unsupervised machine learning; natural language processing; latent Dirichlet allocation; non-negative matrix factorization; manifold learning; t-distributed stochastic neighbor embedding; term co-occurrence networks; RETRIEVAL;

DOI:

10.3390/app15052357

Chinese Library Classification (CLC) number:

O6 [Chemistry];

Discipline classification code:

0703;

Abstract:

The exponential growth of patent datasets poses a significant challenge in filtering relevant documents for research and innovation. Traditional semantic search methods based on keywords often fail to capture the complexity and variability in multidisciplinary terminology, leading to inefficiencies. This study addresses the problem by systematically evaluating supervised and unsupervised machine learning (ML) techniques for document relevance filtering across five technology domains: solid-state batteries, electric vehicle chargers, connected vehicles, electric vertical takeoff and landing aircraft, and light detection and ranging (LiDAR) sensors. The contributions include benchmarking the performance of 10 classical models, including extreme gradient boosting, random forest, and support vector machines; a deep artificial neural network; and three natural language processing methods: latent Dirichlet allocation, non-negative matrix factorization, and k-means clustering of a manifold-learned reduced feature dimension. Applying these methods to more than 4200 patents filtered from a database of 9.6 million patents revealed that most supervised ML models outperformed the unsupervised methods. An average of seven supervised ML models achieved significantly higher precision, recall, and F1-scores across all technology domains, while the unsupervised methods showed variability depending on domain characteristics. These results offer a practical framework for optimizing document relevance filtering, enabling researchers and practitioners to efficiently manage large datasets and enhance innovation.
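The abstract compares models by precision, recall, and F1-score. As a reminder of how those three metrics relate for a binary relevant/irrelevant filter, here is a minimal sketch in plain Python. The function name `precision_recall_f1` and the eight example labels are hypothetical and are not taken from the paper or its data.

```python
def precision_recall_f1(y_true, y_pred):
    """Return (precision, recall, F1) for binary relevance labels (1 = relevant)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # relevant, kept
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # irrelevant, kept
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # relevant, dropped
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical ground truth and filter output for eight documents.
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 1, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```

Here three relevant documents are kept, one irrelevant document is kept, and one relevant document is dropped, so all three metrics come out to 0.75.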


Pages: 25
