Fine-Tuned BERT Algorithm-Based Automatic Query Expansion for Enhancing Document Retrieval System

Cited by: 0
Authors
Vishwakarma, Deepak [1 ,3 ]
Kumar, Suresh [2 ]
Affiliations
[1] Guru Gobind Singh Indraprastha Univ, Univ Sch Informat Commun & Technol, Delhi 110078, India
[2] Netaji Subhas Univ Technol NSUT, Dept Comp Sci & Engn, East Campus, New Delhi 110031, India
[3] KIET Grp Inst, Dept Informat Technol, Ghaziabad 201206, India
Keywords
Fine-tuned BERT; Automatic query expansion; Embedding augmentation (EA); Co-occurrence statistical information; Frilled lizard optimization; Tokenization; Normalization; Splitting
DOI
10.1007/s12559-024-10354-5
CLC Classification
TP18 [Theory of Artificial Intelligence]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Online retrieval systems are mostly web-based, which makes document collection more dynamic and fluid than in traditional information retrieval systems. With the web growing in size every day, finding meaningful information with a search query of only a few keywords has become increasingly difficult. One important factor in improving Internet search is query expansion (QE). Manual query expansion requires the user to add terms to the query, which is time-consuming but produces good results, whereas automatic query expansion (AQE) determines the best expansion terms with minimal time consumption. Therefore, to improve the document retrieval system, a fine-tuned BERT algorithm is developed for automatic query expansion. Initially, the input text is augmented using an embedding augmentation (EA) approach. The augmented text is pre-processed using tokenization, normalization, splitting, stemming, stop-word removal, and lemmatization. Technical keywords are then extracted from the pre-processed text using co-occurrence statistical information. After keyword extraction, a fine-tuned BERT model is utilized to expand the query and thereby improve the document retrieval system. The hyperparameters of the BERT model are tuned using frilled lizard optimization to enhance its performance. The proposed model provides 92% accuracy, 95% precision, and 95.6% recall. Thus, the fine-tuned BERT model minimizes query-document mismatch and thereby improves retrieval performance.
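The pipeline described in the abstract — pre-processing, co-occurrence-based keyword extraction, and appending expansion terms to the query — can be sketched minimally as below. This is an illustrative sketch only: the stop-word list, the sliding-window co-occurrence score, and the append-terms expansion step are assumptions for demonstration; the paper's fine-tuned BERT ranking of candidate terms and the frilled lizard optimization of its hyperparameters are not reproduced here.

```python
import re
from collections import Counter

# Assumed minimal stop-word list for illustration.
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "and", "or", "is", "for", "on"}

def preprocess(text):
    """Tokenize, normalize to lowercase, and remove stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def cooccurrence_keywords(docs, window=2, top_k=3):
    """Score each term by how many other terms co-occur with it
    inside a sliding window, and return the top-scoring terms."""
    scores = Counter()
    for doc in docs:
        tokens = preprocess(doc)
        for i, term in enumerate(tokens):
            # Each neighbour within the window counts as one co-occurrence.
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            scores[term] += hi - lo - 1
    return [t for t, _ in scores.most_common(top_k)]

def expand_query(query, docs, top_k=3):
    """Append co-occurrence keywords absent from the original query.
    In the paper, a fine-tuned BERT model selects the expansion terms;
    here raw co-occurrence scores stand in for that ranking step."""
    candidates = cooccurrence_keywords(docs, top_k=top_k + len(query.split()))
    extra = [t for t in candidates if t not in preprocess(query)][:top_k]
    return query + " " + " ".join(extra)
```

For example, expanding the query "query expansion" against a small document collection appends the corpus terms that co-occur most often with their neighbours, such as "retrieval".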
Pages: 16
Related Papers
50 records in total
  • [31] DeepSignature: fine-tuned transfer learning based signature verification system
    Naz, Saeeda
    Bibi, Kiran
    Ahmad, Riaz
    MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (26) : 38113 - 38122
  • [32] Information-retrieval algorithm based on query expansion and classification
    Yue, Wen
    Chen, Zhi-Ping
    Lin, Ya-Ping
    Xitong Fangzhen Xuebao / Journal of System Simulation, 2006, 18 (07): : 1926 - 1929
  • [33] Improving BERT-based Query-by-Document Retrieval with Multi-task Optimization
    Abolghasemi, Amin
    Verberne, Suzan
    Azzopardi, Leif
    ADVANCES IN INFORMATION RETRIEVAL, PT II, 2022, 13186 : 3 - 12
  • [34] Enhancing Literature Review Efficiency: A Case Study on Using Fine-Tuned BERT for Classifying Focused Ultrasound-Related Articles
    Panagides, Reanna K.
    Fu, Sean H.
    Jung, Skye H.
    Singh, Abhishek
    Muttikkal, Rose T. Eluvathingal
    Broad, R. Michael
    Meakem, Timothy D.
    Hamilton, Rick A.
    AI, 2024, 5 (03) : 1670 - 1683
  • [35] VULREM: Fine-Tuned BERT-Based Source-Code Potential Vulnerability Scanning System to Mitigate Attacks in Web Applications
    Gurfidan, Remzi
    APPLIED SCIENCES-BASEL, 2024, 14 (21):
  • [36] A new query expansion method for document retrieval based on the inference of fuzzy rules
    Chang, Yu-Chuan
    Chen, Shyi-Ming
    Liau, Churn-Jung
    JOURNAL OF THE CHINESE INSTITUTE OF ENGINEERS, 2007, 30 (03) : 511 - 515
  • [37] Convolutional Fine-Tuned Threshold Adaboost approach for effectual content-based image retrieval
    Cep, Robert
    Elangovan, Muniyandy
    Ramesh, Janjhyam Venkata Naga
    Chohan, Mandeep Kaur
    Verma, Amit
    SCIENTIFIC REPORTS, 2025, 15 (01):
  • [38] BASHEXPLAINER: Retrieval-Augmented Bash Code Comment Generation based on Fine-tuned CodeBERT
    Yu, Chi
    Yang, Guang
    Chen, Xiang
    Liu, Ke
    Zhou, Yanlin
    2022 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE MAINTENANCE AND EVOLUTION (ICSME 2022), 2022, : 82 - 93
  • [39] HIJLI_JU at SemEval-2024 Task 7: Enhancing Quantitative Question Answering Using Fine-tuned BERT Models
    Sengupta, Partha Sarathi
    Sarkar, Sandip
    Das, Dipankar
    PROCEEDINGS OF THE 18TH INTERNATIONAL WORKSHOP ON SEMANTIC EVALUATION, SEMEVAL-2024, 2024, : 279 - 284
  • [40] AECR: Automatic attack technique intelligence extraction based on fine-tuned large language model
    Chen, Minghao
    Zhu, Kaijie
    Lu, Bin
    Li, Ding
    Yuan, Qingjun
    Zhu, Yuefei
    COMPUTERS & SECURITY, 2025, 150