Fine-Tuned BERT Algorithm-Based Automatic Query Expansion for Enhancing Document Retrieval System

Cited by: 0
Authors
Vishwakarma, Deepak [1 ,3 ]
Kumar, Suresh [2 ]
Affiliations
[1] Guru Gobind Singh Indraprastha Univ, Univ Sch Informat Commun & Technol, Delhi 110078, India
[2] Netaji Subhas Univ Technol NSUT, Dept Comp Sci & Engn, East Campus, New Delhi 110031, India
[3] KIET Grp Inst, Dept Informat Technol, Ghaziabad 201206, India
Keywords
Fine-tuned BERT; Automatic query expansion; Embedding augmentation (EA); Co-occurrence statistical information; Frilled lizard optimization; Tokenization; Normalization; Splitting
DOI
10.1007/s12559-024-10354-5
CLC Classification
TP18 [Theory of Artificial Intelligence]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Online retrieval systems are mostly web-based, which makes document collection more dynamic and fluid than in traditional information retrieval systems. With the web growing in size every day, finding meaningful information with a search query of only a few keywords has become increasingly difficult. One important factor in improving Internet search is query expansion (QE). Manual query expansion requires the user to add terms to the query, which is time-consuming but produces good results, whereas automatic query expansion (AQE) determines the best expansion terms with minimal time consumption. Therefore, to improve the document retrieval system, a fine-tuned BERT algorithm is developed for automatic query expansion. Initially, the input text is augmented using an embedding augmentation (EA) approach. The augmented text is pre-processed using tokenization, normalization, splitting, stemming, stop-word removal, and lemmatization. Technical keywords are then extracted from the pre-processed text using co-occurrence statistical information. After keyword extraction, a fine-tuned BERT model is utilized to expand the query and thereby improve the document retrieval system. The hyperparameters of the BERT model are tuned using frilled lizard optimization to enhance its performance. The proposed model provides 92% accuracy, 95% precision, and 95.6% recall. Thus, the fine-tuned BERT model minimizes query-document mismatch and thereby improves retrieval performance.
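The pipeline described in the abstract — pre-processing, co-occurrence-based keyword extraction, and appending expansion terms to the query — can be sketched minimally as below. This is an illustrative sketch only: the stop-word list, the sliding-window co-occurrence score, and the append-terms expansion step are assumptions for demonstration; the paper's fine-tuned BERT ranking of candidate terms and the frilled lizard optimization of its hyperparameters are not reproduced here.

```python
import re
from collections import Counter

# Assumed minimal stop-word list for illustration.
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "and", "or", "is", "for", "on"}

def preprocess(text):
    """Tokenize, normalize to lowercase, and remove stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def cooccurrence_keywords(docs, window=2, top_k=3):
    """Score each term by how many other terms co-occur with it
    inside a sliding window, and return the top-scoring terms."""
    scores = Counter()
    for doc in docs:
        tokens = preprocess(doc)
        for i, term in enumerate(tokens):
            # Each neighbour within the window counts as one co-occurrence.
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            scores[term] += hi - lo - 1
    return [t for t, _ in scores.most_common(top_k)]

def expand_query(query, docs, top_k=3):
    """Append co-occurrence keywords absent from the original query.
    In the paper, a fine-tuned BERT model selects the expansion terms;
    here raw co-occurrence scores stand in for that ranking step."""
    candidates = cooccurrence_keywords(docs, top_k=top_k + len(query.split()))
    extra = [t for t in candidates if t not in preprocess(query)][:top_k]
    return query + " " + " ".join(extra)
```

For example, expanding the query "query expansion" against a small document collection appends the corpus terms that co-occur most often with their neighbours, such as "retrieval".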
Pages: 16
Related Papers
50 records in total
  • [31] DeepSignature: fine-tuned transfer learning based signature verification system
    Naz, Saeeda
    Bibi, Kiran
    Ahmad, Riaz
    MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (26) : 38113 - 38122
  • [32] Information-retrieval algorithm based on query expansion and classification
    Yue, Wen
    Chen, Zhi-Ping
    Lin, Ya-Ping
    Xitong Fangzhen Xuebao / Journal of System Simulation, 2006, 18 (07): : 1926 - 1929
  • [33] Improving BERT-based Query-by-Document Retrieval with Multi-task Optimization
    Abolghasemi, Amin
    Verberne, Suzan
    Azzopardi, Leif
    ADVANCES IN INFORMATION RETRIEVAL, PT II, 2022, 13186 : 3 - 12
  • [34] Enhancing Literature Review Efficiency: A Case Study on Using Fine-Tuned BERT for Classifying Focused Ultrasound-Related Articles
    Panagides, Reanna K.
    Fu, Sean H.
    Jung, Skye H.
    Singh, Abhishek
    Muttikkal, Rose T. Eluvathingal
    Broad, R. Michael
    Meakem, Timothy D.
    Hamilton, Rick A.
    AI, 2024, 5 (03) : 1670 - 1683
  • [35] VULREM: Fine-Tuned BERT-Based Source-Code Potential Vulnerability Scanning System to Mitigate Attacks in Web Applications
    Gurfidan, Remzi
    APPLIED SCIENCES-BASEL, 2024, 14 (21):
  • [36] A new query expansion method for document retrieval based on the inference of fuzzy rules
    Chang, Yu-Chuan
    Chen, Shyi-Ming
    Liau, Churn-Jung
    JOURNAL OF THE CHINESE INSTITUTE OF ENGINEERS, 2007, 30 (03) : 511 - 515
  • [37] Convolutional Fine-Tuned Threshold Adaboost approach for effectual content-based image retrieval
    Cep, Robert
    Elangovan, Muniyandy
    Ramesh, Janjhyam Venkata Naga
    Chohan, Mandeep Kaur
    Verma, Amit
    SCIENTIFIC REPORTS, 2025, 15 (01):
  • [38] BASHEXPLAINER: Retrieval-Augmented Bash Code Comment Generation based on Fine-tuned CodeBERT
    Yu, Chi
    Yang, Guang
    Chen, Xiang
    Liu, Ke
    Zhou, Yanlin
    2022 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE MAINTENANCE AND EVOLUTION (ICSME 2022), 2022, : 82 - 93
  • [39] HIJLI_JU at SemEval-2024 Task 7: Enhancing Quantitative Question Answering Using Fine-tuned BERT Models
    Sengupta, Partha Sarathi
    Sarkar, Sandip
    Das, Dipankar
    PROCEEDINGS OF THE 18TH INTERNATIONAL WORKSHOP ON SEMANTIC EVALUATION, SEMEVAL-2024, 2024, : 279 - 284
  • [40] AECR: Automatic attack technique intelligence extraction based on fine-tuned large language model
    Chen, Minghao
    Zhu, Kaijie
    Lu, Bin
    Li, Ding
    Yuan, Qingjun
    Zhu, Yuefei
    COMPUTERS & SECURITY, 2025, 150