Fine-Tuned BERT Algorithm-Based Automatic Query Expansion for Enhancing Document Retrieval System

Authors
Vishwakarma, Deepak [1 ,3 ]
Kumar, Suresh [2 ]
Affiliations
[1] Guru Gobind Singh Indraprastha Univ, Univ Sch Informat Commun & Technol, Delhi 110078, India
[2] Netaji Subhas Univ Technol NSUT, Dept Comp Sci & Engn, East Campus, New Delhi 110031, India
[3] KIET Grp Inst, Dept Informat Technol, Ghaziabad 201206, India
Keywords
A fine-tuned BERT; Automatic query expansion; Embedding augmentation (EA); Co-occurrence statistical information; Frilled lizard optimization; Tokenization; Normalization; Splitting;
DOI
10.1007/s12559-024-10354-5
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Online retrieval systems are mostly web-based, which makes document collections more dynamic than in traditional information retrieval systems. With the web growing in size every day, finding meaningful information with a search query of only a few keywords has become increasingly difficult. One important factor in improving Internet search is query expansion (QE). Manual query expansion requires the user to add terms to the query, which takes a long time but produces good results. Automatic query expansion (AQE), by contrast, determines the best expansions with minimal time consumption. Therefore, to improve document retrieval, a fine-tuned BERT algorithm is developed for automatic query expansion. First, the input text is augmented using an embedding augmentation (EA) approach. The augmented text is then pre-processed using tokenization, normalization, splitting, stemming, stop word removal, and lemmatization. Next, technical keywords are extracted from the pre-processed text using co-occurrence statistical information. After keyword extraction, a fine-tuned BERT model is used to expand the query. The hyperparameters of the BERT model are tuned using frilled lizard optimization to enhance its performance. The proposed model achieves 92% accuracy, 95% precision, and 95.6% recall. Thus, the fine-tuned BERT model minimizes query-document mismatch and thereby improves retrieval performance.
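The pre-processing and co-occurrence keyword extraction steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the stop-word list, the function names `preprocess` and `cooccurrence_keywords`, and the scoring rule (counting how many other distinct terms each term shares a sentence with) are all assumptions made for illustration. Stemming, lemmatization, embedding augmentation, and the BERT expansion step are omitted.

```python
import re
from collections import Counter
from itertools import combinations

# Illustrative stop-word list (an assumption, not the list used in the paper).
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "with"}


def preprocess(text):
    """Split text into sentences, lowercase, tokenize, and drop stop words."""
    tokenized = []
    for sentence in re.split(r"[.!?]+", text):
        tokens = re.findall(r"[a-z0-9]+", sentence.lower())
        tokens = [t for t in tokens if t not in STOP_WORDS]
        if tokens:
            tokenized.append(tokens)
    return tokenized


def cooccurrence_keywords(text, top_k=3):
    """Rank terms by sentence-level co-occurrence with other distinct terms."""
    scores = Counter()
    for tokens in preprocess(text):
        # Each unordered pair of distinct terms in a sentence adds 1 to both terms.
        for a, b in combinations(sorted(set(tokens)), 2):
            scores[a] += 1
            scores[b] += 1
    # Sort by descending score, then alphabetically for a deterministic order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_k]]


sample = (
    "Query expansion improves retrieval. "
    "Query expansion reduces mismatch. "
    "Retrieval needs query terms."
)
print(cooccurrence_keywords(sample))
```

Terms that appear alongside many other terms across sentences rank highest under this rule; a production system would likely replace it with a proper statistical test over a larger corpus.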
Pages: 16