Fine-Tuned BERT Algorithm-Based Automatic Query Expansion for Enhancing Document Retrieval System

Cited by: 0
Authors
Vishwakarma, Deepak [1, 3]
Kumar, Suresh [2]
Affiliations
[1] Guru Gobind Singh Indraprastha Univ, Univ Sch Informat Commun & Technol, Delhi 110078, India
[2] Netaji Subhas Univ Technol NSUT, Dept Comp Sci & Engn, East Campus, New Delhi 110031, India
[3] KIET Grp Inst, Dept Informat Technol, Ghaziabad 201206, India
Keywords
A fine-tuned BERT; Automatic query expansion; Embedding augmentation (EA); Co-occurrence statistical information; Frilled lizard optimization; Tokenization; Normalization; Splitting;
DOI
10.1007/s12559-024-10354-5
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Online retrieval systems are mostly web-based, which makes document collections more dynamic than in traditional information retrieval systems. As the web grows every day, finding meaningful information with a search query of only a few keywords has become increasingly difficult. Query expansion (QE) is an important factor in improving Internet search. Manual query expansion, in which the user adds terms to the query, takes a long time but produces good results. Automatic query expansion (AQE), by contrast, determines the best expansion terms with minimal time consumption. Therefore, to improve the document retrieval system, a fine-tuned BERT algorithm is developed for automatic query expansion. First, the input text is augmented using the embedding augmentation (EA) approach. The augmented text is then pre-processed using tokenization, normalization, splitting, stemming, stop word removal, and lemmatization. Next, technical keywords are extracted from the pre-processed text using co-occurrence statistical information. After keyword extraction, a fine-tuned BERT model is used to expand the query. The hyperparameters of the BERT model are tuned using frilled lizard optimization to enhance its performance. The proposed model achieves 92% accuracy, 95% precision, and 95.6% recall. Thus, the fine-tuned BERT model minimizes query-document mismatch and thereby improves retrieval performance.
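The abstract does not spell out how keywords are scored from co-occurrence statistics, so the following is only a minimal sketch of one common interpretation, not the authors' method. It assumes sentence-level co-occurrence, a tiny hypothetical stop-word list, and a simple "number of co-occurring partners" score. Stemming, lemmatization, embedding augmentation, the BERT model, and frilled lizard optimization are all omitted. The function names `preprocess`, `cooccurrence_scores`, and `top_keywords` are illustrative.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical, deliberately tiny stop-word list (an assumption for illustration).
STOP_WORDS = {"a", "an", "the", "of", "for", "to", "in", "on",
              "and", "or", "is", "are", "with"}


def preprocess(text):
    """Split text into sentences, lowercase, tokenize, and drop stop words."""
    sentences = re.split(r"[.!?]+", text.lower())
    tokenized = []
    for sentence in sentences:
        tokens = [t for t in re.findall(r"[a-z0-9]+", sentence)
                  if t not in STOP_WORDS]
        if tokens:
            tokenized.append(tokens)
    return tokenized


def cooccurrence_scores(sentences):
    """Score each term by how often it shares a sentence with another distinct term."""
    scores = Counter()
    for tokens in sentences:
        # Each unordered pair of distinct terms in a sentence counts once.
        for left, right in combinations(sorted(set(tokens)), 2):
            scores[left] += 1
            scores[right] += 1
    return scores


def top_keywords(text, k=3):
    """Return the k highest-scoring terms; ties are broken alphabetically."""
    scores = cooccurrence_scores(preprocess(text))
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


if __name__ == "__main__":
    sample = ("Query expansion improves retrieval. "
              "Query expansion with BERT improves document retrieval. "
              "BERT ranks documents.")
    print(top_keywords(sample, k=3))
```

In a pipeline like the one described, terms selected this way would serve as candidate expansion terms passed to the downstream model. A real system would likely use a larger stop-word list and a windowed or weighted co-occurrence measure.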
Pages: 16