Fine-Tuned BERT Algorithm-Based Automatic Query Expansion for Enhancing Document Retrieval System

Cited by: 0
Authors
Vishwakarma, Deepak [1, 3]
Kumar, Suresh [2]
Affiliations
[1] Guru Gobind Singh Indraprastha Univ, Univ Sch Informat Commun & Technol, Delhi 110078, India
[2] Netaji Subhas Univ Technol NSUT, Dept Comp Sci & Engn, East Campus, New Delhi 110031, India
[3] KIET Grp Inst, Dept Informat Technol, Ghaziabad 201206, India
Keywords
Fine-tuned BERT; Automatic query expansion; Embedding augmentation (EA); Co-occurrence statistical information; Frilled lizard optimization; Tokenization; Normalization; Splitting
DOI
10.1007/s12559-024-10354-5
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Online retrieval systems are mostly web-based, which makes document collections more dynamic than in traditional information retrieval systems. With the web growing in size every day, finding meaningful information with a search query of only a few keywords has become increasingly difficult. Query expansion (QE) is one important factor in improving Internet search. Manual query expansion, in which the user adds terms to the query, takes a long time but produces good results. In contrast, automatic query expansion (AQE) determines the best statements with minimal time consumption. Therefore, to improve the document retrieval system, a fine-tuned BERT algorithm is developed for automatic query expansion. First, the input text is augmented using an embedding augmentation (EA) approach. The augmented text is then pre-processed using tokenization, normalization, splitting, stemming, stop word removal, and lemmatization. Next, technical keywords are extracted from the pre-processed text using co-occurrence statistical information. After keyword extraction, a fine-tuned BERT model is used to expand the query. The hyperparameters of the BERT model are tuned using frilled lizard optimization to enhance its performance. The proposed model achieves 92% accuracy, 95% precision, and 95.6% recall. Thus, the fine-tuned BERT model minimizes query-document mismatch and thereby improves retrieval performance.
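The pre-processing and co-occurrence steps named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the stop word list, the function names (`preprocess`, `build_cooccurrence`, `expand_query`), and the toy corpus are all assumptions made for illustration. Stemming, lemmatization, embedding augmentation, and the fine-tuned BERT expansion step are omitted; a plain sentence-level co-occurrence count stands in for the expansion model.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical, deliberately tiny stop word list (an assumption, not from the paper).
STOP_WORDS = {"a", "an", "the", "of", "on", "in", "for", "to", "and", "is", "are", "with"}


def preprocess(text):
    """Normalize to lowercase, tokenize on alphanumeric runs, drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def build_cooccurrence(sentences):
    """Count, for each ordered term pair, the sentences in which both terms appear."""
    cooc = Counter()
    for sentence in sentences:
        terms = sorted(set(preprocess(sentence)))
        for a, b in combinations(terms, 2):
            cooc[(a, b)] += 1
            cooc[(b, a)] += 1
    return cooc


def expand_query(query, cooc, top_k=3):
    """Append the top_k terms that co-occur most often with the query terms.

    This simple heuristic replaces the BERT-based expansion described in the abstract.
    """
    query_terms = preprocess(query)
    scores = Counter()
    for (a, b), n in cooc.items():
        if a in query_terms and b not in query_terms:
            scores[b] += n
    # Sort by descending score, then alphabetically, so ties resolve deterministically.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return query_terms + ranked[:top_k]


corpus = [
    "Query expansion improves document retrieval",
    "Automatic query expansion reduces query document mismatch",
    "BERT embeddings support query expansion",
]
cooc = build_cooccurrence(corpus)
expanded = expand_query("document retrieval", cooc)
print(expanded)
```

In a pipeline closer to the one the abstract describes, the co-occurrence scores would only select candidate keywords, and a fine-tuned language model would choose the final expansion terms.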
Pages: 16