Performance Improvement of Semantic Search Using Sentence Embeddings by Dimensionality Reduction

Authors
Tsumuraya, Kenshin [1]
Uehara, Minoru [1]
Adachi, Yoshihiro [2]
Affiliations
[1] Toyo Univ, Grad Sch Informat Sci & Arts, Kawagoe, Saitama, Japan
[2] Toyo Univ, RIIT, Kawagoe, Saitama, Japan
DOI
10.1007/978-3-031-57870-0_11
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Semantic search retrieves sentences whose meaning is highly similar to that of a query, so a user can find the desired sentences even when they cannot think of appropriate keywords for a lexical search. It also handles synonyms and spelling variations appropriately. We previously reported a semantic search method for Japanese sentences that uses sentence embeddings and appropriately processes queries in which sentences are combined with the logical operators AND, OR, and NOT. Reducing the dimensionality of sentence embeddings is expected to make semantic search more robust to noise in the embeddings, which should improve search accuracy and speed up search computation. In this study, we experimentally verified the improvement in semantic search performance obtained by reducing the dimensionality of sentence embeddings generated by Japanese SimCSE. We also evaluated the runtimes for generating sentence embeddings and for reducing their dimensionality with principal component analysis (PCA).
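The pipeline the abstract describes (embed sentences, reduce dimensionality with PCA, rank by similarity) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes random 768-dimensional vectors as stand-ins for Japanese SimCSE embeddings, a target dimensionality of 64, and cosine similarity as the ranking measure. The helper names `fit_pca`, `project`, and `cosine_top_k` are hypothetical.

```python
import numpy as np


def fit_pca(X, k):
    """Return (mean, components) for the top-k principal directions of X."""
    mean = X.mean(axis=0)
    # SVD of the centered data; the rows of vt are the principal directions,
    # ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]


def project(X, mean, components):
    """Center X with the fitted mean and project it onto the components."""
    return (X - mean) @ components.T


def cosine_top_k(query, corpus, k):
    """Return indices and cosine scores of the k corpus rows closest to query."""
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    scores = c @ q
    order = np.argsort(-scores)[:k]
    return order, scores[order]


# Assumption: synthetic vectors stand in for real sentence embeddings.
rng = np.random.default_rng(0)
corpus = rng.normal(size=(200, 768))
# The query is a noisy copy of sentence 17, mimicking a paraphrase.
query = corpus[17] + 0.1 * rng.normal(size=768)

# Fit PCA on the corpus once, then reuse the same transform for every query.
mean, components = fit_pca(corpus, 64)
corpus_low = project(corpus, mean, components)
query_low = project(query[None, :], mean, components)[0]

top_idx, top_scores = cosine_top_k(query_low, corpus_low, 5)
print(top_idx, top_scores)
```

The PCA transform is fitted on the corpus embeddings only and then applied unchanged to each query, so queries and stored sentences are compared in the same reduced space. Similarity over 64 dimensions needs roughly one twelfth of the multiplications required at 768 dimensions, which is the kind of speedup the abstract anticipates.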
Pages: 123-132
Page count: 10