Performance Improvement of Semantic Search Using Sentence Embeddings by Dimensionality Reduction

Cited by: 0
Authors
Tsumuraya, Kenshin [1]
Uehara, Minoru [1]
Adachi, Yoshihiro [2]
Affiliations
[1] Toyo Univ, Grad Sch Informat Sci & Arts, Kawagoe, Saitama, Japan
[2] Toyo Univ, RIIT, Kawagoe, Saitama, Japan
DOI
10.1007/978-3-031-57870-0_11
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Semantic search, which searches for sentences with a high similarity in meaning to that of queries, allows a user to search for the desired sentences even when they cannot think of the appropriate keywords for a lexical search. Moreover, the search function can appropriately handle synonyms and spelling variations. We previously reported a semantic search method for Japanese sentences using sentence embeddings that appropriately processed queries in which sentences were combined using the logical operators AND, OR, and NOT. Reducing the dimensionality of sentence embeddings is expected to make semantic search more robust to noise in the embeddings, resulting in improved search accuracy and faster semantic search computation. In this study, we experimentally verified the improvement in semantic search performance by reducing the dimensionality of sentence embeddings generated by Japanese SimCSE. We also evaluated the runtimes for generating sentence embeddings and reducing dimensionality with PCA.
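The pipeline the abstract describes (embed sentences, reduce dimensionality with PCA, then rank by similarity to the query) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: random vectors stand in for Japanese SimCSE embeddings, the dimensions 768 and 64 are assumed values, cosine similarity is an assumed ranking measure, and the helper names `fit_pca`, `project`, and `cosine_rank` are hypothetical.

```python
import numpy as np


def fit_pca(embeddings, n_components):
    """Fit PCA via SVD and return (mean, components)."""
    mean = embeddings.mean(axis=0)
    centered = embeddings - mean
    # Rows of vt are principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]


def project(embeddings, mean, components):
    """Center the embeddings and project them onto the principal directions."""
    return (embeddings - mean) @ components.T


def cosine_rank(query_vec, corpus_vecs):
    """Return corpus indices sorted by decreasing cosine similarity."""
    q = query_vec / np.linalg.norm(query_vec)
    c = corpus_vecs / np.linalg.norm(corpus_vecs, axis=1, keepdims=True)
    scores = c @ q
    return np.argsort(-scores), scores


# Assumption: random vectors stand in for real sentence embeddings.
rng = np.random.default_rng(0)
corpus = rng.normal(size=(200, 768))
# The query is a noisy copy of sentence 17, so sentence 17 should rank first.
query = corpus[17] + 0.1 * rng.normal(size=768)

mean, components = fit_pca(corpus, n_components=64)
corpus_reduced = project(corpus, mean, components)
query_reduced = project(query[None, :], mean, components)[0]

order, scores = cosine_rank(query_reduced, corpus_reduced)
top_index = int(order[0])
print("best match:", top_index)
```

Fitting PCA once on the corpus and reusing the same mean and components for each query keeps both sides in the same reduced space, and the similarity computation then runs over 64 dimensions instead of 768. Whether accuracy improves on real embeddings is the empirical question the paper examines; this sketch does not demonstrate it.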
Pages: 123-132
Number of pages: 10
Related papers
50 in total
  • [21] Dimensionality Reduction with Evolutionary Shephard-Kruskal Embeddings
    Kramer, Oliver
    PROCEEDINGS OF THE 7TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION APPLICATIONS AND METHODS (ICPRAM 2018), 2018, : 478 - 481
  • [22] Dimensionality Reduction by Supervised Neighbor Embedding Using Laplacian Search
    Zheng, Jianwei
    Zhang, Hangke
    Cattani, Carlo
    Wang, Wanliang
    COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE, 2014, 2014
  • [23] Scaling Up Multi-domain Semantic Segmentation with Sentence Embeddings
    Yin, Wei
    Liu, Yifan
    Shen, Chunhua
    Sun, Baichuan
    van den Hengel, Anton
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2024, 132 (09) : 4036 - 4051
  • [24] A Sentence is Worth 128 Pseudo Tokens: A Semantic-Aware Contrastive Learning Framework for Sentence Embeddings
    Tan, Haochen
    Shao, Wei
    Wu, Han
    Yang, Ke
    Song, Linqi
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), 2022, : 246 - 256
  • [25] Dimensionality Reduction on the Cartesian Product of Embeddings of Multiple Dissimilarity Matrices
    Zhiliang Ma
    Adam Cardinal-Stakenas
    Youngser Park
    Michael W. Trosset
    Carey E. Priebe
    Journal of Classification, 2010, 27 : 307 - 321
  • [26] DefSent: Sentence Embeddings using Definition Sentences
    Tsukagoshi, Hayato
    Sasano, Ryohei
    Takeda, Koichi
    ACL-IJCNLP 2021: THE 59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 2, 2021, : 411 - 418
  • [27] Dimensionality reduction by semantic mapping in text categorization
    Corrêa, RF
    Ludermir, TB
    NEURAL INFORMATION PROCESSING, 2004, 3316 : 1032 - 1037
  • [28] Nonlinear supervised dimensionality reduction via smooth regular embeddings
    Ornek, Cem
    Vural, Elif
    PATTERN RECOGNITION, 2019, 87 : 55 - 66
  • [29] Dimensionality Reduction on the Cartesian Product of Embeddings of Multiple Dissimilarity Matrices
    Ma, Zhiliang
    Cardinal-Stakenas, Adam
    Park, Youngser
    Trosset, Michael W.
    Priebe, Carey E.
    JOURNAL OF CLASSIFICATION, 2010, 27 (03) : 307 - 321
  • [30] Ultrafast Localization of the Optic Disc Using Dimensionality Reduction of the Search Space
    Mahfouz, Ahmed Essam
    Fahmy, Ahmed S.
    MEDICAL IMAGE COMPUTING AND COMPUTER-ASSISTED INTERVENTION - MICCAI 2009, PT II, PROCEEDINGS, 2009, 5762 : 985 - 992