Topic-Based Clustering of Japanese Sentences Using Sentence-BERT

Cited by: 4
Authors
Tsumuraya, Kenshin [1]
Amano, Miki [2]
Uehara, Minoru [1]
Adachi, Yoshihiro [3]
Affiliations
[1] Toyo University, Graduate School of Information Sciences and Arts, Kawagoe, Saitama, Japan
[2] Toyo University, Department of Information Sciences and Arts, Kawagoe, Saitama, Japan
[3] Toyo University, RIIT, Kawagoe, Saitama, Japan
Keywords
Japanese Text Analysis; Topic-Based Clustering; Sentence-BERT; Cluster Labeling
DOI
10.1109/CANDARW57323.2022.00044
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
In recent years, machine-learning techniques for text analysis and generation have advanced substantially and are now used to analyze opinions on social networking services and reviews on the Web. However, building a supervised BERT model that classifies sentences by topic requires selecting topic classes for the target domain and creating labeled training datasets for each class. In this study, we developed a method for clustering Japanese sentences by topic using distributed representations obtained from Japanese Sentence-BERT (JSBERT) fine-tuned on a Japanese translation of the Stanford Natural Language Inference (SNLI) corpus. In particular, distributed representations that JSBERT generates from only the nouns in each sentence are an effective basis for clustering Japanese sentence datasets by topic using cosine similarity. We also devised a method that assigns an appropriate label to each cluster so that its contents are easier to understand. Furthermore, we devised a function that explains why each sentence was assigned to its cluster.
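The abstract describes the pipeline only at a high level: keep the nouns of each sentence, embed them with JSBERT, group the vectors by cosine similarity, and label each cluster. The following is a minimal sketch of that flow under stated assumptions. It is not the authors' implementation. Several parts are assumptions:

- Tokens arrive as (surface, POS) pairs from a morphological analyzer with IPAdic-style tags, where nouns are tagged 名詞.
- Embeddings are precomputed. In practice they would come from a JSBERT model, for example through `SentenceTransformer.encode` in the sentence-transformers library, but toy vectors are used here.
- The clustering step is single-pass threshold (leader) clustering. It was chosen for illustration because the abstract does not name the algorithm.
- The most-frequent-nouns label is a hypothetical stand-in for the authors' labeling method.

The names `nouns_only`, `threshold_cluster`, and `label_clusters` and the 0.7 threshold are illustrative.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def nouns_only(tokens: Sequence[Tuple[str, str]], noun_tag: str = "名詞") -> List[str]:
    """Keep only the nouns from (surface, part-of-speech) pairs.

    Assumption: tokens come from a morphological analyzer with an IPAdic-style
    tag set, where the main POS for nouns is "名詞".
    """
    return [surface for surface, pos in tokens if pos == noun_tag]


def _normalize(vec: Vector) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; defined as 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def threshold_cluster(embeddings: Sequence[Vector], threshold: float = 0.7) -> List[int]:
    """Single-pass (leader) clustering on cosine similarity (illustrative choice).

    Each vector joins the cluster whose centroid is most similar to it when
    that similarity is at least `threshold`; otherwise it starts a new cluster.
    Returns one cluster id per input vector, in input order.
    """
    members: List[List[List[float]]] = []
    centroids: List[List[float]] = []
    labels: List[int] = []
    for vec in embeddings:
        unit = _normalize(vec)
        best, best_sim = -1, -math.inf
        for cid, centroid in enumerate(centroids):
            sim = cosine(unit, centroid)
            if sim > best_sim:
                best, best_sim = cid, sim
        if best >= 0 and best_sim >= threshold:
            members[best].append(unit)
            # Centroid = mean of the unit vectors assigned to this cluster so far.
            n = len(members[best])
            centroids[best] = [sum(col) / n for col in zip(*members[best])]
            labels.append(best)
        else:
            members.append([unit])
            centroids.append(unit)
            labels.append(len(centroids) - 1)
    return labels


def label_clusters(
    noun_lists: Sequence[List[str]], labels: Sequence[int], top_n: int = 2
) -> Dict[int, List[str]]:
    """Hypothetical labeling rule: the top_n most frequent nouns per cluster.

    Ties are broken by first occurrence, as Counter.most_common does.
    """
    counts: Dict[int, Counter] = {}
    for nouns, cid in zip(noun_lists, labels):
        counts.setdefault(cid, Counter()).update(nouns)
    return {cid: [w for w, _ in c.most_common(top_n)] for cid, c in counts.items()}


if __name__ == "__main__":
    # Toy (surface, POS) tokens and hand-made 3-d vectors standing in for JSBERT
    # embeddings of the noun-only text (real embeddings are much larger).
    tokenized = [
        [("ラーメン", "名詞"), ("が", "助詞"), ("美味しい", "形容詞")],
        [("スープ", "名詞"), ("の", "助詞"), ("味", "名詞"), ("が", "助詞"), ("良い", "形容詞")],
        [("部屋", "名詞"), ("が", "助詞"), ("広い", "形容詞")],
        [("部屋", "名詞"), ("の", "助詞"), ("ベッド", "名詞"), ("が", "助詞"), ("大きい", "形容詞")],
    ]
    vectors = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1], [0.0, 0.1, 1.0], [0.1, 0.0, 0.9]]
    nouns = [nouns_only(t) for t in tokenized]
    ids = threshold_cluster(vectors)
    print(ids)                          # [0, 0, 1, 1]
    print(label_clusters(nouns, ids))   # {0: ['ラーメン', 'スープ'], 1: ['部屋', 'ベッド']}
```

A threshold-based scheme avoids fixing the number of clusters in advance. Spherical k-means or agglomerative clustering with cosine distance would be equally reasonable substitutes if the target number of topics is known.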
Pages: 255-260
Number of pages: 6