Measurement of Turkish Word Semantic Similarity and Text Categorization Application

Cited by: 2
Authors
Amasyali, M. Fatih [1]
Beken, Aytunc [1]
Affiliations
[1] Yildiz Tekn Univ, Bilgisayar Muhendisligi Bolumu, Istanbul, Turkey
DOI
10.1109/SIU.2009.5136317
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
In the literature, texts to be classified are generally represented in a high-dimensional Bag of Words space in which every dimension corresponds to a word or n-gram. In this study, the words are first placed in a semantic space. Computing a word's coordinates in the semantic space requires a measure of how similar words are in meaning. Harris states that the semantic similarity of two words is related to the number of documents in which both words occur. We applied this hypothesis to Turkish words. First, we obtained a word co-occurrence matrix from a web corpus. Then, the numerical coordinates of the words were calculated using multidimensional scaling. The coordinates of a text are obtained from the coordinates of the words that occur in it. In our experiments, Turkish news texts were classified into 5 classes. We obtained more successful results than with the traditional Bag of Words space. Our approach is not limited to Turkish words and texts; it can be applied to other languages as well.
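The pipeline outlined in the abstract (document co-occurrence counts, multidimensional scaling, then text coordinates derived from word coordinates) can be illustrated with a minimal sketch. The abstract does not say how co-occurrence counts are turned into distances, which MDS variant was used, or how word coordinates are combined into a text coordinate. The Jaccard-style dissimilarity, the classical (Torgerson) MDS, and the averaging step below are therefore assumptions made for illustration, as are all function names and the toy corpus.

```python
import numpy as np

def cooccurrence_matrix(docs, vocab):
    """counts[i, j] = number of documents containing both word i and word j."""
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for doc in docs:
        present = {index[w] for w in doc.split() if w in index}
        for i in present:
            for j in present:
                counts[i, j] += 1
    return counts

def dissimilarity(counts):
    """Assumed conversion: 1 - Jaccard similarity of the words' document sets."""
    df = np.diag(counts)                       # document frequency of each word
    union = df[:, None] + df[None, :] - counts
    sim = np.divide(counts, union, out=np.zeros_like(counts), where=union > 0)
    d = 1.0 - sim
    np.fill_diagonal(d, 0.0)
    return d

def classical_mds(d, k):
    """Classical (Torgerson) MDS: embed n points in k dimensions from distances d."""
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering   # double-centred Gram matrix
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1][:k]         # k largest eigenvalues
    lam = np.clip(eigvals[order], 0.0, None)      # drop negative (non-Euclidean) parts
    return eigvecs[:, order] * np.sqrt(lam)

def text_coordinates(text, vocab, word_coords):
    """Assumed aggregation: mean of the coordinates of the known words in the text."""
    index = {w: i for i, w in enumerate(vocab)}
    rows = [word_coords[index[w]] for w in text.split() if w in index]
    if not rows:
        return np.zeros(word_coords.shape[1])
    return np.mean(rows, axis=0)

# Toy corpus (made up for illustration): two sports and two finance documents.
docs = ["futbol mac gol", "futbol gol takim", "borsa dolar faiz", "dolar faiz banka"]
vocab = ["futbol", "gol", "dolar", "faiz"]

counts = cooccurrence_matrix(docs, vocab)
coords = classical_mds(dissimilarity(counts), k=2)
text_vec = text_coordinates("futbol gol bugun", vocab, coords)
```

In this sketch, words that always appear in the same documents land on the same point, and the resulting low-dimensional text vectors could be passed to any standard classifier in place of Bag of Words vectors.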
Pages: 1-4
Number of pages: 4