Improving Word Representation by Tuning Word2Vec Parameters with Deep Learning Model

Cited by: 0
Authors
Tezgider, Murat [1 ]
Yildiz, Beytullah [2 ]
Aydin, Galip [3 ]
Affiliations
[1] Hacettepe Univ, Ankara, Turkey
[2] TC Cumhurbaskanligi, Bilgi Teknol Baskanligi, Ankara, Turkey
[3] Firat Univ, Bilgisayar Muhendisligi Bolumu, Elazig, Turkey
Keywords
Deep learning; text processing; text analysis; word representation; Word2Vec;
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Deep learning has become one of the most popular machine learning methods. Success in text processing, analysis, and classification has been significantly enhanced by deep learning, and this success depends heavily on the quality of the word representations. TF-IDF, FastText, GloVe, and Word2Vec are commonly used for word representation. In this work, we aimed to improve word representations by tuning Word2Vec parameters; the quality of the resulting representations was measured with a deep learning classification model. Three Word2Vec parameters were varied: minimum word count, vector size, and window size. We used 2.8 million Turkish texts containing 243 million words to create the word embeddings (word representations), and around 263 thousand documents spanning 15 classes for classification. We observed that correctly selected parameters increased word representation quality and thus classification accuracy.
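The tuning procedure the abstract describes — sweeping Word2Vec's minimum word count, vector size, and window size and scoring each setting with a downstream classifier — can be sketched as below. The parameter grids and the `evaluate` stub are illustrative assumptions, not the authors' actual settings; in practice `evaluate` would train a Word2Vec model with the given parameters (e.g. gensim's `Word2Vec(corpus, min_count=…, vector_size=…, window=…)`), embed the documents, and return the deep learning classifier's accuracy.

```python
from itertools import product

# Candidate values for the three Word2Vec parameters studied in the paper.
# These grids are illustrative assumptions, not the authors' reported values.
MIN_COUNTS = [1, 5, 10]
VECTOR_SIZES = [100, 200, 300]
WINDOW_SIZES = [3, 5, 8]

def evaluate(min_count, vector_size, window):
    """Stub: would train Word2Vec with these parameters, embed the corpus,
    train the deep learning classifier, and return its accuracy.
    Replaced here by a deterministic placeholder so the sweep is runnable."""
    return 1.0 / (1 + abs(vector_size - 200) + abs(window - 5) + min_count)

def grid_search():
    """Exhaustively score every parameter combination; return the best."""
    best_params, best_score = None, float("-inf")
    for mc, vs, ws in product(MIN_COUNTS, VECTOR_SIZES, WINDOW_SIZES):
        score = evaluate(mc, vs, ws)
        if score > best_score:
            best_params, best_score = (mc, vs, ws), score
    return best_params, best_score
```

With only three values per parameter the full grid is 27 runs, which is feasible even when each run retrains embeddings on a large corpus; a larger grid would call for random or staged search instead.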
Pages: 7