The Impact of Sentence Embeddings in Turkish Paraphrase Detection

Cited by: 0
Authors
Karaoglan, Bahar [1]
Yorgancioglu, Hakki Engin [1]
Kisla, Tarik [2]
Kumova Metin, Senem [3]
Affiliations
[1] Ege Univ, Uluslararasi Bilgisayar Enstitusu, Izmir, Turkey
[2] Ege Univ, Bilgisayar & Ogret Teknol Egitimi Bolumu, Izmir, Turkey
[3] Izmir Econ Univ, Yazilim Muhendisligi Bolumu, Izmir, Turkey
Keywords
paraphrasing; paraphrase corpus; word embedding; sentence embedding
DOI
Not available
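Chinese Library Classification number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809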
Abstract
In recent studies, it is shown that word embeddings achieve success in several natural language processing (NLP) tasks. Though paraphrase identification in Turkish is well studied with traditional statistical NLP methods, to the best of our knowledge there exists no study in which word and/or sentence embeddings are employed. In this paper, three well-known methods for building sentence embeddings from word embeddings are examined: "using average vector for word embeddings" (AWE), "concatenated vectors for word embeddings" (CWE) and "word mover's distance word embeddings" (WMDWE). Their effect on the performance of paraphrase identification is measured. The results are presented comparatively for English (MSRP) and Turkish (PARDER and TuPC) paraphrase corpora. The study does not cover the optimization of the parameters used in training the word embeddings, and features specific to the Turkish language are not considered. Despite this naive approach, the test results obtained on the PARDER corpus are encouraging and suggest that a more detailed study involving such improvements may yield more convincing performance values.
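As a rough illustration of the first two methods named in the abstract, the sketch below builds an averaged (AWE-style) and a concatenated (CWE-style) sentence vector from word vectors and compares sentences by cosine similarity. It is not the authors' implementation: the toy vectors, the function names, the zero-padding scheme and the use of cosine similarity are all assumptions made for illustration. Word mover's distance is left out because it requires an optimal-transport solver.

```python
import math

# Toy 3-dimensional word vectors. These values are invented for illustration
# and are NOT trained embeddings from the paper.
TOY_VECTORS = {
    "the": [0.1, 0.0, 0.2],
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "sleeps": [0.0, 0.9, 0.1],
    "naps": [0.1, 0.8, 0.2],
    "stock": [0.0, 0.1, 0.9],
    "fell": [0.2, 0.0, 0.8],
}


def average_embedding(tokens, vectors):
    """AWE-style sentence vector: element-wise mean of the known word vectors."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        raise ValueError("no token has a word vector")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def concatenated_embedding(tokens, vectors, max_len, dim):
    """CWE-style sentence vector: word vectors joined in order.

    Assumption: sentences are truncated or zero-padded to max_len words so
    that all sentence vectors have the same length.
    """
    out = []
    for token in tokens[:max_len]:
        out.extend(vectors.get(token, [0.0] * dim))
    out.extend([0.0] * (max_len * dim - len(out)))
    return out


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


s1 = "the cat sleeps".split()
s2 = "the kitten naps".split()
s3 = "the stock fell".split()

sim_paraphrase = cosine(average_embedding(s1, TOY_VECTORS),
                        average_embedding(s2, TOY_VECTORS))
sim_unrelated = cosine(average_embedding(s1, TOY_VECTORS),
                       average_embedding(s3, TOY_VECTORS))
cwe_vector = concatenated_embedding(s1, TOY_VECTORS, max_len=5, dim=3)

print("AWE similarity, paraphrase pair:", round(sim_paraphrase, 3))
print("AWE similarity, unrelated pair:", round(sim_unrelated, 3))
print("CWE vector length:", len(cwe_vector))
```

With these toy vectors, the paraphrase-like pair scores higher than the unrelated pair. In a real setting the similarity (or the sentence vectors themselves) would feed a classifier trained on a corpus such as MSRP, PARDER or TuPC.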
Pages: 4