The Impact of Sentence Embeddings in Turkish Paraphrase Detection

Cited by: 0
Authors
Karaoglan, Bahar [1]
Yorgancioglu, Hakki Engin [1]
Kisla, Tarik [2]
Kumova Metin, Senem [3]
Affiliations
[1] Ege Univ, Uluslararasi Bilgisayar Enstitusu, Izmir, Turkey
[2] Ege Univ, Bilgisayar & Ogret Teknol Egitimi Bolumu, Izmir, Turkey
[3] Izmir Econ Univ, Yazilim Muhendisligi Bolumu, Izmir, Turkey
Keywords
paraphrasing; paraphrase corpus; word embedding; sentence embedding
DOI
Not available
Chinese Library Classification number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Subject classification code
0808; 0809
Abstract
Recent studies show that word embeddings perform well in several natural language processing (NLP) tasks. Although paraphrase identification in Turkish has been well studied with traditional statistical NLP methods, to the best of our knowledge no study has employed word and/or sentence embeddings. In this paper, three well-known methods for building sentence embeddings from word embeddings are examined: "using average vector for word embeddings" (AWE), "concatenated vectors for word embeddings" (CWE) and "word mover's distance word embeddings" (WMDWE). Their effect on the performance of paraphrase identification is measured. The results are presented comparatively for English (MSRP) and Turkish (PARDER and TuPC) paraphrase corpora. The study does not cover the optimization of the parameters used in training the word embeddings, and features specific to the Turkish language are not considered. Despite this naive approach, the test results obtained on the PARDER corpus are encouraging and suggest that a more detailed study involving such improvements may yield more convincing performance values.
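The abstract names AWE and CWE without giving their exact construction, so the following is only a minimal sketch of what such sentence embeddings might look like. The toy vectors, the function names (`awe`, `cwe`, `cosine`), the skipping of unknown words, and the zero-padding of CWE to a fixed `max_len` are all assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

# Toy 3-dimensional word vectors, invented for illustration only.
TOY_VECTORS = {
    "the": np.array([0.1, 0.0, 0.2]),
    "cat": np.array([0.9, 0.1, 0.0]),
    "feline": np.array([0.8, 0.2, 0.1]),
    "sleeps": np.array([0.0, 0.9, 0.1]),
    "rests": np.array([0.1, 0.8, 0.2]),
}
DIM = 3


def lookup(tokens):
    """Return vectors for known tokens; unknown tokens are skipped (assumption)."""
    return [TOY_VECTORS[t] for t in tokens if t in TOY_VECTORS]


def awe(tokens):
    """Average word embedding: element-wise mean of the word vectors."""
    vecs = lookup(tokens)
    if not vecs:
        return np.zeros(DIM)
    return np.mean(vecs, axis=0)


def cwe(tokens, max_len=4):
    """Concatenated word embedding: word vectors placed end to end,
    truncated or zero-padded to max_len words (assumed padding scheme)."""
    vecs = lookup(tokens)[:max_len]
    vecs += [np.zeros(DIM)] * (max_len - len(vecs))
    return np.concatenate(vecs)


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0


s1 = "the cat sleeps".split()
s2 = "the feline rests".split()
s3 = "the sleeps cat".split()  # same words as s1, different order

awe_sim = cosine(awe(s1), awe(s2))
cwe_sim = cosine(cwe(s1), cwe(s2))
print(f"AWE similarity: {awe_sim:.3f}")
print(f"CWE similarity: {cwe_sim:.3f}")
```

In this sketch, averaging discards word order, so `awe(s1)` and `awe(s3)` coincide, while concatenation keeps word positions and therefore distinguishes them. A similarity score like these could then be thresholded or passed to a classifier to decide whether two sentences are paraphrases; how the paper does this is not stated in the abstract.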
Pages: 4