The Impact of Sentence Embeddings in Turkish Paraphrase Detection

Cited by: 0
Authors
Karaoglan, Bahar [1]
Yorgancioglu, Hakki Engin [1]
Kisla, Tarik [2]
Kumova Metin, Senem [3]
Affiliations
[1] Ege Univ, Uluslararasi Bilgisayar Enstitusu, Izmir, Turkey
[2] Ege Univ, Bilgisayar & Ogret Teknol Egitimi Bolumu, Izmir, Turkey
[3] Izmir Econ Univ, Yazilim Muhendisligi Bolumu, Izmir, Turkey
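Keywords
paraphrasing; paraphrase corpus; word embedding; sentence embedding
DOI
Not available
Chinese Library Classification number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract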
Recent studies show that word embeddings achieve success in several natural language processing (NLP) tasks. Although paraphrase identification in Turkish has been well studied with traditional statistical NLP methods, to the best of our knowledge no study has employed word and/or sentence embeddings. This paper examines three well-known methods for building sentence embeddings from word embeddings, namely "average vector for word embeddings" (AWE), "concatenated vectors for word embeddings" (CWE), and "word mover's distance word embeddings" (WMDWE), and measures their effect on paraphrase identification performance. Results are presented comparatively for English (MSRP) and Turkish (PARDER and TuPC) paraphrase corpora. The study does not cover optimization of the parameters used to train the word embeddings, nor does it consider features specific to the Turkish language. Despite this naive approach, the test results obtained on the PARDER corpus are encouraging and suggest that a more detailed study incorporating such improvements may yield more convincing performance values.
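The first two methods named in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy vectors in `EMB`, the helper names `awe`, `cwe`, and `cosine`, the zero-padding to a fixed `max_len` in the concatenation step, and the use of cosine similarity as the comparison score. The record does not state how the authors handled padding, out-of-vocabulary words, or the final paraphrase decision. The word mover's distance variant is left out because it requires solving a transport problem between the two sets of word vectors.

```python
import math

# Toy 3-dimensional word vectors (hypothetical values, for illustration only).
EMB = {
    "the": [0.1, 0.0, 0.2],
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "sleeps": [0.0, 0.9, 0.3],
    "naps": [0.1, 0.8, 0.4],
}
DIM = 3


def awe(tokens):
    """Average the word vectors; out-of-vocabulary tokens are skipped."""
    vecs = [EMB[t] for t in tokens if t in EMB]
    if not vecs:
        return [0.0] * DIM
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cwe(tokens, max_len=4):
    """Concatenate word vectors in order, truncated or zero-padded to max_len words.

    The fixed length is an assumption so that sentences of different
    lengths yield vectors of the same size.
    """
    vecs = [EMB[t] for t in tokens if t in EMB][:max_len]
    flat = [x for v in vecs for x in v]
    return flat + [0.0] * (max_len * DIM - len(flat))


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


s1 = "the cat sleeps".split()
s2 = "the kitten naps".split()
sim_awe = cosine(awe(s1), awe(s2))
sim_cwe = cosine(cwe(s1), cwe(s2))
```

A sentence pair could then be labeled a paraphrase when the similarity exceeds a threshold tuned on held-out data, or the sentence vectors could be passed to a classifier; which of these the study used is not stated in the abstract.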
Pages: 4