Paraphrase Detection Using Machine Translation and Textual Similarity Algorithms

Author
Kravchenko, Dmitry [1]
Affiliation
[1] Ben Gurion Univ Negev, Dept Comp Sci, Beer Sheva, Israel
Keywords
Paraphrase detection; Semantic similarity algorithms; Machine translation; Supervised classification
DOI
10.1007/978-3-319-71746-3_22
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
I present experiments on paraphrase detection for Russian text that use Machine Translation (MT) into English and then apply existing English sentence similarity algorithms to the translated sentences. Because the method relies on translation engines, it can be applied to any other language for which those engines offer translation into English. Specifically, I consider two tasks: given a pair of sentences in Russian, classify them into two classes (non-paraphrases, paraphrases) or three classes (non-paraphrases, near-paraphrases, precise-paraphrases). I compare five well-established sentence similarity methods developed for English and three Machine Translation engines (Google, Microsoft and Yandex). I perform detailed ablation tests to identify the contribution of each component of the five methods, and identify the best combination of Machine Translation engine and sentence similarity method, including ensembles, on the Russian Paraphrase data set. My best results on the Russian data set are an accuracy of 81.4% and an F1 score of 78.5%, obtained by an ensemble method with translation by all three MT engines (Google, Microsoft and Yandex). This compares favorably with state-of-the-art methods in English on data sets of a similar size, which are in the range of 80.41% accuracy and 85.96% F1 score. This demonstrates that, with the current level of performance of public MT engines, the simple approach of translating and then classifying in English has become a feasible strategy for the task. I perform a detailed error analysis to indicate potential for further improvements.
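The translate-then-compare idea in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: `stub_engine` is a hypothetical stand-in for a real MT service (no Google, Microsoft or Yandex API is called), token-level Jaccard overlap is swapped in for the five similarity methods the paper evaluates, and the thresholds in `classify` are arbitrary values rather than a trained supervised classifier.

```python
import re
from typing import Callable, Dict, List

# A translator maps a Russian sentence to an English one.
Translator = Callable[[str], str]

def tokens(sentence: str) -> set:
    """Lowercased word tokens of a sentence."""
    return set(re.findall(r"\w+", sentence.lower()))

def jaccard(a: str, b: str) -> float:
    """Token-set overlap; a simple stand-in similarity measure."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

def features(s1: str, s2: str, engines: Dict[str, Translator]) -> List[float]:
    """One similarity score per MT engine, computed on the English output."""
    return [jaccard(translate(s1), translate(s2)) for translate in engines.values()]

def classify(feats: List[float], low: float = 0.3, high: float = 0.7) -> str:
    """Three-class decision from the mean score (illustrative thresholds)."""
    score = sum(feats) / len(feats)
    if score >= high:
        return "precise-paraphrase"
    if score >= low:
        return "near-paraphrase"
    return "non-paraphrase"

# Hypothetical canned translations so the sketch runs offline.
TOY_TRANSLATIONS = {
    "Кошка спит на диване.": "The cat sleeps on the sofa.",
    "Кот спит на диване.": "The cat is sleeping on the sofa.",
    "Завтра будет дождь.": "It will rain tomorrow.",
}

def stub_engine(text: str) -> str:
    """Placeholder for a real MT engine call."""
    return TOY_TRANSLATIONS[text]

# Three engine slots, all backed by the same stub in this sketch.
ENGINES = {"engine_a": stub_engine, "engine_b": stub_engine, "engine_c": stub_engine}

similar = classify(features("Кошка спит на диване.", "Кот спит на диване.", ENGINES))
different = classify(features("Кошка спит на диване.", "Завтра будет дождь.", ENGINES))
```

With real engines, each slot would return a different English rendering, so the feature vector would carry one score per engine and similarity method; a supervised classifier trained on those features would then replace the fixed thresholds.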
Pages: 277-292
Page count: 16