Paraphrase Identification in Vietnamese Documents

Cited by: 3
Authors
Bach, Ngo Xuan [1, 2]
Oanh, Tran Thi [3]
Hai, Nguyen Trung [1]
Phuong, Tu Minh [1, 2]
Affiliations
[1] Posts and Telecommunications Institute of Technology, Department of Computer Science, Ho Chi Minh City, Vietnam
[2] Posts and Telecommunications Institute of Technology, Machine Learning and Applications Lab, Ho Chi Minh City, Vietnam
[3] Vietnam National University, International School, Hanoi, Vietnam
Keywords
Paraphrase Identification; Semantic Similarity; Support Vector Machines; Maximum Entropy Model; Naive Bayes Classification; K-Nearest Neighbor
DOI
10.1109/KSE.2015.37
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we investigate the task of paraphrase identification in Vietnamese documents, that is, determining whether two sentences have the same meaning. This task has been shown to be an important research direction with practical applications in natural language processing and data mining. We model the task as a classification problem and explore different types of features for representing sentences. We also introduce a paraphrase corpus for Vietnamese, vnPara, which consists of 3000 Vietnamese sentence pairs. We describe a series of experiments using various linguistic features and different machine learning algorithms, including Support Vector Machines, Maximum Entropy Model, Naive Bayes, and k-Nearest Neighbors. The results are promising, with the best model achieving up to 90% accuracy. To the best of our knowledge, this is the first attempt to solve the task of paraphrase identification for Vietnamese.
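As a rough illustration of what "modeling paraphrase identification as classification" can look like, the sketch below maps each sentence pair to a small feature vector and labels it by a k-nearest-neighbor vote, one of the four classifier families the abstract names. Everything specific here is an assumption made for illustration and is not taken from the paper: the two features (word-overlap Jaccard score and length ratio), whitespace tokenization, the helper names `features` and `knn_predict`, and the English toy pairs in `TRAIN`, which are not from vnPara.

```python
import math
from collections import Counter


def features(s1, s2):
    """Map a sentence pair to (Jaccard word overlap, length ratio).

    Hypothetical features chosen for illustration; the paper's own
    feature set is not described in the abstract.
    """
    a, b = s1.lower().split(), s2.lower().split()  # naive whitespace tokens
    sa, sb = set(a), set(b)
    jaccard = len(sa & sb) / len(sa | sb) if sa | sb else 0.0
    length_ratio = min(len(a), len(b)) / max(len(a), len(b)) if a and b else 0.0
    return (jaccard, length_ratio)


def knn_predict(train, pair, k=3):
    """Label a pair (1 = paraphrase, 0 = not) by majority vote among
    the k training pairs closest to it in feature space."""
    x = features(*pair)
    nearest = sorted(
        train, key=lambda item: math.dist(features(item[0], item[1]), x)
    )[:k]
    votes = Counter(label for _, _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy labeled pairs (invented for this sketch, not drawn from any corpus).
TRAIN = [
    ("the cat sat on the mat", "the cat sat on a mat", 1),
    ("he bought a new car", "he bought a new car yesterday", 1),
    ("she likes green tea", "she really likes green tea", 1),
    ("the cat sat on the mat", "stock prices fell sharply today", 0),
    ("he bought a new car", "rain is expected tomorrow", 0),
    ("she likes green tea", "the meeting starts at noon", 0),
]

if __name__ == "__main__":
    print(knn_predict(TRAIN, ("the dog runs in the park", "the dog runs in a park")))
    print(knn_predict(TRAIN, ("the dog runs in the park", "taxes will rise next year")))
```

Surface overlap alone is a weak signal, which is presumably why the abstract speaks of exploring several types of linguistic features; for Vietnamese, a proper word segmenter would likely replace the whitespace split used here.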
Pages: 174-179
Number of pages: 6