Paraphrase identification and semantic text similarity analysis in Arabic news tweets using lexical, syntactic, and semantic features

Cited by: 67
Authors
Al-Smadi, Mohammad [1]
Jaradat, Zain [1]
Al-Ayyoub, Mahmoud [1]
Jararweh, Yaser [1]
Affiliation
[1] Jordan Univ Sci & Technol, Dept Comp Sci, POB 3030, Irbid 22110, Jordan
Keywords
Paraphrase identification; Semantic text similarity; Semantic analysis; Arabic language; Natural language processing;
DOI
10.1016/j.ipm.2017.01.002
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The rapid growth of digital information poses considerable challenges, particularly for automated content analysis. Social media platforms such as Twitter expose a great deal of information about their users' events, opinions, personalities, etc. Paraphrase Identification (PI) is concerned with recognizing whether two texts have the same or similar meaning, whereas Semantic Text Similarity (STS) is concerned with the degree of that similarity. This research proposes a state-of-the-art approach for paraphrase identification and semantic text similarity analysis in Arabic news tweets. The approach adopts several phases of text processing, feature extraction, and text classification. Lexical, syntactic, and semantic features are extracted to overcome the weaknesses and limitations of current technologies in solving these tasks for the Arabic language. Maximum Entropy (MaxEnt) and Support Vector Regression (SVR) models are trained on these features and evaluated on a dataset prepared for this research. The experimental results show that the approach achieves good results in comparison to the baseline results. (c) 2017 Elsevier Ltd. All rights reserved.
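The distinction the abstract draws between PI (a yes/no decision) and STS (a graded score) can be illustrated with a minimal Python sketch. It is not the authors' method: the two lexical overlap features, the unweighted average, the 0.5 threshold, and all function names are assumptions chosen for illustration, and the trained MaxEnt and SVR models are replaced here by a fixed rule.

```python
import re


def tokens(text):
    """Lowercase and split on non-word characters.

    Simplifying assumption: real Arabic preprocessing would also
    normalize letter variants, strip diacritics, and stem.
    """
    return [t for t in re.split(r"\W+", text.lower()) if t]


def char_ngrams(text, n=3):
    """Set of character n-grams over the space-joined tokens."""
    s = " ".join(tokens(text))
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def jaccard(a, b):
    """Jaccard overlap of two collections; two empty inputs count as identical."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def lexical_features(t1, t2):
    """Two hypothetical lexical features for a pair of texts."""
    return {
        "word_jaccard": jaccard(tokens(t1), tokens(t2)),
        "char3_jaccard": jaccard(char_ngrams(t1), char_ngrams(t2)),
    }


def similarity_score(t1, t2):
    """STS-style graded score in [0, 1]: unweighted mean of the features."""
    feats = lexical_features(t1, t2)
    return sum(feats.values()) / len(feats)


def is_paraphrase(t1, t2, threshold=0.5):
    """PI-style binary decision: threshold the graded score (assumed cutoff)."""
    return similarity_score(t1, t2) >= threshold
```

In a setup like the one the abstract describes, such features would instead be fed to trained models, with a classifier producing the PI label and a regressor producing the STS score; the fixed average and threshold above only stand in for that learned step.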
Pages: 640-652
Page count: 13
Related papers
50 in total
  • [31] Sentiment Analysis of Tweets Using Semantic Analysis
    Kale, Snehal
    Padmadas, Vijaya
    2017 INTERNATIONAL CONFERENCE ON COMPUTING, COMMUNICATION, CONTROL AND AUTOMATION (ICCUBEA), 2017,
  • [32] FANE: A FAke NEws Detector Based on Syntactic, Semantic, and Social Features Bayesian Analysis
    Arya, Varsha
    Attar, Razaz Waheeb
    Alhomoud, Ahmed
    Casillo, Mario
    Colace, Francesco
    Conte, Dajana
    Lombardi, Marco
    Santaniello, Domenico
    Valentino, Carmine
    International Journal on Semantic Web and Information Systems, 2024, 20 (01)
  • [33] Automatic discovery of word semantic relations using paraphrase alignment and distributional lexical semantics analysis
    Dias, Gael
    Moraliyski, Rumen
    Cordeiro, Joao
    Doucet, Antoine
    Ahonen-Myka, Helena
    NATURAL LANGUAGE ENGINEERING, 2010, 16 : 439 - 467
  • [34] Short question-answers assessment using lexical and semantic similarity based features
    Ahmad, Tameem
    Ahamad, Maksud
    Ahmed, Sayyed Usman
    Ahmad, Nesar
    JOURNAL OF DISCRETE MATHEMATICAL SCIENCES & CRYPTOGRAPHY, 2022, 25 (07): : 2057 - 2067
  • [35] Text-to-Concept: A Semantic Indexing Framework for Arabic News Videos
    Mansouri, Sadek
    Lhioui, Chahira
    Charhad, Mbarek
    Zrigui, Mounir
    COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, CICLING 2017, PT II, 2018, 10762 : 575 - 584
  • [36] Unsupervised tweets categorization using semantic and statistical features
    Devi, Maibam Debina
    Saharia, Navanath
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (06) : 9047 - 9064
  • [37] Assessing text semantic similarity using ontology
    Liu, Hongzhe
    Wang, Pengfei
    1600, Academy Publisher (09): : 490 - 497
  • [38] Unsupervised tweets categorization using semantic and statistical features
    Maibam Debina Devi
    Navanath Saharia
    Multimedia Tools and Applications, 2023, 82 : 9047 - 9064
  • [39] Automatic selection of heterogeneous syntactic features in semantic similarity of Polish nouns
    Piasecki, Maciej
    Szpakowicz, Stanislaw
    Broda, Bartosz
    TEXT, SPEECH AND DIALOGUE, PROCEEDINGS, 2007, 4629 : 99 - +
  • [40] Identification of Plagiarism Using Syntactic and Semantic Filters
    Ram, R. Vijay Sundar
    Stamatatos, Efstathios
    Devi, Sobha Lalitha
    COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, CICLING 2014, PART II, 2014, 8404 : 495 - 506