A Semantic Text Expansion for Paraphrasing Identification in Arabic Microblog Posts

Cited by: 0
Authors
Al-Shboul, Bashar [1]
Al-Darras, Duha [1]
Al-Qudah, Dana [1]
Affiliations
[1] University of Jordan, Amman, Jordan
Keywords
Arabic Paraphrase Identification; Arabic Semantic Text Expansion; Arabic Topic Modelling; Similarity
DOI
10.1145/3508397.3564848
Chinese Library Classification (CLC) number
TP39 [Computer Applications]
Discipline classification code
081203; 0835
Abstract
An enormous number of microblog posts are created and published on the web each day. Many of them repeat the same content or cover the same topic. Detecting repetitive content can support applications such as question answering and trending-topic detection. In this research, we propose a model that detects paraphrasing among Arabic tweets and identifies tweets belonging to the same topic. The proposed model is based on Latent Dirichlet Allocation (LDA) topic modeling and on semantic text expansion using external resources, namely BabelNet and Wikipedia. Tweets from multiple Arabic news agencies were collected, preprocessed, and divided into two groups. The first group was used to build the topic model; the tweets in the second group were paired and classified based on their topic distributions. The results are promising in terms of precision on tweet pairs with a certain time overlap. The best reported precision is 80.1%, achieved using Wikipedia embedded content on the stemmed text mode with a large number of LDA topics.
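The pairing step described in the abstract, classifying two tweets from their topic distributions, can be illustrated with a minimal sketch. It assumes that each tweet has already been mapped to a probability vector over LDA topics. The Jensen-Shannon divergence, the `threshold` value, the function names, and the example vectors are all illustrative assumptions; the abstract does not state which comparison measure or decision rule the authors used.

```python
import math
from typing import Sequence


def jensen_shannon_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence (log base 2) between two discrete distributions.

    The result lies in [0, 1]: 0 for identical distributions, 1 for
    distributions with disjoint support.
    """
    if len(p) != len(q):
        raise ValueError("distributions must cover the same number of topics")
    # Mixture distribution halfway between p and q.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a: Sequence[float], b: Sequence[float]) -> float:
        # Terms with a_i == 0 contribute 0 by convention.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def same_topic(p: Sequence[float], q: Sequence[float], threshold: float = 0.2) -> bool:
    """Hypothetical decision rule: a pair matches if its divergence is small.

    The threshold is an assumption chosen for illustration, not a value
    taken from the paper.
    """
    return jensen_shannon_divergence(p, q) <= threshold


# Made-up topic distributions over three LDA topics.
tweet_a = [0.70, 0.20, 0.10]
tweet_b = [0.65, 0.25, 0.10]
tweet_c = [0.05, 0.15, 0.80]

print(same_topic(tweet_a, tweet_b))
print(same_topic(tweet_a, tweet_c))
```

In practice the topic vectors would come from an LDA model trained on the first group of tweets, and the threshold would be tuned on labelled pairs.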
Pages: 129-135
Page count: 7