Arabic Text Classification Based on Word and Document Embeddings

Cited by: 10
Authors
El Mahdaouy, Abdelkader [1 ,2 ]
Gaussier, Eric [1 ]
El Alaoui, Said Ouatik [2 ]
Affiliations
[1] Grenoble Alpes Univ, CNRS, LIG, AMA, Grenoble, France
[2] USMBA, FSDM, LIM, Dept Comp Sci, Fes, Morocco
Source
PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON ADVANCED INTELLIGENT SYSTEMS AND INFORMATICS 2016 | 2017 / Vol. 533
Keywords
Arabic text classification; Arabic natural language processing; Document embeddings; Word embeddings; Skip-Gram; Continuous Bag-of-Words; GloVe; Doc2vec
DOI
10.1007/978-3-319-48308-5_4
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently, word embeddings have been introduced as a major breakthrough in Natural Language Processing (NLP) for learning viable representations of linguistic items based on contextual information and/or word co-occurrence. In this paper, we investigate Arabic document classification using word and document embeddings as the representational basis rather than relying on text preprocessing and a bag-of-words representation. We demonstrate that document embeddings outperform text preprocessing techniques, whether they are learned with Doc2Vec or built by averaging word vectors with a simple document embedding construction method. Moreover, the results show that classification accuracy is less sensitive to the learning parameters of the word and document vectors.
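The abstract mentions building a document embedding by averaging word vectors. The paper's exact construction is not given in this record, so the following is only a minimal sketch of plain unweighted averaging. The function name `average_word_vectors`, the toy two-dimensional vectors, and the transliterated tokens are all hypothetical and chosen for illustration; skipping out-of-vocabulary tokens is likewise an assumption.

```python
from typing import Dict, List, Sequence


def average_word_vectors(
    tokens: Sequence[str],
    word_vectors: Dict[str, List[float]],
    dim: int,
) -> List[float]:
    """Return the unweighted mean of the vectors of all known tokens.

    Tokens missing from `word_vectors` are skipped (an assumption, not
    something stated in the abstract). If no token is known, a zero
    vector of length `dim` is returned.
    """
    total = [0.0] * dim
    known = 0
    for token in tokens:
        vector = word_vectors.get(token)
        if vector is None:
            continue  # out-of-vocabulary token
        for i, value in enumerate(vector):
            total[i] += value
        known += 1
    if known == 0:
        return total
    return [component / known for component in total]


# Hypothetical toy embeddings keyed by transliterated Arabic tokens.
toy_vectors = {
    "kura": [1.0, 0.0],
    "hadaf": [0.0, 1.0],
}

# "majhul" is not in the toy vocabulary, so it is ignored.
doc_vector = average_word_vectors(["kura", "hadaf", "majhul"], toy_vectors, dim=2)
print(doc_vector)
```

In practice the word vectors would come from a trained model such as Skip-Gram, CBOW, or GloVe, and the resulting document vectors would be passed to a standard classifier.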
Pages: 32 - 41
Page count: 10
Related papers
50 in total
  • [21] Using Word Embeddings with Linear Models for Short Text Classification
    Krzywicki, Alfred
    Heap, Bradford
    Bain, Michael
    Wobcke, Wayne
    Schmeidl, Susanne
    AI 2018: ADVANCES IN ARTIFICIAL INTELLIGENCE, 2018, 11320 : 819 - 827
  • [22] Towards Unsupervised Text Classification Leveraging Experts and Word Embeddings
    Haj-Yahia, Zied
    Sieg, Adrien
    Deleris, Lea A.
    57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 371 - 379
  • [23] Multi-class Document Classification Using Improved Word Embeddings
    Rabut, Benedict A.
    Fajardo, Arnel C.
    Medina, Ruji P.
    2019 2ND INTERNATIONAL CONFERENCE ON COMPUTING AND BIG DATA (ICCBD 2019), 2019, : 42 - 46
  • [24] Extending Full Text Search for Legal Document Collections Using Word Embeddings
    Landthaler, Joerg
    Waltl, Bernhard
    Holl, Patrick
    Matthes, Florian
    LEGAL KNOWLEDGE AND INFORMATION SYSTEMS, 2016, 294 : 73 - 82
  • [25] Bengali Word Embeddings and It's Application in Solving Document Classification Problem
    Ahmad, Adnan
    Amin, Mohammad Ruhul
    PROCEEDINGS OF THE 2016 19TH INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION TECHNOLOGY (ICCIT), 2016, : 425 - 430
  • [26] Word Embeddings for Arabic Sentiment Analysis
    Altowayan, A. Aziz
    Tao, Lixin
    2016 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2016, : 3820 - 3825
  • [27] The Impact of Arabic Diacritization on Word Embeddings
    Abbache, Mohamed
    Abbache, Ahmed
    Xu, Jingwen
    Meziane, Farid
    Wen, Xianbin
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2023, 22 (06)
  • [28] Methodical Evaluation of Arabic Word Embeddings
    Elrazzaz, Mohammed
    Elbassuoni, Shady
    Shaban, Khaled
    Helwe, Chadi
    PROCEEDINGS OF THE 55TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2017), VOL 2, 2017, : 454 - 458
  • [29] Arabic Sentiment Analysis Based on Word Embeddings and Deep Learning
    Elhassan, Nasrin
    Varone, Giuseppe
    Ahmed, Rami
    Gogate, Mandar
    Dashtipour, Kia
    Almoamari, Hani
    El-Affendi, Mohammed A.
    Al-Tamimi, Bassam Naji
    Albalwy, Faisal
    Hussain, Amir
    COMPUTERS, 2023, 12 (06)
  • [30] Text Similarity Function Based on Word Embeddings for Short Text Analysis
    Pascual, Adrian Jimenez
    Fujita, Sumio
    COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING (CICLING 2017), PT I, 2018, 10761 : 391 - 402