Arabic Text Classification Based on Word and Document Embeddings

Cited by: 10
Authors
El Mahdaouy, Abdelkader [1,2]
Gaussier, Eric [1]
El Alaoui, Said Ouatik [2]
Affiliations
[1] Grenoble Alpes Univ, CNRS, LIG, AMA, Grenoble, France
[2] USMBA, FSDM, LIM, Dept Comp Sci, Fes, Morocco
Keywords
Arabic text classification; Arabic natural language processing; Document embeddings; Word embeddings; Skip-gram; Continuous Bag-of-Words; GloVe; Doc2Vec
DOI
10.1007/978-3-319-48308-5_4
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recently, word embeddings have been introduced as a major breakthrough in Natural Language Processing (NLP) for learning viable representations of linguistic items based on contextual information and/or word co-occurrence. In this paper, we investigate Arabic document classification using word and document embeddings as the representational basis, rather than relying on text preprocessing and a bag-of-words representation. We demonstrate that document embeddings outperform text preprocessing techniques, whether they are learned with Doc2Vec or built by averaging word vectors, a simple method for document embedding construction. Moreover, the results show that classification accuracy is less sensitive to the learning parameters of the word and document vectors.
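The abstract mentions building a document embedding by averaging word vectors. The following is a minimal sketch of that general idea, assuming a plain unweighted mean over in-vocabulary tokens; the record does not say whether the paper weights or filters words, so this is an illustration and not the authors' implementation. The function name `average_document_embedding`, the romanized tokens, and the 2-dimensional toy vectors are hypothetical and chosen only for the example.

```python
from typing import Dict, List, Sequence


def average_document_embedding(
    tokens: Sequence[str],
    word_vectors: Dict[str, List[float]],
    dim: int,
) -> List[float]:
    """Return the element-wise mean of the vectors of in-vocabulary tokens.

    Assumption: unweighted averaging; out-of-vocabulary tokens are skipped.
    If no token is in the vocabulary, an all-zero vector is returned.
    """
    total = [0.0] * dim
    count = 0
    for token in tokens:
        vector = word_vectors.get(token)
        if vector is None:
            continue  # skip tokens that have no learned vector
        for i in range(dim):
            total[i] += vector[i]
        count += 1
    if count == 0:
        return total
    return [value / count for value in total]


# Toy vectors made up for illustration (romanized Arabic tokens).
toy_vectors = {
    "kitab": [1.0, 0.0],
    "madrasa": [0.0, 1.0],
}
document = ["kitab", "madrasa", "unknown_token"]
embedding = average_document_embedding(document, toy_vectors, dim=2)
print(embedding)  # [0.5, 0.5]
```

In practice the word vectors would come from a trained model (for example Skip-gram, CBOW, or GloVe, as listed in the keywords), and the resulting fixed-length vector would be passed to a classifier.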
Pages: 32-41
Number of pages: 10