Study for Automatic Classification of Arabic Spoken Documents

Authors
Labidi, Mohamed [1]
Maraoui, Mohsen [2]
Zrigui, Mounir [1]
Affiliations
[1] Res Lab Technol Informat & Commun & Elect Engn, Tunis, Tunisia
[2] Computat Math Lab, Monastir, Tunisia
Keywords
Arabic speech classification; Natural language processing; Artificial intelligence; N-gram; Stem; Root; KNN; SVM; Naive Bayes
DOI
10.1007/978-3-319-67077-5_44
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
One of the important tasks in natural language processing is speech classification by domain. According to the literature, no prior studies have addressed this problem, especially the effect of using root N-grams and stem N-grams on Arabic speech classification performance. In this paper we describe a study of Arabic spoken document classification using K-Nearest Neighbor, Naive Bayes, and Support Vector Machine classifiers. We build a speech recognition system to transcribe Arabic audio files. Then we use four types of features: 1-gram, 2-gram and 3-gram word roots or stems, as well as full words. The results show that, compared to stem or word N-grams, using a 1-gram root as a feature gives better performance for Arabic speech classification. Classification performance decreases as the number of N-grams increases. The results also show that the support vector machine outperforms Naive Bayes and k-nearest neighbor with 1-grams. With k-nearest neighbor, the 2-gram root achieves the best performance. With the support vector machine, on the other hand, the 3-gram root achieves the best performance.
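The feature scheme named in the abstract (word-level N-grams over roots or stems, fed to a classifier) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the transcripts have already been reduced to root tokens, uses a plain cosine-similarity k-nearest-neighbor vote, and the toy tokens, labels, and function names (`word_ngrams`, `vectorize`, `cosine`, `knn_predict`) are all hypothetical placeholders chosen for illustration.

```python
from collections import Counter
import math


def word_ngrams(tokens, n):
    """Return the contiguous word-level n-grams of a token list, space-joined."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def vectorize(tokens, n):
    """Bag-of-n-grams count vector for one document."""
    return Counter(word_ngrams(tokens, n))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(train, query_tokens, n=1, k=3):
    """Majority label among the k training documents most similar to the query."""
    query = vectorize(query_tokens, n)
    scored = sorted(
        ((cosine(vectorize(tokens, n), query), label) for tokens, label in train),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: ASCII placeholders standing in for Arabic roots.
# These are NOT taken from the paper's corpus.
train = [
    (["drs", "ktb", "elm"], "education"),
    (["ktb", "drs", "qra"], "education"),
    (["lab", "krh", "fwz"], "sport"),
    (["fwz", "lab", "krh"], "sport"),
]

print(word_ngrams(["drs", "ktb", "elm"], 2))
print(knn_predict(train, ["drs", "ktb"], n=1, k=3))
print(knn_predict(train, ["krh", "lab"], n=1, k=3))
```

Raising `n` makes each feature more specific but sparser, so fewer N-grams are shared between documents; the sketch only shows the mechanics and says nothing about which setting performs best on real data.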
Pages: 459-468
Page count: 10