Deep Learning Framework with Confused Sub-Set Resolution Architecture for Automatic Arabic Diacritization

Cited by: 29

Authors:

Rashwan, Mohsen A. A. [1,2]

Al Sallab, Ahmad A. [3,4]

Raafat, Hazem M. [5]

Rafea, Ahmed [6]

Affiliations:

[1] Engn Co Dev Comp Syst RDI, Giza 12613, Egypt

[2] Cairo Univ, Fac Engn, Dept Elect & Elect Commun, Giza 00202, Egypt

[3] Valeo Interbranch Automot Software, Giza, Egypt

[4] Cairo Univ, Fac Engn, Dept Elect & Elect Commun, Giza 12613, Egypt

[5] Kuwait Univ, Dept Comp Sci, Safat 13060, Kuwait

[6] Amer Univ Cairo, Dept Comp Sci, Cairo 11835, Egypt

Source:

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2015 / Vol. 23 / Issue 03

Keywords:

Arabic diacritization; classifier design; deep networks; part-of-speech (PoS) tagging

DOI:

10.1109/TASLP.2015.2395255

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

The Arabic language belongs to a group of languages that require diacritization over their characters. Modern Standard Arabic (MSA) transcripts omit the diacritics, which are essential for many machine learning tasks like Text-To-Speech (TTS) systems. In this work Arabic diacritics restoration is tackled under a deep learning framework that includes the Confused Sub-set Resolution (CSR) method to improve the classification accuracy, in addition to an Arabic Part-of-Speech (PoS) tagging framework using deep neural nets. Special focus is given to syntactic diacritization, which still suffers low accuracy as indicated in prior works. Evaluation is done versus state-of-the-art systems reported in literature, with quite challenging datasets collected from different domains. Standard datasets like the LDC Arabic Tree Bank are used in addition to custom ones we have made available online to allow other researchers to replicate these results. Results show significant improvement of the proposed techniques over other approaches, reducing the syntactic classification error to 9.9% and morphological classification error to 3% compared to 12.7% and 3.8% of the best reported results in literature, improving the error by 22% over the best reported systems.
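The 22% figure in the abstract reads as a relative error reduction rather than an absolute one. The following is a minimal sketch of that arithmetic, assuming the usual definition (baseline error minus new error, divided by baseline error). The function name `relative_error_reduction` is illustrative and does not come from the paper; the input values are the error rates quoted in the abstract.

```python
def relative_error_reduction(baseline_error: float, new_error: float) -> float:
    """Return the fraction by which new_error is lower than baseline_error.

    Assumed definition: (baseline - new) / baseline.
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


# Error rates in percent, as quoted in the abstract.
syntactic = relative_error_reduction(12.7, 9.9)      # best prior vs. proposed
morphological = relative_error_reduction(3.8, 3.0)   # best prior vs. proposed

print(f"syntactic: {syntactic:.1%}")          # syntactic: 22.0%
print(f"morphological: {morphological:.1%}")  # morphological: 21.1%
```

Under this assumed definition, the syntactic figures reproduce the quoted 22%, and the morphological figures give a slightly lower value of about 21%.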

Citation

Pages: 505-516

Page count: 12
