The Saudi Novel Corpus: Design and Compilation

Cited by: 4
Authors
Alfraidi, Tareq [1]
Abdeen, Mohammad A. R. [2]
Yatimi, Ahmed [3]
Alluhaibi, Reyadh [4]
Al-Thubaity, Abdulmohsen [5]
Affiliations
[1] Islamic Univ Madinah, Dept Linguist, Madinah 42351, Saudi Arabia
[2] Islamic Univ Madinah, Dept Comp Sci, Madinah 42351, Saudi Arabia
[3] Islamic Univ Madinah, Dept Literature & Rhetor, Madinah 42351, Saudi Arabia
[4] Taibah Univ, Dept Comp Sci, Madinah 41477, Saudi Arabia
[5] King Abdulaziz City Sci & Technol, Riyadh 12354, Saudi Arabia
Source
APPLIED SCIENCES-BASEL | 2022 / Vol. 12 / Issue 13
Keywords
corpora; corpus linguistics; Arabic; Saudi novels; LINGUISTICS;
DOI
10.3390/app12136648
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Arabic has recently received significant attention from corpus compilers. This situation has led to the creation of many Arabic corpora that cover various genres, most notably the newswire genre. Yet, Arabic novels, and specifically those authored by Saudi writers, lack sufficient digital datasets that would enhance corpus linguistic and stylistic studies of these works. Thus, Arabic lags behind English and other European languages in this context. In this paper, we present the Saudi Novels Corpus, built to be a valuable resource for linguistic and stylistic research communities. We specifically present the procedures we followed and the decisions we made in creating the corpus. We describe and clarify the design criteria, data collection methods, process of annotation, and encoding. In addition, we present preliminary results that emerged from the analysis of the corpus content. We consider the work described in this paper as initial steps to bridge the existing gap between corpus linguistics and Arabic literary texts. Further work is planned to improve the quality of the corpus by adding advanced features.
Pages: 19
Related papers
50 items in total
  • [1] Balanced corpus of informal spoken Czech: compilation, design and findings
    Waclawicova, Martina
    Kren, Michal
    Valkova, Lucie
    [J]. INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009: 1807-1810
  • [2] Design and compilation of a specialized Spanish-German parallel corpus
    Escartin, Carla Parra
    [J]. LREC 2012 - EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2012: 2199-2206
  • [3] Design and compilation of syntactically tagged corpus of Japanese statutory sentences
    Ogawa, Yasuhiro
    Yamada, Masayuki
    Kato, Ryuta
    Toyama, Katsuhiko
    [J]. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2011, 6797 LNAI: 141-152
  • [4] PROTOCOLIZED METHODOLOGY FOR COMPILATION OF A TRAVEL INSURANCE CORPUS: DESIGN AND REPRESENTATIVENESS
    Seghiri, Miriam
    [J]. RLA-REVISTA DE LINGUISTICA TEORICA Y APLICADA, 2011, 49 (02): 13-30
  • [5] ESP corpus design: compilation of the Veterinary Nursing Medical Chart Corpus and the Veterinary Nursing Wordlist
    Ohashi, Yukiko
    Katagiri, Noriaki
    Oka, Katsutoshi
    Hanada, Michiko
    [J]. CORPORA, 2020, 15 (02): 125-140
  • [6] An Incremental Approach to Corpus Design and Construction: Application to a Large Contemporary Saudi Corpus
    Elgibreen, Hebah
    Faisal, Mohammed
    Al Sulaiman, Mansour
    Abdou, Sherif
    Mekhtiche, Mohamed Amine
    Moussa, Abdullah M.
    Alohali, Yousef A.
    Abdul, Wadood
    Muhammad, Ghulam
    Rashwan, Mohsen
    Algabri, Mohammed
    [J]. IEEE ACCESS, 2021, 9: 88405-88428
  • [7] Design, Compilation, and Preliminary Analyses of Balanced Corpus of Contemporary Written Japanese
    Maekawa, Kikuo
    Yamazaki, Makoto
    Maruyama, Takehiko
    Yamaguchi, Masaya
    Ogura, Hideki
    Kashino, Wakako
    Ogiso, Toshinobu
    Koiso, Hanae
    Den, Yasuharu
    [J]. LREC 2010 - SEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2010: 1483-1490
  • [8] Corpus compilation: Representativeness and the CORPOBRAS
    de Oliveira, Lucia Pacheco
    Padua Dias, Maria Carmelita
    [J]. CALIDOSCOPIO, 2009, 7 (03): 192-198
  • [9] Heuristic theory in corpus compilation
    Patkin, John
    [J]. JOURNAL OF ENGLISH AS A LINGUA FRANCA, 2016, 5 (02): 333-354
  • [10] WEBLESP: CORPUS OF DIGITAL SPECIALISED COMMUNICATION IN SPANISH. DESIGN, COMPILATION AND USE
    Piccioni, Sara
    Pontrandolfo, Gianluca
    [J]. RLA-REVISTA DE LINGUISTICA TEORICA Y APLICADA, 2021, 59 (01): 13-37