Models for Arabic Document Quality Assessment

Cited by: 1
Authors
Yahya, Adnan [1]
Ahmad, Afnan [1]
Assaf, Alaa [1]
Khater, Rawan [1]
Salhi, Ali [1]
Affiliations
[1] Birzeit Univ, Elect & Comp Engn Dept, Birzeit, Palestine
Keywords
Document quality assessment; Arabic Wikipedia; Arabic information retrieval
DOI
10.1007/978-3-030-61146-0_24
Chinese Library Classification
F [Economics]
Discipline classification code
02
Abstract
Digital content is growing rapidly. Because anyone can generate, access, and use this content, assessing the quality of web content before it is used has become an important issue. This paper focuses on devising methods to assess the quality of Arabic digital content. Our work was based in part on Wikipedia articles annotated as featured or good according to Wikipedia's quality guidelines. Our analysis aimed to find the features that best indicate quality. Using these features, we trained a high-accuracy quality assessment model with machine-learning algorithms. We then went beyond Wikipedia documents to build a general model that can assess, with acceptable accuracy, the quality of Arabic documents that lack Wikipedia metadata. This model was trained on features from documents that we collected from Arabic online news sites and blogs and annotated in collaboration with university students.
Pages: 297-310
Number of pages: 14