CLASSIFYING THE ARABIC WEB - A PILOT STUDY

Cited by: 0
Authors
Abdeen, M. [1]
Elsehemy, A. [1]
Nazmy, T. [1]
Yagoub, M. C. E. [2]
Affiliations
[1] Ain Shams Univ, Fac Comp & Informat Sci, Dept Comp Sci, Cairo, Egypt
[2] Univ Ottawa, Sch Informat Technol & Engn, Ottawa, ON K1N 6N5, Canada
Keywords
Data mining; Web mining; Information retrieval; Arabic Web; Text classification
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
The World Wide Web has become the preferred destination for information seekers across the globe. With billions of web pages, information on almost any topic is a click away. Analyzing this massive content serves many purposes, including information discovery, more efficient search engines, and the study of social and political patterns. Web mining techniques such as text classification and categorization are being used to provide an "under-the-microscope" picture of the web. The Arabic web represents an important portion of the web. With Arabic as the 5th most spoken language in the world and the number of Arabic Internet users growing at exponential rates, it is becoming important to analyze Arabic web content and study its trends. This paper takes a close look at the content of the Arabic web. It presents the percentage of web content falling into each of five categories: politics, culture, sports, economics, and religion. We used two different text classification algorithms and compared their results in terms of precision and recall. The classifiers showed that economics and politics account for the largest shares (65% combined), while culture and religion account for the smallest (about 10% combined).
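The two measurements the abstract describes, the share of pages per category and per-category precision and recall, can be sketched as follows. This is a minimal illustration only: the function names and the toy labels are assumptions made for this example, not the authors' code or data, and the abstract does not say which classification algorithms were used.

```python
from collections import Counter

# The five categories named in the abstract.
CATEGORIES = ["politics", "culture", "sports", "economics", "religion"]


def category_shares(predicted):
    """Return the percentage of pages assigned to each category."""
    counts = Counter(predicted)
    total = len(predicted)
    return {c: 100.0 * counts[c] / total for c in CATEGORIES}


def precision_recall(gold, predicted, category):
    """Compute precision and recall for one category from parallel label lists."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == category and p == category)
    fp = sum(1 for g, p in pairs if g != category and p == category)
    fn = sum(1 for g, p in pairs if g == category and p != category)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy labels invented for illustration (hypothetical, not results from the paper).
gold = ["politics", "economics", "sports", "politics", "religion", "economics"]
pred = ["politics", "economics", "sports", "economics", "religion", "economics"]

shares = category_shares(pred)
p, r = precision_recall(gold, pred, "economics")
```

Running the same two functions on the output of each classifier would give the kind of side-by-side comparison the abstract mentions, assuming a manually labeled sample is available as `gold`.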
Pages: 865-868
Number of pages: 4