CLASSIFYING THE ARABIC WEB - A PILOT STUDY

Cited by: 0
Authors
Abdeen, M. [1 ]
Elsehemy, A. [1 ]
Nazmy, T. [1 ]
Yagoub, M. C. E. [2 ]
Affiliations
[1] Ain Shams Univ, Fac Comp & Informat Sci, Dept Comp Sci, Cairo, Egypt
[2] Univ Ottawa, Sch Informat Technol & Engn, Ottawa, ON K1N 6N5, Canada
Keywords
Data mining; Web mining; Information retrieval; Arabic Web; Text classification
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
The World Wide Web has become the favorite destination of information seekers across the globe. With its massive amount of information, including billions of web pages, information on just about any topic is a click away. Analyzing the massive content of the web serves many important purposes, such as information discovery, more efficient search engines, and the study of social and political patterns. Web mining techniques such as text classification and categorization are being used to provide an "under-the-microscope" picture of the web. The Arabic web represents an important portion of the web. With Arabic as the 5th most spoken language in the world, and with the number of Arabic Internet users increasing at exponential rates, it is becoming important to analyze Arabic web content and study its trends. This paper takes a close look at the content of the Arabic web. It presents the percentage share of web content in five categories, namely politics, culture, sports, economics, and religion. We used two different text classification algorithms and compared their results, including their precision and recall. The classifiers showed that economics and politics have the highest shares (65% combined), while culture and religion have the lowest (about 10% combined).
Pages: 865-868
Number of pages: 4