Arabic text data mining: A root-based hierarchical indexing model

Cited by: 0
Author
Eldos, T.M. [1]
Affiliation
[1] Department of Computer Engineering, Fac. of Comp./Information Technology, Jordan Univ. of Sci. and Technology, Irbid 22110-3030, Jordan
Keywords
Digital libraries; Indexing (of information); Information retrieval; Linguistics
DOI
10.1080/02286203.2003.11442267
Abstract
The world has recently witnessed tremendous growth in the volume of text documents available on the Internet and in digital libraries, news sources, and company-wide intranets. Text data mining is a multidisciplinary field involving information retrieval, text analysis, information extraction, clustering, categorization, linguistics, database technology, machine learning, and data mining, and it is becoming more significant. Efforts have intensified in areas such as information retrieval, whose practical applications are increasingly necessary to end users and to the scientific community itself in order to fetch the growing body of available information efficiently. In the past few years, not only have new documents been produced directly in digital form, making them suitable for automatic indexing, but many older documents have also been ported from their physical medium to a digital one. The meaning of a document is represented by a vector of features, which are weighted according to a measure that best estimates relevance. Text categorization presents unique challenges due to the large number of attributes present in the data set, the large number of training samples, and attribute dependencies. This article focuses on speeding up the information retrieval process in an Arabic document base by using a root-based hierarchical indexing model. Simulation results demonstrated that a speed gain in the range of 50-100 can be achieved for typical queries.
Pages: 158-166