Arabic text data mining: A root-based hierarchical indexing model

Author
Eldos, T.M. [1 ]
Affiliation
[1] Department of Computer Engineering, Fac. of Comp./Information Technology, Jordan Univ. of Sci. and Technology, Irbid 22110-3030, Jordan
Keywords
Digital libraries; Indexing (of information); Information retrieval; Linguistics
DOI
10.1080/02286203.2003.11442267
Abstract
The world has recently witnessed tremendous growth in the volume of text documents available on the Internet, in digital libraries, from news sources, and on company-wide intranets. Text data mining is a multidisciplinary field involving information retrieval, text analysis, information extraction, clustering, categorization, linguistics, database technology, machine learning, and data mining, and it is becoming more significant. Research efforts have intensified in areas such as information retrieval, whose practical applications are increasingly necessary to end users and to the scientific community itself for fetching the growing body of available information efficiently. In the past few years, not only have new documents been produced directly in digital form, making them suitable for automatic indexing, but many older documents have also been ported from their physical medium to a digital one. The meaning of a document is represented by a vector of features, which are weighted according to the measure that best estimates relevance. Text categorization presents unique challenges due to the large number of attributes present in the data set, the large number of training samples, and attribute dependencies. This article focuses on speeding up the information retrieval process in an Arabic document base by using a root-based hierarchical indexing model. Simulation results demonstrated that a speed gain in the range of 50-100 can be achieved for typical queries.
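The abstract does not describe the indexing structure in detail, so the following is only an illustrative sketch of what a root-based hierarchical index could look like, not the article's actual model. It assumes a two-level layout (root, then word form, then document ids) and uses a small hypothetical table, ROOT_OF, of transliterated Arabic words in place of a real morphological analyzer. The names ROOT_OF, DOCS, build_index, and search are all invented for this example.

```python
from collections import defaultdict

# Hypothetical word -> root table (transliterated, for illustration only).
# A real system would derive roots with an Arabic morphological analyzer.
ROOT_OF = {
    "kitab": "ktb",    # book
    "katib": "ktb",    # writer
    "maktaba": "ktb",  # library
    "darasa": "drs",   # he studied
    "madrasa": "drs",  # school
}

# Hypothetical toy document base: document id -> text.
DOCS = {
    1: "kitab madrasa",
    2: "katib",
    3: "darasa",
}


def build_index(docs):
    """Build a two-level index: root -> word form -> set of document ids."""
    index = defaultdict(lambda: defaultdict(set))
    for doc_id, text in docs.items():
        for word in text.split():
            root = ROOT_OF.get(word)
            if root is not None:  # skip words with no known root
                index[root][word].add(doc_id)
    return index


def search(index, word, exact=False):
    """Look up a query word by descending from its root.

    With exact=False, every document containing any indexed form of the
    same root is returned; with exact=True, only documents containing
    that specific word form are returned.
    """
    root = ROOT_OF.get(word)
    if root is None or root not in index:
        return set()
    forms = index[root]
    if exact:
        return set(forms.get(word, set()))
    return set().union(*forms.values())


index = build_index(DOCS)
print(search(index, "maktaba"))              # documents sharing the root "ktb"
print(search(index, "madrasa", exact=True))  # documents with this exact form
```

In this sketch a query touches only the word forms stored under one root rather than the whole vocabulary, which is the general intuition behind hierarchical lookup; it makes no claim about the speed gains reported in the article.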
Pages: 158-166