Text categorization based on a new classification by thresholds

Authors
Walid Cherif
Abdellah Madani
Mohamed Kissi
Affiliations
[1] Rabat-Institutes, Laboratory SI2M, Department of Computer Science, National Institute of Statistics and Applied Economics
[2] University Chouaib Doukkali, Laboratory LAROSERI, Department of Computer Science, Faculty of Sciences
[3] University Hassan II Casablanca, Laboratory LIM, Department of Computer Science, Faculty of Sciences and Technology
Source
Progress in Artificial Intelligence, 2021, 10 (04)
Keywords
Natural language processing; Text mining; Automated text categorization; Feature selection; Machine learning; Classification by thresholds
DOI
Not available
Abstract
Automated text categorization attempts to provide an effective solution to today's unprecedented growth of textual data. Because it can organize a huge and varied amount of text from which invaluable insights can be gained, it has become an emerging field of investigation for the research community. Several mathematical approaches have been studied to formalize the main components of a text categorization system (text representation, feature extraction, and the classification process), yet such systems still face many difficulties, owing both to the complex nature of text databases and to the high dimensionality of text representations. This paper introduces an alternative way to address the problem. First, it reduces the original set of features by using a newly proposed metric. Second, the proposed approach classifies a text automatically without necessarily processing all of its features. Moreover, some standard preprocessing steps, such as stemming, can be dropped with this approach. In the experiments, the new text categorization method outperformed state-of-the-art methods: the F-measures obtained on the 20 Newsgroups, BBC News, Reuters, and AG News datasets were 95.06%, 98.21%, 88.44%, and 95.70%, respectively, while standard approaches returned considerably lower scores.
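For readers unfamiliar with the metric reported in the abstract, the following is a minimal sketch of how an F-measure can be computed for a multi-class categorization task. It is not taken from the paper: the function names `f_measure` and `macro_f1` are hypothetical, and macro-averaging over classes is an assumption, since the abstract does not state which averaging scheme was used.

```python
def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta score from true-positive, false-positive and false-negative counts."""
    if tp == 0:
        # Precision and recall are both zero (or undefined); report 0 by convention.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def macro_f1(y_true, y_pred) -> float:
    """Unweighted mean of per-class F1 scores (averaging scheme is an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        scores.append(f_measure(tp, fp, fn))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy labels for illustration only; these are not data from the paper.
    gold = ["sport", "sport", "tech", "tech"]
    predicted = ["sport", "tech", "tech", "tech"]
    print(f"macro F1 = {macro_f1(gold, predicted):.4f}")
```

With `beta = 1` the score is the harmonic mean of precision and recall, which is the usual meaning of "F-measure" when no other weighting is stated.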
Pages: 433 - 447
Page count: 14
Related papers
50 in total
  • [1] Text categorization based on a new classification by thresholds
    Cherif, Walid
    Madani, Abdellah
    Kissi, Mohamed
    PROGRESS IN ARTIFICIAL INTELLIGENCE, 2021, 10 (04) : 433 - 447
  • [2] A new Centroid-Based Classification model for text categorization
    Liu, Chuan
    Wang, Wenyong
    Tu, Guanghui
    Xiang, Yu
    Wang, Siyang
    Lv, Fengmao
    KNOWLEDGE-BASED SYSTEMS, 2017, 136 : 15 - 26
  • [3] A New Fuzzy Hierarchical Classification Based on SVM for Text Categorization
    Guernine, Taoufik
    Zeroual, Kacem
    IMAGE ANALYSIS AND RECOGNITION, PROCEEDINGS, 2009, 5627 : 865 - 874
  • [4] Supervised classification by thresholds: Application to automated text categorization and opinion mining
    Cherif, Walid
    Madani, Abdellah
    Kissi, Mohamed
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2022, 34 (04):
  • [5] Text Classification Based on Keywords with Different Thresholds
    Tu Cam Thi Tran
    Hiep Xuan Huynh
    Phuc Quang Tran
    Dinh Quoc Truong
    PROCEEDINGS OF 2019 4TH INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATION TECHNOLOGY (ICIIT 2019), 2019, : 101 - 106
  • [6] Text Categorization Based on Regularized Linear Classification Methods
    Tong Zhang
    Frank J. Oles
    Information Retrieval, 2001, 4 : 5 - 31
  • [7] Text categorization based on regularized linear classification methods
    Zhang, T
    Oles, FJ
    INFORMATION RETRIEVAL, 2001, 4 (01): : 5 - 31
  • [8] Associative classification in text categorization
    Chen, J
    Yin, J
    Zhang, J
    Huang, J
    ADVANCES IN INTELLIGENT COMPUTING, PT 1, PROCEEDINGS, 2005, 3644 : 1035 - 1044
  • [9] Text categorization based on granular agent evolutionary classification algorithm
    Pan X.
    Chen H.
    Jing Z.
    Journal of Computational and Theoretical Nanoscience, 2016, 13 (02) : 1391 - 1398
  • [10] Text categorization based on classification rules tree by frequent patterns
    Department of Computer and Information Technology, Fudan University, Shanghai 200433, China
    Unknown
    Ruan Jian Xue Bao, 2006, 5 (1017-1025):