Text categorization based on a new classification by thresholds

Cited by: 0
Authors
Walid Cherif
Abdellah Madani
Mohamed Kissi
Affiliations
[1] Rabat-Institutes, Laboratory SI2M, Department of Computer Science, National Institute of Statistics and Applied Economics
[2] University Chouaib Doukkali, Laboratory LAROSERI, Department of Computer Science, Faculty of Sciences
[3] University Hassan II Casablanca, Laboratory LIM, Department of Computer Science, Faculty of Sciences and Technology
Keywords
Natural language processing; Text mining; Automated text categorization; Feature selection; Machine learning; Classification by thresholds
DOI
Not available
Abstract
Automated text categorization aims to provide an effective response to today's unprecedented growth of textual data. Because it can organize large and varied collections of texts from which valuable insights can be drawn, it has become an emerging field of investigation for the research community. However, although several mathematical approaches have been studied to formalize the main components of a text categorization system (text representation, feature extraction, and the classification process), such systems still face many difficulties, owing both to the complex nature of text databases and to the high dimensionality of text representations. This paper introduces an alternative way to address this problem. First, it reduces the original set of features using a newly proposed metric. Second, the proposed approach has the added advantage of classifying a text automatically without necessarily processing all of its features. Moreover, some standard preprocessing steps, such as stemming, can be dropped with this approach. The experimental results show that this new text categorization method outperforms state-of-the-art methods: the F-measures obtained on the 20 Newsgroups, BBC News, Reuters, and AG News datasets were 95.06%, 98.21%, 88.44%, and 95.70%, respectively, whereas standard approaches returned considerably lower scores.
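The abstract names two ideas (pruning features with a scoring metric, and deciding a text's class before reading all of its features) but does not define the metric or the thresholds. The sketch below is a hypothetical illustration of those two ideas in general, not the authors' method. The per-class term weight (the share of a term's training occurrences that fall in each class), the function names, the `min_weight` feature-selection threshold, the `margin` decision threshold, and the toy data are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def train_weights(docs, labels):
    """Assumed metric (NOT the paper's): for each term, the share of its
    training occurrences that fall in each class."""
    term_class = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        for tok in tokens:
            term_class[tok][label] += 1
    weights = {}
    for tok, counts in term_class.items():
        total = sum(counts.values())
        weights[tok] = {c: n / total for c, n in counts.items()}
    return weights


def select_features(weights, min_weight):
    """Feature reduction: keep terms whose strongest class weight is >= min_weight."""
    return {t: w for t, w in weights.items() if max(w.values()) >= min_weight}


def classify(tokens, weights, classes, margin):
    """Read tokens in order, adding each kept term's class weights to running
    scores; stop once the top class leads the runner-up by >= margin.
    Assumes at least two classes. Returns (label, tokens_read)."""
    scores = {c: 0.0 for c in classes}
    read = 0
    for tok in tokens:
        read += 1
        for c, w in weights.get(tok, {}).items():
            scores[c] += w
        top, second = sorted(scores.values(), reverse=True)[:2]
        if top - second >= margin:
            break  # decision reached without reading the rest of the text
    return max(scores, key=scores.get), read


# Toy data, invented for illustration only.
train_docs = [
    ["goal", "match", "team", "said"],
    ["team", "coach", "league", "goal"],
    ["election", "vote", "party", "said"],
    ["party", "minister", "parliament", "vote"],
]
train_labels = ["sport", "sport", "politics", "politics"]
classes = ["sport", "politics"]

# "said" occurs equally in both classes (weight 0.5), so it is pruned.
weights = select_features(train_weights(train_docs, train_labels), min_weight=0.9)
label, read = classify(["goal", "said", "team", "vote", "match", "election"],
                       weights, classes, margin=1.5)
print(label, read)  # sport 3
```

Here the text is assigned to `sport` after reading only 3 of its 6 tokens. In this sketch, raising `margin` makes the classifier read more of the text before committing, while lowering `min_weight` keeps more (less discriminative) features.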
Pages: 433–447
Number of pages: 15