Text categorization based on a new classification by thresholds

Authors
Walid Cherif
Abdellah Madani
Mohamed Kissi
Affiliations
[1] Rabat-Institutes, Laboratory SI2M, Department of Computer Science, National Institute of Statistics and Applied Economics
[2] University Chouaib Doukkali, Laboratory LAROSERI, Department of Computer Science, Faculty of Sciences
[3] University Hassan II Casablanca, Laboratory LIM, Department of Computer Science, Faculty of Sciences and Technology
Keywords
Natural language processing; Text mining; Automated text categorization; Feature selection; Machine learning; Classification by thresholds
DOI
Not available
Abstract
Automated text categorization attempts to provide an effective solution to today's unprecedented growth of textual data. Because it can organize huge and varied collections of texts from which invaluable insights can be gained, it has become an emerging field of investigation for the research community. However, although several mathematical approaches have been studied to formalize the main components of a text categorization system (text representation, feature extraction, and the classification process), such systems still face many difficulties, due both to the complex nature of text databases and to the high dimensionality of text representations. This paper introduces an alternative way to address this problem. First, it reduces the original set of features using a newly proposed metric. Second, the proposed approach automatically classifies a text without necessarily processing all of its features. Moreover, some standard preprocessing steps, such as stemming, can be dropped with this approach. The experimental results showed that this new text categorization method outperforms state-of-the-art methods: the F-measures obtained on the 20 Newsgroups, BBC News, Reuters, and AG News datasets were 95.06%, 98.21%, 88.44%, and 95.70%, respectively, while standard approaches returned considerably lower scores.
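The two steps named in the abstract (reducing the feature set, then classifying without reading every feature) can be illustrated with a small sketch. This is not the authors' method: the abstract gives neither the proposed metric nor the threshold rule, so the scoring function below (in-class minus out-of-class document frequency), the names `select_features` and `classify`, the parameters `top_k` and `threshold`, and the toy corpus are all assumptions chosen only to show the general idea of stopping early once a class accumulates enough evidence.

```python
from collections import Counter, defaultdict


def select_features(docs, labels, top_k):
    """Keep at most top_k words per class, ranked by a placeholder score.

    ASSUMPTION: the score is in-class document frequency minus out-of-class
    document frequency. The paper's own metric is not given in the abstract.
    """
    classes = sorted(set(labels))
    n_docs = Counter(labels)
    df = defaultdict(Counter)  # df[c][w] = number of class-c docs containing w
    for text, label in zip(docs, labels):
        for word in set(text.lower().split()):
            df[label][word] += 1

    weights = {}
    for c in classes:
        n_other = len(docs) - n_docs[c]
        scores = {}
        for word, count in df[c].items():
            other = sum(df[o][word] for o in classes if o != c)
            out_rate = other / n_other if n_other else 0.0
            scores[word] = count / n_docs[c] - out_rate
        # Sort by descending score, then alphabetically for determinism.
        best = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
        weights[c] = {w: s for w, s in best if s > 0}
    return weights


def classify(text, weights, threshold):
    """Read words in order and stop at the first class whose accumulated
    score reaches the threshold (ASSUMPTION: one shared threshold).

    Returns (label, words_read); label is None if no class gains evidence.
    """
    totals = Counter()
    words = text.lower().split()
    for position, word in enumerate(words, start=1):
        for c, table in weights.items():
            totals[c] += table.get(word, 0.0)
        if totals:
            best, score = max(totals.items(), key=lambda kv: kv[1])
            if score >= threshold:
                return best, position  # remaining words are never processed
    if not totals or max(totals.values()) <= 0:
        return None, len(words)
    return max(totals, key=totals.get), len(words)


# Toy corpus (hypothetical data, for illustration only).
train_docs = [
    "the match ended with a late goal",
    "the team won the league match",
    "the bank raised interest rates",
    "markets fell as the bank cut rates",
]
train_labels = ["sport", "sport", "business", "business"]

weights = select_features(train_docs, train_labels, top_k=5)
label, words_read = classify(
    "the bank cut rates again after the match", weights, threshold=1.5
)
print(label, words_read)
```

In this sketch no stemming is applied, and a decision can be returned before the end of the text; how the paper actually sets its thresholds and scores features should be taken from the full article.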
Pages: 433-447
Number of pages: 14