Machine learning in automated text categorization

Cited by: 4291
Author
Sebastiani, F [1]
Affiliation
[1] CNR, Ist Elaboraz Informaz, I-56124 Pisa, Italy
Keywords
algorithms; experimentation; theory; machine learning; text categorization; text classification
DOI
10.1145/505282.505283
Chinese Library Classification number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
The automated categorization (or classification) of texts into predefined categories has witnessed a booming interest in the last 10 years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories. The advantages of this approach over the knowledge engineering approach (consisting in the manual definition of a classifier by domain experts) are a very good effectiveness, considerable savings in terms of expert labor power, and straightforward portability to different domains. This survey discusses the main approaches to text categorization that fall within the machine learning paradigm. We will discuss in detail issues pertaining to three different problems, namely, document representation, classifier construction, and classifier evaluation.
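The three problems named in the abstract (document representation, classifier construction, and classifier evaluation) can be illustrated with a minimal sketch. The abstract does not name a specific learner, so everything concrete here is an assumption made for illustration: a bag-of-words representation, a multinomial naive Bayes classifier with add-one smoothing, precision/recall/F1 as the evaluation measures, the function names, and the tiny hand-written corpus.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    """Document representation: a lowercase bag of words (illustrative choice)."""
    return text.lower().split()

def train(docs):
    """Classifier construction: collect naive Bayes counts from
    preclassified (text, category) pairs."""
    class_docs = Counter()             # number of training documents per category
    word_counts = defaultdict(Counter) # word frequencies per category
    vocab = set()
    for text, cat in docs:
        class_docs[cat] += 1
        for w in tokenize(text):
            word_counts[cat][w] += 1
            vocab.add(w)
    return class_docs, word_counts, vocab

def classify(model, text):
    """Return the category with the highest log-posterior score."""
    class_docs, word_counts, vocab = model
    total = sum(class_docs.values())
    best, best_score = None, -math.inf
    for cat in class_docs:
        score = math.log(class_docs[cat] / total)
        denom = sum(word_counts[cat].values()) + len(vocab)  # add-one smoothing
        for w in tokenize(text):
            if w in vocab:  # words unseen in training are ignored
                score += math.log((word_counts[cat][w] + 1) / denom)
        if score > best_score:
            best, best_score = cat, score
    return best

def evaluate(model, test_docs, target):
    """Classifier evaluation: precision, recall, and F1 for one category."""
    tp = fp = fn = 0
    for text, cat in test_docs:
        pred = classify(model, text)
        if pred == target and cat == target:
            tp += 1
        elif pred == target:
            fp += 1
        elif cat == target:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical toy corpus, invented purely for this sketch.
TRAIN = [
    ("stocks fell as markets closed lower", "finance"),
    ("the bank raised interest rates", "finance"),
    ("the team won the final match", "sport"),
    ("the striker scored two goals", "sport"),
]
TEST = [
    ("interest rates and markets", "finance"),
    ("the team scored goals", "sport"),
]

model = train(TRAIN)
```

Calling `classify(model, text)` labels a new document, and `evaluate(model, TEST, "finance")` scores the classifier on held-out documents for a single category. A corpus this small says nothing about real effectiveness; it only shows how the three stages fit together.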
Pages: 1-47
Page count: 47