Automatic Category Theme Identification and Hierarchy Generation for Chinese Text Categorization

Cited by: 0
Authors
Hsin-Chang Yang
Chung-Hong Lee
Affiliations
[1] Chang Jung University, Department of Information Management
[2] National Kaohsiung University of Applied Sciences, Department of Electrical Engineering
Keywords
automatic category theme identification; automatic category hierarchy generation; text categorization; self-organizing maps; text mining
DOI
Not available
Abstract
Recently, research on text mining has attracted considerable attention in both industry and academia. Text mining is concerned with discovering unknown patterns or knowledge in large text repositories. The problem is difficult to tackle because the texts under consideration are semi-structured or even unstructured. Many approaches have been devised for mining various kinds of knowledge from texts. One important aspect of text mining is automatic text categorization, which assigns a text document to a predefined category if the document falls within that category's theme. Traditionally, categories are arranged hierarchically to support effective searching and indexing and to make them easy for people to comprehend. Category themes and their hierarchical structures have mostly been determined by human experts. In this work, we developed an approach that automatically generates category themes and reveals the hierarchical structure among them. We also used the generated structure to categorize text documents. A self-organizing map was trained on the document collection to form two feature maps, which were then analyzed to obtain the category themes and their structure. Although the test corpus consists of documents written in Chinese, the proposed approach can be applied to documents in any language, provided they can be transformed into a list of separated terms.
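As a rough illustration of the self-organizing map step mentioned in the abstract, the following is a minimal, generic Kohonen SOM trained on bag-of-words document vectors in pure Python. It is a sketch under stated assumptions, not the authors' method: the grid size, the linearly decaying learning rate, the Gaussian neighbourhood, the toy vocabulary, and the function names `train_som` and `best_matching_unit` are all hypothetical choices, and the sketch does not model how the two feature maps are analyzed into category themes and a hierarchy.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return (row, col) of the unit whose weight vector is closest to x (squared Euclidean)."""
    best, best_d = (0, 0), float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
            if d < best_d:  # strict '<' keeps the first unit on ties
                best, best_d = (r, c), d
    return best


def train_som(vectors, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols Kohonen map on equal-length vectors; returns weights[row][col].

    Hypothetical defaults: linear learning-rate decay, Gaussian neighbourhood
    whose radius shrinks from max(rows, cols) / 2 down to a floor of 0.5.
    """
    rng = random.Random(seed)
    dim = len(vectors[0])
    # Random initial weights in [0, 1).
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    radius0 = max(rows, cols) / 2.0
    total = epochs * len(vectors)
    step = 0
    for _ in range(epochs):
        order = list(range(len(vectors)))
        rng.shuffle(order)  # present documents in a new random order each epoch
        for i in order:
            x = vectors[i]
            frac = step / total
            lr = lr0 * (1.0 - frac)                    # linearly decaying learning rate
            radius = max(radius0 * (1.0 - frac), 0.5)  # shrinking neighbourhood radius
            br, bc = best_matching_unit(weights, x)
            for r in range(rows):
                for c in range(cols):
                    # Gaussian neighbourhood on the grid, centred on the winning unit.
                    d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-d2 / (2.0 * radius ** 2))
                    w = weights[r][c]
                    for k in range(dim):
                        w[k] += lr * h * (x[k] - w[k])  # move unit toward the input
            step += 1
    return weights


# Toy binary term vectors over a hypothetical 4-term vocabulary.
docs = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
]
weights = train_som(docs, rows=2, cols=2)
for doc in docs:
    print(doc, "->", best_matching_unit(weights, doc))
```

Printing the winning unit for each document shows which documents share, or sit near, a neuron on the map; documents with overlapping terms tend to cluster together, which is the property a map-based theme analysis would build on.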
Pages: 47–67
Page count: 20