A feature selection approach for automatic e-book classification based on discourse segmentation

Cited by: 3
Authors
Guo, Jiunn-Liang [1]
Wang, Hei-Chia [2]
Lai, Ming-Way [2]
Affiliations
[1] ROC Taiwan Air Force Acad, Kaohsiung, Taiwan
[2] Natl Cheng Kung Univ, Inst Informat Management, Tainan 70101, Taiwan
Keywords
Discourse segmentation; Feature selection; Text classification; Word sense disambiguation; INFORMATION; TEXT; MODEL;
DOI
10.1108/PROG-12-2012-0071
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Purpose - The purpose of this paper is to develop a novel feature selection (FS) approach for the automatic text classification of large digital documents, namely the e-books of an online library system. The main idea is to automatically identify discourse features in order to improve the feature selection process, rather than focussing on the size of the corpus. Design/methodology/approach - The proposed framework automatically identifies the discourse segments within e-books, captures the discourse subtopics that are cohesively expressed in those segments, and treats these subtopics as informative and prominent features. The selected set of features is then used to train a support vector machine and perform the e-book classification task. Findings - The evaluation of the proposed framework shows that identifying discourse segments and capturing subtopic features leads to better performance than two conventional feature selection techniques: TFIDF and mutual information. It also demonstrates that discourse features play an important role among textual features, especially for large documents such as e-books. Research limitations/implications - Automatically extracted subtopic features cannot be entered directly into the FS process; they require a controlled threshold. Practical implications - The proposed technique demonstrates the promise of using discourse analysis to enhance the classification of large digital documents such as e-books, as compared with conventional techniques. Originality/value - A new FS technique is proposed that can inspect the narrative structure of large documents and is new to the text classification domain. A further contribution is that it encourages the consideration of discourse information in future text analysis by providing more evidence through the evaluation of the results. The proposed system can be integrated into other library management systems.
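The general idea in the abstract (split a long document into cohesive segments, then take prominent terms from each segment as subtopic features) can be illustrated with a small sketch. This is not the authors' algorithm, which the abstract does not specify. The cosine-similarity boundary test, the `threshold` value, the word-length filter, and all function names below are assumptions chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def segment(sentences, threshold=0.1):
    """Group sentences into segments (hypothetical boundary rule).

    A new segment starts wherever two adjacent sentences share so little
    vocabulary that their cosine similarity falls below `threshold`.
    """
    if not sentences:
        return []
    segments = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        sim = cosine(Counter(tokenize(prev)), Counter(tokenize(cur)))
        if sim < threshold:
            segments.append([cur])
        else:
            segments[-1].append(cur)
    return segments


def subtopic_features(segments, top_k=2):
    """Return the top_k most frequent longer words of each segment.

    Words of three letters or fewer are skipped as a crude stand-in
    for stop-word removal (an assumption, not part of the paper).
    """
    features = []
    for seg in segments:
        counts = Counter(w for s in seg for w in tokenize(s) if len(w) > 3)
        features.append([w for w, _ in counts.most_common(top_k)])
    return features


sentences = [
    "The radar tracks aircraft positions",
    "Radar signals reflect from aircraft",
    "Bread dough needs yeast",
    "Yeast makes dough rise",
]
segments = segment(sentences)
features = subtopic_features(segments)
```

In a full pipeline, the per-segment terms would be pooled into a feature vocabulary and passed to a classifier such as a support vector machine; the threshold plays the role of the control parameter that the abstract says the subtopic features require.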
Pages: 2 - 22
Number of pages: 21
Related papers
50 items in total
  • [31] An Evolutionary Approach to Feature Selection and Classification
    Lung, Rodica Ioana
    Suciu, Mihai-Alexandru
    MACHINE LEARNING, OPTIMIZATION, AND DATA SCIENCE, LOD 2023, PT I, 2024, 14505 : 333 - 347
  • [32] A hybrid feature selection-based approach for brain tumor detection and automatic segmentation on multiparametric magnetic resonance images
    Chen, Hao
    Ban, Duo
    Qi, X. Sharon
    Pan, Xiaoying
    Qiang, Yongqian
    Yang, Qing
    MEDICAL PHYSICS, 2021, 48 (11) : 6614 - 6626
  • [33] Automatic segmentation and classification using a co-occurrence based approach
    Haddon, JF
    Schneebeli, M
    Buser, O
    IMAGING TECHNOLOGIES: TECHNIQUES AND APPLICATIONS IN CIVIL ENGINEERING, 1998, : 175 - 184
  • [34] Feature selection for automatic classification of Chinese folk songs
    Xu, Jieping
    Peng Wang
    Li Yan
    CISP 2008: FIRST INTERNATIONAL CONGRESS ON IMAGE AND SIGNAL PROCESSING, VOL 5, PROCEEDINGS, 2008, : 441 - 446
  • [35] An automatic approach towards audio segmentation and classification
    Pan, Wenjuan
    Wang, Zongwu
    Liu, Zhijing
    PROGRESS IN INTELLIGENCE COMPUTATION AND APPLICATIONS, PROCEEDINGS, 2007, : 405 - 408
  • [36] Automatic feature selection for biological shape classification in ΣYNERGOS
    Bruno, OM
    Junior, RMC
    Consularo, LA
    Costa, LD
    SIBGRAPI '98 - INTERNATIONAL SYMPOSIUM ON COMPUTER GRAPHICS, IMAGE PROCESSING, AND VISION, PROCEEDINGS, 1998, : 363 - 370
  • [37] Automatic texture feature selection for image pixel classification
    Puig, Domenec
    Angel Garcia, Miguel
    PATTERN RECOGNITION, 2006, 39 (11) : 1996 - 2009
  • [38] Automatic Feature Selection - a hybrid statistical approach
    Murphey, YL
    Guo, H
    15TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 2, PROCEEDINGS: PATTERN RECOGNITION AND NEURAL NETWORKS, 2000, : 382 - 385
  • [39] ENSEMBLE FEATURE SELECTION APPROACH BASED ON FEATURE RANKING FOR RICE SEED IMAGES CLASSIFICATION
    Dzi Lam Tran Tuan
    Surinwarangkoon, Thongchai
    Meethongjan, Kittikhun
    Vinh Truong Hoang
    ADVANCES IN ELECTRICAL AND ELECTRONIC ENGINEERING, 2020, 18 (03) : 198 - 206
  • [40] Predicting e-book ranking based on the implicit user feedback
    Bin Cao
    Chenyu Hou
    Hongjie Peng
    Jing Fan
    Jian Yang
    Jianwei Yin
    Shuiguang Deng
    World Wide Web, 2019, 22 : 637 - 655