Efficient Feature Selection and Domain Relevance Term Weighting Method for Document Classification

Cited by: 5
Authors
Khan, Aurangzeb [1]
Baharudin, Baharum [1]
Khan, Khairullah [1]
Affiliations
[1] Univ Teknol PETRONAS, Dept Comp & Informat Sci, Tronoh, Perak, Malaysia
Keywords
Feature selection; Text classification; Ontology; Feature vector; TEXT;
DOI
10.1109/ICCEA.2010.228
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Subject classification code
081203 ; 0835 ;
Abstract
Feature selection is of paramount importance in the document classification process, since it improves the efficiency and accuracy of the text classifier. The Vector Space Model is used to represent the "bag of words" (BOW) of the documents with term weighting. Representing documents through this model has some limitations: it ignores term dependencies and the structure and ordering of terms in documents. To overcome this problem, a semantics-based feature vector is proposed, which uses an ontology to extract the concept of a term together with its co-occurring and associated terms. The proposed method is applied to a small document dataset, and the results show that it outperforms term frequency/inverse document frequency (TF-IDF) with BOW feature selection for text classification.
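For orientation, the TF-IDF with BOW baseline named in the abstract can be sketched as follows. This is a minimal illustration and not the authors' ontology-based method: whitespace tokenization, length-normalized term frequency, the natural-log IDF variant, the function name `tf_idf`, and the sample documents are all assumptions made for this example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document (plain TF-IDF over a bag of words)."""
    # Assumption: lowercase + whitespace split stands in for real tokenization.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        # Weight = (term count / document length) * ln(N / df).
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical sample documents, for illustration only.
docs = [
    "feature selection improves text classification",
    "ontology terms describe domain concepts",
    "text classification uses term weighting",
]
vectors = tf_idf(docs)
```

Because each document is reduced to independent term counts, word order and term dependencies are lost, which is the limitation the abstract says the semantic feature vector is meant to address.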
Pages: 398-403
Number of pages: 6
Related papers
50 in total
  • [1] Domain relevance on term weighting
    Brunzel, Marko
    Spiliopoulou, Myra
    [J]. NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS, PROCEEDINGS, 2007, 4592 : 427 - +
  • [2] Feature Selection and Term Weighting
    Algarni, Abdulmohsen
    Tairan, Nasser
    [J]. 2014 IEEE/WIC/ACM INTERNATIONAL JOINT CONFERENCES ON WEB INTELLIGENCE (WI) AND INTELLIGENT AGENT TECHNOLOGIES (IAT), VOL 1, 2014, : 336 - 339
  • [3] Topical Term Weighting based on Extended Random Sets for Relevance Feature Selection
    Alharbi, Abdullah Semran
    Li, Yuefeng
    Xu, Yue
    [J]. 2017 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE (WI 2017), 2017, : 654 - 661
  • [4] Efficient Method for Feature Selection in Text Classification
    Sun, Jian
    Zhang, Xiang
    Liao, Dan
    Chang, Victor
    [J]. 2017 INTERNATIONAL CONFERENCE ON ENGINEERING AND TECHNOLOGY (ICET), 2017,
  • [5] An Efficient Feature Selection Method for Activity Classification
    Zhang, Shumei
    McCullagh, Paul
    Callaghan, Vic
    [J]. 2014 INTERNATIONAL CONFERENCE ON INTELLIGENT ENVIRONMENTS (IE), 2014, : 16 - 22
  • [6] An improved term weighting method based on relevance frequency for text classification
    Li, Chuanxiao
    Li, Wenqiang
    Tang, Zhong
    Li, Song
    Xiang, Hai
    [J]. SOFT COMPUTING, 2023, 27 (07) : 3563 - 3579
  • [7] An improved term weighting method based on relevance frequency for text classification
    Chuanxiao Li
    Wenqiang Li
    Zhong Tang
    Song Li
    Hai Xiang
    [J]. Soft Computing, 2023, 27 : 3563 - 3579
  • [8] A simple and efficient filter feature selection method via document-term matrix unitization
    Li, Qing
    Zhao, Shuai
    He, Tengjiao
    Wen, Jinming
    [J]. PATTERN RECOGNITION LETTERS, 2024, 181 : 23 - 29
  • [9] Customized term weighting scheme for document classification
    Benjamin, C. M. X.
    Woon, W. L.
    Wong, K. S. D.
    [J]. 2008 INTERNATIONAL CONFERENCE ON COMPUTER AND COMMUNICATION ENGINEERING, VOLS 1-3, 2008, : 294 - 299
  • [10] Novel and efficient method on feature selection and data classification
    Chen, Tieming
    Ma, Jixia
    Huang, Samuel H.
    Cai, Jiamei
    [J]. Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2012, 49 (04): : 735 - 745