Hybrid DIAAF/RS: Statistical Textual Feature Selection for Language-Independent Text Classification

Cited by: 0
Authors
Wang, Yanbo J. [1]
Li, Fan [1]
Coenen, Frans [2]
Sanderson, Robert [3]
Xin, Qin [4]
Affiliations
[1] China Minsheng Banking Corp. Ltd., Information Management Center, Beijing, People's Republic of China
[2] University of Liverpool, Department of Computer Science, Liverpool, Merseyside, England
[3] Los Alamos National Laboratory, Los Alamos, NM, USA
[4] Simula Research Laboratory, Oslo, Norway
Keywords
Associative Classification; (Language-independent) Text Classification; Text Mining; Textual Feature Selection
DOI
Not available
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Textual Feature Selection (TFS) is an important phase in the process of text classification. It aims to identify the most significant textual features (i.e., keywords and/or phrases) in a textual dataset that serve to distinguish between text categories. Basic TFS techniques fall into two groups: linguistic and statistical. Because the goal is to build a language-independent text classifier, the study reported here is concerned with statistical TFS only. In this paper, we propose a novel statistical TFS approach that hybridizes the ideas of two existing techniques, DIAAF (Darmstadt Indexing Approach Association Factor) and RS (Relevancy Score). With respect to associative (text) classification, the experimental results demonstrate that the proposed approach can produce greater classification accuracy than alternative approaches.
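The abstract names DIAAF and RS but gives neither their formulas nor the way the paper combines them. As a rough illustration only, the Python sketch below computes the two component scores in one formulation commonly seen in the text-categorization literature: DIAAF as the estimated probability of a category given a term, and RS as the log ratio of the smoothed term probabilities inside and outside the category. The function name `score_terms`, the document-frequency estimates, and the smoothing constant `d` are assumptions made for this example, not details taken from the paper, and the hybrid step itself is deliberately left out.

```python
import math
from collections import Counter


def score_terms(docs, labels, target, d=1.0):
    """Return {term: (diaaf, rs)} for one target category.

    docs   : list of iterables of terms, one per document
    labels : list of category labels, aligned with docs
    target : the category being scored
    d      : smoothing constant for RS (assumed value, not from the paper)
    """
    n_pos = sum(1 for y in labels if y == target)  # documents in the category
    n_neg = len(labels) - n_pos                    # documents outside it

    df_all = Counter()  # document frequency of each term overall
    df_pos = Counter()  # document frequency inside the target category
    for terms, y in zip(docs, labels):
        for t in set(terms):  # count each term once per document
            df_all[t] += 1
            if y == target:
                df_pos[t] += 1

    scores = {}
    for t, n_t in df_all.items():
        pos = df_pos[t]
        neg = n_t - pos
        # DIAAF (assumed form): estimate of P(category | term)
        diaaf = pos / n_t
        # RS (assumed form): log of smoothed P(term | category)
        # over smoothed P(term | not category)
        p_pos = pos / n_pos if n_pos else 0.0
        p_neg = neg / n_neg if n_neg else 0.0
        rs = math.log((p_pos + d) / (p_neg + d))
        scores[t] = (diaaf, rs)
    return scores


docs = [{"goal", "match"}, {"goal", "team"}, {"vote", "team"}]
labels = ["sport", "sport", "politics"]
scores = score_terms(docs, labels, "sport")
```

Both scores rely only on term and category counts, so no language-specific resource such as a stop-word list or stemmer is involved, which is the property the abstract refers to as language independence.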
Pages: 222 / +
Page count: 3