Abstracting for Dimensionality Reduction in Text Classification

Cited by: 1
Authors
McAllister, Richard A. [1]
Angryk, Rafal A. [1]
Affiliations
[1] Montana State Univ, Dept Comp Sci, Bozeman, MT 59717 USA
DOI
10.1002/int.21543
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
There is a growing interest in efficient models of text mining and an emergent need for new data structures that address word relationships. Detailed knowledge about the taxonomic environment of keywords that are used in text documents can provide valuable insight into the nature of the subject matter contained therein. Such insight may be used to enhance the data structures used in the text data mining task as relationships become usefully apparent. A popular scalable technique used to infer these relationships, while reducing dimensionality, has been Latent Semantic Analysis. We present a new approach, which uses an ontology of lexical abstractions to create abstraction profiles of documents and uses these profiles to perform text organization based on a process that we call frequent abstraction analysis. We introduce TATOO, the Text Abstraction TOOlkit, which is a full implementation of this new approach. We present our data model via an example of how taxonomically derived abstractions can be used to supplement semantic data structures for the text classification task. (C) 2012 Wiley Periodicals, Inc.
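The idea of an abstraction profile can be illustrated with a minimal sketch. This is not the TATOO implementation, whose details the abstract does not give; it assumes a hypothetical toy taxonomy (`TOY_HYPERNYMS`, a child-to-parent mapping invented here) in place of a real lexical ontology, and it reads "abstraction profile" loosely as a count of how often each more general term is reached from a document's keywords. The helper names `ancestors` and `abstraction_profile` are likewise illustrative.

```python
from collections import Counter

# Hypothetical toy taxonomy (child -> parent), invented for illustration.
# A real system would draw these links from a lexical ontology.
TOY_HYPERNYMS = {
    "poodle": "dog",
    "beagle": "dog",
    "dog": "animal",
    "cat": "animal",
    "oak": "tree",
    "tree": "plant",
}


def ancestors(word, taxonomy):
    """Return the chain of abstractions above `word`, nearest first."""
    chain = []
    seen = {word}
    while word in taxonomy:
        word = taxonomy[word]
        if word in seen:  # guard against accidental cycles in the mapping
            break
        seen.add(word)
        chain.append(word)
    return chain


def abstraction_profile(tokens, taxonomy):
    """Count how often each abstraction is reached from the given keywords."""
    profile = Counter()
    for token in tokens:
        profile.update(ancestors(token, taxonomy))
    return profile


doc = ["poodle", "beagle", "cat", "oak"]
profile = abstraction_profile(doc, TOY_HYPERNYMS)

# Keep only abstractions shared by at least two keywords (an assumed threshold).
frequent = {term for term, count in profile.items() if count >= 2}
print(profile.most_common())
print(sorted(frequent))
```

In this toy case four distinct keywords collapse onto two shared abstractions, which is the sense in which taxonomic abstraction can reduce the number of features a classifier has to handle. The threshold of two is an arbitrary choice for the example.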
Pages: 115-138
Number of pages: 24