Knowledge Based Dimensionality Reduction for Technical Text Mining

Authors
Shalaby, Walid [1]
Zadrozny, Wlodek [1]
Gallagher, Sean [1]
Affiliations
[1] Univ North Carolina Charlotte, Dept Comp Sci, Charlotte, NC 28223 USA
Keywords
Dimensionality Reduction; Feature Selection; Text Classification; Patent Classification; Knowledge Bases;
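DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract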
In this paper we propose a novel technique for dimensionality reduction using freely available online knowledge bases. The complexity of our method is linear in the size of the full feature set, so it can be applied efficiently to very large and complex datasets. We demonstrate the approach by investigating its effectiveness on patent data, the largest freely available body of technical text. We report empirical results on classification of the CLEF-IP 2010 dataset using bigram features supported by mentions in the Wikipedia, Wiktionary, and GoogleBooks knowledge bases. We achieve a 13-fold reduction in the number of bigram features and a 1.7% increase in classification accuracy over the unigram baseline. These results give concrete evidence that significant accuracy improvements and a massive reduction in dimensionality can be achieved with our approach, thereby helping to alleviate the tradeoff between task complexity and accuracy.
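The abstract does not give implementation details, so the following is only a minimal sketch of the general idea, assuming that "supported by mentions" means a bigram is kept when it appears in a set of phrases harvested from a knowledge base. The names `extract_bigrams`, `filter_by_knowledge_base`, and `kb_phrases`, as well as the toy phrases and document, are hypothetical and not taken from the paper.

```python
from collections import Counter


def extract_bigrams(tokens):
    """Return adjacent token pairs as space-joined strings."""
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


def filter_by_knowledge_base(features, kb_phrases):
    """Keep only the features that occur in the knowledge-base phrase set.

    Each feature costs one hash-set lookup, so the total work grows
    linearly with the number of candidate features.
    """
    return [f for f in features if f in kb_phrases]


# Hypothetical stand-in for phrases collected from a knowledge base
# (assumption: the real phrase set would be far larger).
kb_phrases = {"fuel cell", "heat exchanger", "integrated circuit"}

# Toy document, pre-tokenized on whitespace.
doc = "the fuel cell is cooled by a heat exchanger".split()

all_bigrams = extract_bigrams(doc)                    # every candidate bigram
kept = filter_by_knowledge_base(all_bigrams, kb_phrases)
feature_counts = Counter(kept)                        # reduced feature vector

print(len(all_bigrams), "candidate bigrams,", len(kept), "kept")
```

In this toy run, bigrams such as "is cooled" are discarded because no knowledge-base phrase matches them, while "fuel cell" and "heat exchanger" survive as features for a downstream classifier.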
Pages: 6
Related papers
50 in total
  • [21] Building a Knowledge Based Summarization System for Text Data Mining
    Timofeyev, Andrey
    Choi, Ben
    MACHINE LEARNING AND KNOWLEDGE EXTRACTION, CD-MAKE 2018, 2018, 11015 : 118 - 133
  • [22] An effective dimensionality reduction method for text classification based on TFP-tree
    Liu, Lu
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2018, 34 (03) : 1893 - 1905
  • [23] Mutual information based reduction of data mining dimensionality in gene expression analysis
    Marohnic, V
    Debeljak, E
    Bogunovic, N
    ITI 2004: PROCEEDINGS OF THE 26TH INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY INTERFACES, 2004, : 249 - 254
  • [24] Frequent item sets based dimensionality reduction algorithm in data mining research
    Bao Yong
    Lu Jia-yuan
    Wu Hui-zhong
    Proceedings of 2005 Chinese Control and Decision Conference, Vols 1 and 2, 2005, : 1433 - 1435
  • [25] Detection of Trends of Technical Phrases in Text Mining
    Abe, Hidenao
    Tsumoto, Shusaku
    2009 IEEE INTERNATIONAL CONFERENCE ON GRANULAR COMPUTING (GRC 2009), 2009, : 7 - 12
  • [26] Dimensionality Reduction Approach for High Dimensional Text Documents
    Reddy, G. Suresh
    2016 INTERNATIONAL CONFERENCE ON ENGINEERING & MIS (ICEMIS), 2016,
  • [27] A Comparative Approach of Dimensionality Reduction Techniques in Text Classification
    Basha, Shaik Rahamat
    Rani, J. Keziya
    ENGINEERING TECHNOLOGY & APPLIED SCIENCE RESEARCH, 2019, 9 (06) : 4974 - 4979
  • [28] Text Dimensionality Reduction with Mutual Information Preserving Mapping
    Yang Zhen
    Yao Fei
    Fan Kefeng
    Huang Jian
    CHINESE JOURNAL OF ELECTRONICS, 2017, 26 (05) : 919 - 925
  • [30] SDRS: A new lossless dimensionality reduction for text corpora
    Velez de Mendizabal, Inaki
    Basto-Fernandes, Vitor
    Ezpeleta, Enaitz
    Mendez, Jose R.
    Zurutuza, Urko
    INFORMATION PROCESSING & MANAGEMENT, 2020, 57 (04)