Knowledge Based Dimensionality Reduction for Technical Text Mining

Times cited: 0
Authors
Shalaby, Walid [1]
Zadrozny, Wlodek [1]
Gallagher, Sean [1]
Affiliation
[1] Univ North Carolina Charlotte, Dept Comp Sci, Charlotte, NC 28223 USA
Keywords
Dimensionality Reduction; Feature Selection; Text Classification; Patent Classification; Knowledge Bases;
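DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract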
In this paper we propose a novel technique for dimensionality reduction using freely available online knowledge bases. The complexity of our method is linear in the size of the full feature set, making it efficiently applicable to huge and complex datasets. We demonstrate this approach by investigating its effectiveness on patent data, the largest freely available source of technical text. We report empirical results on classification of the CLEF-IP 2010 dataset using bigram features supported by mentions in the Wikipedia, Wiktionary, and GoogleBooks knowledge bases. We achieve a 13-fold reduction in the number of bigram features and a 1.7% increase in classification accuracy over the unigram baseline. These results give concrete evidence that significant accuracy improvements and massive reductions in dimensionality can be achieved using our approach, thereby helping to alleviate the tradeoff between task complexity and accuracy.
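The abstract states only that bigram features are kept when they are supported by mentions in the knowledge bases, and that the cost is linear in the size of the feature set. The Python sketch below illustrates one way such a filter could work; it is not the authors' implementation. It assumes documents are already tokenized and that each knowledge base has been reduced to a set of lowercase phrases (the toy `kb_phrases` set is hypothetical, standing in for, e.g., the union of Wikipedia titles, Wiktionary entries, and GoogleBooks n-grams). The helper names `bigrams` and `kb_filter_bigrams` are likewise illustrative.

```python
from collections import Counter
from typing import Iterable, Iterator


def bigrams(tokens: list[str]) -> Iterator[str]:
    """Yield adjacent token pairs joined by a single space."""
    for first, second in zip(tokens, tokens[1:]):
        yield f"{first} {second}"


def kb_filter_bigrams(docs: Iterable[list[str]], kb_phrases: set[str]) -> tuple[Counter, int]:
    """Keep only bigrams found in the (hypothetical) knowledge-base phrase set.

    Returns the counts of kept bigrams and the number of distinct
    candidate bigrams seen before filtering. Each candidate costs one
    average-O(1) set lookup, so the pass is linear in the candidates.
    """
    kept: Counter = Counter()
    candidates: set[str] = set()
    for tokens in docs:
        for bg in bigrams([t.lower() for t in tokens]):
            candidates.add(bg)
            if bg in kb_phrases:  # assumption: "supported" = exact phrase match
                kept[bg] += 1
    return kept, len(candidates)


# Hypothetical toy data; real phrase sets would be built from the knowledge bases.
kb_phrases = {"neural network", "support vector", "vector machine"}
docs = [["Neural", "network", "training"], ["support", "vector", "machine", "model"]]
kept, n_candidates = kb_filter_bigrams(docs, kb_phrases)
print(f"kept {len(kept)} of {n_candidates} distinct bigrams: {dict(kept)}")
```

Because each candidate bigram needs only a single hash-set lookup on average, the pass scales linearly with the number of candidate features, which is consistent with the stated complexity. How the paper actually decides that a bigram is "supported" (exact title match, frequency thresholds, or something else) is not specified in this record.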
Pages: 6