Maximizing information content in feature extraction

Cited by: 15
Authors
Padmanabhan, M [1]
Dharanipragada, S
Affiliations
[1] Renaissance Technol, E Setauket, NY 11733 USA
[2] Citadel Investment Grp, Chicago, IL 60603 USA
Source
Keywords
classifiers; optimal feature projections; optimum feature extraction; penalized mutual information; speech recognition;
DOI
10.1109/TSA.2005.848876
Chinese Library Classification number
O42 [Acoustics];
Subject classification code
070206 ; 082403 ;
Abstract
In this paper, we consider the problem of quantifying how much information a set of features carries for discriminating between classes. We explore this in the context of a speech recognition system, where an important classification subproblem is predicting the phonetic class from an observed acoustic feature vector. We first examine the connection between information content and recognition performance for various feature extraction schemes used in speech recognition. We then generalize the idea of optimizing information content to improve recognition accuracy to a linear projection of the underlying features. We show that several prior methods for computing linear transformations (such as linear and heteroscedastic discriminant analysis) can be interpreted within this general framework of maximizing information content. Extending this reasoning, we propose a new objective function that maximizes a penalized mutual information (pMI) measure. This objective function is seen to be very well correlated with the word error rate of the final system. Finally, experimental results show that the proposed pMI projection consistently outperforms other methods in a variety of cases, yielding relative improvements in word error rate of 5%-16% over earlier methods.
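As a rough illustration of the quantity the abstract refers to, the sketch below estimates the mutual information I(C; Y) between a class label C and a linearly projected feature Y = θx. It is not the paper's pMI objective: the abstract does not specify the penalty term, so none is included. The Gaussian class-conditional model, the plug-in estimator, the function names, and the synthetic data are all assumptions made for this example.

```python
import numpy as np


def gaussian_logpdf(y, mean, cov):
    """Log density of a multivariate Gaussian evaluated at each row of y."""
    d = y.shape[1]
    diff = y - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ni,ij,nj->n", diff, np.linalg.inv(cov), diff)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)


def empirical_mutual_information(x, labels, theta):
    """Plug-in estimate of I(C; Y) in nats for Y = x @ theta.T.

    Assumption: each class is modelled by a single Gaussian in the
    projected space, fitted on the same data used for the estimate.
    """
    y = x @ theta.T
    classes, counts = np.unique(labels, return_counts=True)
    priors = counts / counts.sum()

    # log p(c) + log p(y | c) for every sample and every class
    log_joint = np.empty((y.shape[0], classes.size))
    for k, c in enumerate(classes):
        yc = y[labels == c]
        cov = np.atleast_2d(np.cov(yc, rowvar=False))
        log_joint[:, k] = np.log(priors[k]) + gaussian_logpdf(y, yc.mean(axis=0), cov)

    # log p(c | y), normalised over classes
    log_post = log_joint - np.logaddexp.reduce(log_joint, axis=1, keepdims=True)

    true_idx = np.searchsorted(classes, labels)
    h_c = -np.sum(priors * np.log(priors))                          # H(C)
    h_c_given_y = -np.mean(log_post[np.arange(y.shape[0]), true_idx])  # H(C | Y)
    return h_c - h_c_given_y


# Synthetic two-class data: the classes differ along axis 0 only.
rng = np.random.default_rng(0)
n = 2000
x = np.vstack([
    rng.normal(loc=[-2.0, 0.0], scale=1.0, size=(n, 2)),
    rng.normal(loc=[+2.0, 0.0], scale=1.0, size=(n, 2)),
])
labels = np.repeat([0, 1], n)

mi_good = empirical_mutual_information(x, labels, np.array([[1.0, 0.0]]))
mi_bad = empirical_mutual_information(x, labels, np.array([[0.0, 1.0]]))
print(f"discriminative axis: {mi_good:.3f} nats, uninformative axis: {mi_bad:.3f} nats")
```

Projecting onto the axis that separates the classes should give an estimate close to H(C), while projecting onto the other axis should give an estimate near zero. A projection search in this spirit would choose θ to maximize such an estimate.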
Pages: 512-519
Page count: 8
Related papers
50 in total
  • [21] A multi-label feature extraction algorithm via maximizing feature variance and feature-label dependence simultaneously
    Xu, Jianhua
    Liu, Jiali
    Yin, Jing
    Sun, Chengyu
    KNOWLEDGE-BASED SYSTEMS, 2016, 98 : 172 - 184
  • [22] Maximizing Mutual Information Across Feature and Topology Views for Representing Graphs
    Fan, Xiaolong
    Gong, Maoguo
    Wu, Yue
    Li, Hao
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2023, 35 (10) : 10735 - 10747
  • [23] Feature selection via maximizing global information gain for text classification
    Shang, Changxing
    Li, Min
    Feng, Shengzhong
    Jiang, Qingshan
    Fan, Jianping
    KNOWLEDGE-BASED SYSTEMS, 2013, 54 : 298 - 309
  • [24] Web Information Extraction for content augmentation
    Janevski, A
    Dimitrova, N
    IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOL I AND II, PROCEEDINGS, 2002, : A389 - A392
  • [25] Image information content and extraction techniques
    Ekblad, U
    Kinser, JM
    Atmer, J
    Zetterlund, N
    NUCLEAR INSTRUMENTS & METHODS IN PHYSICS RESEARCH SECTION A-ACCELERATORS SPECTROMETERS DETECTORS AND ASSOCIATED EQUIPMENT, 2004, 525 (1-2): : 397 - 401
  • [26] Kernel PCA for feature extraction with information complexity
    Liu, ZQ
    Bozdogan, H
    STATISTICAL DATA MINING AND KNOWLEDGE DISCOVERY, 2004, : 309 - 322
  • [27] FEATURE INFORMATION EXTRACTION FROM DYNAMIC BIOSPECKLE
    ZHENG, B
    PLEASS, CM
    IH, CS
    APPLIED OPTICS, 1994, 33 (02): : 231 - 237
  • [28] Fisher Information Analysis for Matching Feature Extraction
    Pei, Zhijun
    Zhang, Ping
    Sun, Shoumei
    Gu, Jinqing
    2009 INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY AND COMPUTER SCIENCE, VOL 1, PROCEEDINGS, 2009, : 425 - 428
  • [29] Feature extraction using supervised Independent Component Analysis by maximizing class distance
    Sakaguchi, Y
    Ozawa, S
    Kotani, M
    ICONIP'02: PROCEEDINGS OF THE 9TH INTERNATIONAL CONFERENCE ON NEURAL INFORMATION PROCESSING: COMPUTATIONAL INTELLIGENCE FOR THE E-AGE, 2002, : 2502 - 2506
  • [30] Incorporating corner information for mouth feature extraction
    Tan, Hua-Chun
    Zhang, Yu-Jin
    PROCEEDINGS OF 2008 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2008, : 2909 - +