Extreme value distribution based gene selection criteria for discriminant microarray data analysis using logistic regression

Cited by: 16
Authors
Li, WT
Sun, FZ
Grosse, I
Affiliations
[1] N Shore LIJ Res Inst, Robert S Boas Ctr Genom & Human Genet, Manhasset, NY 11030 USA
[2] Univ So Calif, Dept Biol Sci, Mol & Computat Biol Program, Los Angeles, CA 90089 USA
[3] Inst Plant Genet & Crop Plant Res, D-06466 Gatersleben, Germany
Keywords
microarray; gene selection; extreme value distribution; logistic regression
DOI
10.1089/1066527041410445
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
One important issue commonly encountered in the analysis of microarray data is to decide which and how many genes should be selected for further studies. For discriminant microarray data analyses based on statistical models, such as logistic regression models, gene selection can be accomplished by comparing the maximum likelihood of the model given the real data, $\hat{L}(D \mid M)$, with the expected maximum likelihood of the model given an ensemble of surrogate data with randomly permuted labels, $\hat{L}(D_0 \mid M)$. Typically, the computational burden for obtaining $\hat{L}(D_0 \mid M)$ is immense, often exceeding the limits of available computing resources by orders of magnitude. Here, we propose an approach that circumvents such heavy computations by mapping the simulation problem to an extreme-value problem. We present the derivation of an asymptotic distribution of the extreme value as well as its mean, median, and variance. Using this distribution, we propose two gene selection criteria, and we apply them to two microarray datasets and three classification tasks for illustration.
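The idea of replacing label-permutation simulations with an extreme-value calculation can be illustrated with a minimal sketch. It rests on assumptions that are not stated in the abstract: each gene is fitted with its own single-gene logistic regression, the likelihood-ratio statistic 2(log L - log L0) of each gene under permuted labels is approximately chi-square distributed with 1 degree of freedom, and genes are treated as independent. Under those assumptions the largest statistic among `n_genes` genes has CDF F(x)^n, from which a median and mean can be computed directly. The function names are hypothetical, and this is not necessarily the derivation or the criteria used in the paper.

```python
import math


def chi2_1_cdf(x):
    """CDF of a chi-square variable with 1 degree of freedom."""
    return math.erf(math.sqrt(x / 2.0)) if x > 0 else 0.0


def max_cdf(x, n_genes):
    """CDF of the maximum of n_genes independent chi-square(1) variables
    (independence between genes is an assumption of this sketch)."""
    return chi2_1_cdf(x) ** n_genes


def max_quantile(q, n_genes, hi=200.0, tol=1e-10):
    """Quantile of the maximum, found by bisection on max_cdf."""
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if max_cdf(mid, n_genes) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def max_mean(n_genes, upper=200.0, steps=200000):
    """Mean of the maximum via E[X] = integral of (1 - CDF) over [0, upper],
    evaluated with the trapezoidal rule."""
    h = upper / steps
    total = 0.5 * ((1.0 - max_cdf(0.0, n_genes)) + (1.0 - max_cdf(upper, n_genes)))
    for i in range(1, steps):
        total += 1.0 - max_cdf(i * h, n_genes)
    return total * h


if __name__ == "__main__":
    for n in (100, 1000, 7000):
        median = max_quantile(0.5, n)
        mean = max_mean(n)
        # Half the statistic is the corresponding gain in log-likelihood.
        print(n, round(median, 3), round(mean, 3), round(mean / 2.0, 3))
```

A gene whose observed likelihood-ratio statistic falls well above the mean or median of this null maximum would be a candidate for selection. How the thresholds are actually defined is specified in the paper, not in this sketch.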
Pages: 215-226
Number of pages: 12