Random forest with Random projection to impute missing gene expression data

Cited by: 4
Author
Gondara, Lovedeep [1]
Affiliation
[1] Univ Illinois, Dept Comp Sci, Springfield, IL 62703 USA
Keywords
missing data; imputation; gene expression data; random forest; random projection; HUMAN COLORECTAL-CARCINOMA; MOLECULAR CLASSIFICATION; PREDICTION; IMPUTATION; PROFILE; CANCER; TUMOR;
DOI
10.1109/ICMLA.2015.29
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Measurement error or lack of proper experimental setup often results in invalid or missing data in gene expression studies. Small sample sizes and the cost of re-running experiments create a need for an efficient missing data imputation technique. In this paper, we propose a method based on Random forest, using Random projection as a data pre-processing filter. Initial results using varying missing data proportions on a variety of real datasets show that the imputation process based on Random forest performs as well as or better than K-Nearest Neighbor and Support Vector Regression based methods. Using Random projection, we show that the dimensionality of a dataset can be reduced by 50 percent without affecting the imputation process.
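The idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes scikit-learn is available, that the predictor columns are fully observed, and that a single target column contains the missing values. The helper name `impute_column`, the synthetic data, and all parameter values are hypothetical. The predictors are projected to half their original dimension, mirroring the 50 percent reduction mentioned above, and a random forest trained on the observed rows predicts the missing entries.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.random_projection import GaussianRandomProjection


def impute_column(X_obs, y, n_components, seed=0):
    """Fill NaNs in y with random forest predictions on projected predictors.

    X_obs : (n_samples, n_features) array, assumed fully observed.
    y     : (n_samples,) array, NaN marks a missing value.
    """
    # Random projection as a pre-processing filter on the predictors.
    projector = GaussianRandomProjection(n_components=n_components,
                                         random_state=seed)
    Z = projector.fit_transform(X_obs)

    # Train only on rows where the target is observed.
    missing = np.isnan(y)
    forest = RandomForestRegressor(n_estimators=100, random_state=seed)
    forest.fit(Z[~missing], y[~missing])

    # Replace missing entries with predictions; observed entries are untouched.
    y_filled = y.copy()
    y_filled[missing] = forest.predict(Z[missing])
    return y_filled


# Synthetic stand-in for an expression matrix (hypothetical data).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 40))
y_true = X[:, :5].sum(axis=1) + 0.1 * rng.normal(size=60)

# Hide every fifth value of the target column.
mask = np.zeros(60, dtype=bool)
mask[::5] = True
y = y_true.copy()
y[mask] = np.nan

# Project 40 predictors down to 20 (a 50 percent reduction), then impute.
y_hat = impute_column(X, y, n_components=20)
rmse = np.sqrt(np.mean((y_hat[mask] - y_true[mask]) ** 2))
print("RMSE on hidden entries:", rmse)
```

Comparing this RMSE against a run without the projection step (passing `X` directly to the forest) is one simple way to check whether the reduction affects imputation quality on a given dataset.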
Pages: 1251 - 1256
Page count: 6
Related papers
50 in total
  • [1] Random forest missing data algorithms
    Tang, Fei
    Ishwaran, Hemant
    [J]. STATISTICAL ANALYSIS AND DATA MINING, 2017, 10 (06) : 363 - 377
  • [2] Biclustering gene expression data by random projection based on bucketing
    Liu, Juan
    Liu, Feng
    [J]. 2008 INTERNATIONAL SPECIAL TOPIC CONFERENCE ON INFORMATION TECHNOLOGY AND APPLICATIONS IN BIOMEDICINE, VOLS 1 AND 2, 2008, : 322 - 325
  • [3] Leveraging random assignment to impute missing covariates in causal studies
    Kamat, Gauri
    Reiter, Jerome P.
    [J]. JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, 2021, 91 (07) : 1275 - 1305
  • [4] Evaluating the performance of random forest and iterative random forest based methods when applied to gene expression data
    Walker, Angelica M.
    Cliff, Ashley
    Romero, Jonathon
    Shah, Manesh B.
    Jones, Piet
    Gazolla, Joao Gabriel Felipe Machado
    Jacobson, Daniel A.
    Kainer, David
    [J]. COMPUTATIONAL AND STRUCTURAL BIOTECHNOLOGY JOURNAL, 2022, 20 : 3372 - 3386
  • [5] MISSING RESPONSE DATA - TO IMPUTE OR NOT TO IMPUTE
    BOSWICK, JM
    LEE, KL
    CALIFF, RM
    TOPOL, EJ
    [J]. CONTROLLED CLINICAL TRIALS, 1988, 9 (03) : 261 - 261
  • [6] Differential Gene Expression Data Analysis of ASD Using Random Forest
    Pragya
    Govarthan, Praveen Kumar
    Sinha, Kshitij
    Mukherjee, Sudip
    Ronickom, Jac Fredo Agastinose
    [J]. CARING IS SHARING-EXPLOITING THE VALUE IN DATA FOR HEALTH AND INNOVATION-PROCEEDINGS OF MIE 2023, 2023, 302 : 1047 - 1051
  • [7] Missing data, part 2. Missing data mechanisms: Missing completely at random, missing at random, missing not at random, and why they matter
    Tra My Pham
    Pandis, Nikolaos
    White, Ian R.
    [J]. AMERICAN JOURNAL OF OPHTHALMOLOGY, 2022, 162 (01) : 138 - 139
  • [8] Missing data, part 2. Missing data mechanisms: Missing completely at random, missing at random, missing not at random, and why they matter
    Tra My Pham
    Pandis, Nikolaos
    White, Ian R.
    [J]. AMERICAN JOURNAL OF ORTHODONTICS AND DENTOFACIAL ORTHOPEDICS, 2022, 162 (01) : 138 - 139
  • [9] Distance-Based Random Forest Clustering with Missing Data
    Raniero, Matteo
    Bicego, Manuele
    Cicalese, Ferdinando
    [J]. IMAGE ANALYSIS AND PROCESSING, ICIAP 2022, PT III, 2022, 13233 : 121 - 132
  • [10] Missing Data Imputation Through the Use of the Random Forest Algorithm
    Pantanowitz, Adam
    Marwala, Tshilidzi
    [J]. ADVANCES IN COMPUTATIONAL INTELLIGENCE, 2009, 61 : 53 - 62