Genetic algorithm optimization for pre-processing and variable selection of spectroscopic data

Times cited: 132
Authors
Jarvis, RM [1]
Goodacre, R [1]
Affiliations
[1] Univ Manchester, Dept Chem, Manchester M60 1QD, Lancs, England
Funding
Biotechnology and Biological Sciences Research Council (UK); Engineering and Physical Sciences Research Council (UK)
DOI
10.1093/bioinformatics/bti102
Chinese Library Classification
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: The major difficulties relating to mathematical modelling of spectroscopic data are inconsistencies in spectral reproducibility and the black box nature of the modelling techniques. For the analysis of biological samples the first problem is due to biological, experimental and machine variability, which can lead to sample size differences and unavoidable baseline shifts. Consequently, there is often a requirement for mathematical correction(s) to be made to the raw data if the best possible model is to be formed. The second problem prevents interpretation of the results since the variables that most contribute to the analysis are not easily revealed; as a result, the opportunity to obtain new knowledge from such data is lost.

Methods: We used genetic algorithms (GAs) to select spectral pre-processing steps for Fourier transform infrared (FT-IR) spectroscopic data. We demonstrate a novel approach for the selection of important discriminatory variables by GA from FT-IR spectra for multi-class identification by discriminant function analysis (DFA).

Results: The GA selects sensible pre-processing steps from a total of ~10^10 possible mathematical transformations. Application of these algorithms results in a 16% reduction in the model error when compared against the raw data model. GA-DFA recovers six variables from the full set of 882 spectral variables against which a satisfactory DFA model can be formed; thus inferences can be made as to the biochemical differences that are reflected by these spectral bands.
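
The variable-selection idea in the abstract can be illustrated with a minimal sketch. This is not the authors' GA-DFA implementation: the record gives no algorithmic detail, so the chromosome encoding, the crossover and mutation operators, the population settings, and the synthetic data below are all assumptions made for illustration. A nearest-centroid classifier scored on the training data stands in for DFA, and the names `fitness` and `ga_select` are hypothetical.

```python
import numpy as np


def fitness(X, y, mask):
    """Training accuracy of a nearest-centroid classifier on the selected
    columns. Assumption: a simple stand-in for the DFA model in the abstract."""
    if not mask.any():
        return 0.0
    Xs = X[:, mask]
    classes = np.unique(y)
    centroids = np.stack([Xs[y == c].mean(axis=0) for c in classes])
    # Squared Euclidean distance from every sample to every class centroid.
    d = ((Xs[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    pred = classes[d.argmin(axis=1)]
    return float((pred == y).mean())


def ga_select(X, y, n_keep=6, pop_size=30, n_gen=40, p_mut=0.2, seed=0):
    """Search for n_keep variable indices with a small elitist GA.
    All hyperparameter defaults are illustrative, not taken from the paper."""
    rng = np.random.default_rng(seed)
    n_vars = X.shape[1]

    def score(chrom):
        mask = np.zeros(n_vars, dtype=bool)
        mask[chrom] = True
        return fitness(X, y, mask)

    # Each chromosome is an array of n_keep distinct variable indices.
    pop = [rng.choice(n_vars, size=n_keep, replace=False) for _ in range(pop_size)]
    for _ in range(n_gen):
        scores = np.array([score(ch) for ch in pop])
        order = np.argsort(scores)[::-1]
        elite = [pop[i] for i in order[: pop_size // 2]]  # keep the better half
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.choice(len(elite), size=2, replace=False)
            # Crossover: sample the child from the union of both parents.
            pool = np.union1d(elite[a], elite[b])
            child = rng.choice(pool, size=n_keep, replace=False)
            # Mutation: swap one index for a variable the child does not use.
            if rng.random() < p_mut:
                unused = np.setdiff1d(np.arange(n_vars), child)
                child[rng.integers(n_keep)] = rng.choice(unused)
            children.append(child)
        pop = elite + children

    scores = np.array([score(ch) for ch in pop])
    best = pop[int(scores.argmax())]
    return np.sort(best), float(scores.max())


# Synthetic stand-in for spectra: 3 classes, 60 samples, 50 variables,
# with class information placed in columns 5 and 20 only.
data_rng = np.random.default_rng(1)
y = np.repeat(np.arange(3), 20)
X = data_rng.normal(size=(60, 50))
X[:, 5] += 3.0 * y
X[:, 20] -= 3.0 * y

idx, score = ga_select(X, y)
print("selected variables:", idx, "training accuracy:", score)
```

Because the fitness here is computed on the training data, it will tend to be optimistic; a held-out or cross-validated error would be the safer objective in practice. The same loop could in principle search over pre-processing steps by encoding each chromosome as a sequence of transformation choices instead of variable indices.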
Pages: 860-868
Page count: 9