Analysis of heterogeneous genomic samples using image normalization and machine learning

Cited by: 2
Authors
Basodi, Sunitha [1]
Baykal, Pelin Icer [1]
Zelikovsky, Alex [1, 2]
Skums, Pavel [1]
Pan, Yi [1]
Affiliations
[1] Georgia State Univ, Dept Comp Sci, 25 Pk Pl NE, Atlanta, GA 30303 USA
[2] IM Sechenov First Moscow State Med Univ, Lab Bioinformat, Moscow 11991, Russia
Keywords
Next-generation sequencing data; Image normalization; Staging HCV infections; Outbreaks investigations; Clustering; VIRAL POPULATION-STRUCTURE; C VIRUS-INFECTIONS; TRANSMISSION NETWORK; RECONSTRUCTION; EVOLUTION; VARIANTS; OUTBREAK;
DOI
10.1186/s12864-020-6661-6
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005; 0836; 090102; 100705
Abstract
Background: Analysis of heterogeneous populations such as viral quasispecies is one of the most challenging bioinformatics problems. Although machine learning models are becoming widely employed for the analysis of sequence data from such populations, their straightforward application is impeded by multiple challenges: technological limitations and biases, the difficulty of selecting relevant features, and the need to compare genomic datasets of different sizes and structures.
Results: We propose a novel preprocessing approach that transforms irregular genomic data into normalized image data. This representation allows the classification and comparison of heterogeneous populations to be restated as image classification problems, which can be solved using a variety of available machine learning tools. We then apply the proposed approach to two important problems in molecular epidemiology: inference of viral infection stage and detection of viral transmission clusters using next-generation sequencing data. The infection staging method was applied to HCV HVR1 samples collected from 108 recently and 257 chronically infected individuals. The SVM-based image classification approach achieved more than 95% accuracy for both recently and chronically HCV-infected individuals. Clustering was performed on data collected from 33 epidemiologically curated outbreaks, yielding more than 97% accuracy.
Conclusions: The sequence image normalization method allows for a robust conversion of genomic data into numerical data and overcomes several issues associated with applying machine learning methods to viral populations. Image data also help in the visualization of genomic data. Experimental results demonstrate that the proposed method can be successfully applied to different problems in molecular epidemiology and surveillance of viral diseases. Simple binary classifiers and clustering techniques applied to the image data are equally or more accurate than other models.
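The abstract does not spell out how a sample becomes an image, so the following is only a minimal sketch of the general idea, not the authors' normalization. It assumes reads are already aligned, and the function name `sample_to_image` and the 4 x length frequency grid are hypothetical choices made for illustration. The point it shows is that samples with very different read counts can map to grids of identical shape and value range, which is what makes a standard classifier such as an SVM applicable afterwards.

```python
from collections import Counter

NUCLEOTIDES = "ACGT"  # row order of the illustrative image


def sample_to_image(sequences, length):
    """Map a sample (any number of aligned reads) to a fixed 4 x length grid.

    Cell [i][j] holds the fraction of reads carrying NUCLEOTIDES[i] at
    position j, so every sample gets the same shape and values in [0, 1]
    regardless of how many reads it contains. This is an assumed encoding,
    not the one used in the paper.
    """
    image = [[0.0] * length for _ in NUCLEOTIDES]
    for j in range(length):
        # Count the bases observed at position j across all reads.
        counts = Counter(seq[j] for seq in sequences if j < len(seq))
        total = sum(counts[n] for n in NUCLEOTIDES)
        if total == 0:
            continue  # no A/C/G/T at this position (gaps, ambiguity, no reads)
        for i, n in enumerate(NUCLEOTIDES):
            image[i][j] = counts[n] / total
    return image


# Two samples with the same composition but very different sizes.
small = ["ACGT", "ACGA"]
large = ["ACGT"] * 30 + ["ACGA"] * 30

img_small = sample_to_image(small, 4)
img_large = sample_to_image(large, 4)
```

Here `img_small` and `img_large` are identical even though one sample has 2 reads and the other 60. Flattening such a grid into a vector would give a fixed-length input for an off-the-shelf classifier or clustering method.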
Pages: 10