Analysis of heterogeneous genomic samples using image normalization and machine learning

Cited by: 2
Authors
Basodi, Sunitha [1]
Baykal, Pelin Icer [1]
Zelikovsky, Alex [1, 2]
Skums, Pavel [1]
Pan, Yi [1]
Affiliations
[1] Georgia State Univ, Dept Comp Sci, 25 Pk Pl NE, Atlanta, GA 30303 USA
[2] IM Sechenov First Moscow State Med Univ, Lab Bioinformat, Moscow 11991, Russia
Keywords
Next-generation sequencing data; Image normalization; Staging HCV infections; Outbreaks investigations; Clustering; VIRAL POPULATION-STRUCTURE; C VIRUS-INFECTIONS; TRANSMISSION NETWORK; RECONSTRUCTION; EVOLUTION; VARIANTS; OUTBREAK;
DOI
10.1186/s12864-020-6661-6
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: Analysis of heterogeneous populations such as viral quasispecies is one of the most challenging bioinformatics problems. Although machine learning models are becoming widely used to analyze sequence data from such populations, their straightforward application is impeded by multiple challenges: technological limitations and biases, the difficulty of selecting relevant features, and the need to compare genomic datasets of different sizes and structures.
Results: We propose a novel preprocessing approach that transforms irregular genomic data into normalized image data. This representation allows the problems of classifying and comparing heterogeneous populations to be restated as image classification problems, which can be solved using a variety of available machine learning tools. We then apply the proposed approach to two important problems in molecular epidemiology: inference of viral infection stage and detection of viral transmission clusters using next-generation sequencing data. The infection staging method was applied to HCV HVR1 samples collected from 108 recently and 257 chronically infected individuals. The SVM-based image classification approach achieved more than 95% accuracy for both recently and chronically HCV-infected individuals. Clustering was performed on data collected from 33 epidemiologically curated outbreaks, yielding more than 97% accuracy.
Conclusions: The sequence image normalization method allows robust conversion of genomic data into numerical data and overcomes several issues associated with applying machine learning methods to viral populations. Image data also help in visualizing genomic data. Experimental results demonstrate that the proposed method can be successfully applied to different problems in molecular epidemiology and surveillance of viral diseases. Simple binary classifiers and clustering techniques applied to the image data are equally or more accurate than other models.
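The abstract does not spell out the normalization itself, so the following is only a minimal sketch of the general idea of mapping sequence populations of different sizes onto a fixed-size numeric grid. The encoding used here (per-position nucleotide frequencies) and the names `population_to_image` and `image_distance` are assumptions made for illustration and are not taken from the paper.

```python
# Illustrative sketch only: the encoding below is an assumption,
# not the normalization method described in the paper.

NUCLEOTIDES = "ACGT"


def population_to_image(variants, length):
    """Map {aligned_sequence: read_count} to a length x 4 frequency matrix.

    Each row is an alignment position, each column a nucleotide, and each
    cell holds the fraction of reads carrying that nucleotide there, so the
    output shape does not depend on how many reads or variants were sampled.
    """
    total = sum(variants.values())
    if total == 0:
        raise ValueError("population has no reads")
    image = [[0.0] * len(NUCLEOTIDES) for _ in range(length)]
    for seq, count in variants.items():
        if len(seq) != length:
            raise ValueError("sequences must be aligned to the same length")
        for pos, base in enumerate(seq):
            col = NUCLEOTIDES.find(base)
            if col >= 0:  # gaps and ambiguous bases are skipped
                image[pos][col] += count / total
    return image


def image_distance(a, b):
    """Sum of absolute cell differences between two equally sized images."""
    return sum(abs(x - y) for row_a, row_b in zip(a, b)
               for x, y in zip(row_a, row_b))


# Two hypothetical samples with very different sequencing depths
# but the same variant proportions.
sample_small = {"ACGT": 3, "ACGA": 1}
sample_large = {"ACGT": 300, "ACGA": 100}

img_small = population_to_image(sample_small, 4)
img_large = population_to_image(sample_large, 4)
print(image_distance(img_small, img_large))
```

Because both samples yield matrices of the same shape, they could be flattened and passed to any standard classifier or clustering routine, which is the kind of reuse of existing tools the abstract points to.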
Pages: 10