Automated detection of cancerous genomic sequences using genomic signal processing and machine learning

Cited by: 7
Authors
Liu, Dong-Wei [1]
Jia, Run-Ping [1]
Wang, Cai-Feng [1]
Arunkumar, N. [2]
Narasimhan, K. [2]
Udayakumar, M. [3]
Elamaran, V. [2]
Affiliations
[1] Shanghai Inst Technol, Sch Mat Sci & Engn, Shanghai 201418, Peoples R China
[2] SASTRA Deemed Univ, Sch EEE, Thanjavur, India
[3] SASTRA Deemed Univ, Sch Chem & Biotechnol, Dept Bioinformat, Thanjavur, India
Keywords
Genomic signal processing; Discrete wavelet transform; Cancer; Support vector machine; Gene sequence; Differentiation; Signal processing
DOI
10.1016/j.future.2018.12.041
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Missense mutations are the primary cause of cancer. Identifying mutations in gene sequences is a preliminary step in the diagnosis of cancer. To identify a mutation, we need to differentiate between cancerous and non-cancerous gene sequences. Identifying mutations by sequence comparison is possible only if an existing variant repeats. If no homologous variants are present, it is difficult to distinguish cancerous from non-cancerous sequences with a sequence identification method. Here we use genomic signal processing techniques based on the discrete wavelet transform (DWT) to identify a pattern in the characteristics of the sequences, which in turn can be used with a machine learning algorithm to differentiate between cancerous and non-cancerous sequences. The cancerous and non-cancerous gene sequences for lung, breast and ovarian cancer were obtained from NCBI. After numerical mapping of the sequences, a four-level DWT is applied using the Haar wavelet, and statistical features (mean, median, standard deviation, interquartile range, skewness and kurtosis) are obtained from the wavelet domain. When applied to machine learning algorithms, these statistical values yielded an accuracy of 100% in classifying cancerous and non-cancerous sequences with a Support Vector Machine. (C) 2019 Elsevier B.V. All rights reserved.
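The feature-extraction steps named in the abstract (numerical mapping, four-level Haar DWT, six statistics from the wavelet domain) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the integer mapping in `BASE_TO_VALUE`, the padding of odd-length signals, the choice to compute the statistics on every sub-band, and all function names are assumptions, since the abstract does not specify them.

```python
import math
import statistics

# ASSUMPTION: a simple integer mapping; the abstract does not say which
# numerical mapping was applied to the nucleotides.
BASE_TO_VALUE = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}


def map_sequence(seq):
    """Map a nucleotide string to floats, skipping symbols outside A/C/G/T."""
    return [BASE_TO_VALUE[b] for b in seq.upper() if b in BASE_TO_VALUE]


def haar_step(signal):
    """One level of the orthonormal Haar DWT: (approximation, detail)."""
    if len(signal) % 2:
        # ASSUMPTION: pad odd-length input by repeating the last sample.
        signal = signal + [signal[-1]]
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_dwt(signal, levels=4):
    """Return the sub-bands [cA_levels, cD_levels, ..., cD_1]."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return [approx] + details[::-1]


def band_features(coeffs):
    """Mean, median, std, IQR, skewness and (non-excess) kurtosis of one band."""
    mean = statistics.fmean(coeffs)
    median = statistics.median(coeffs)
    std = statistics.pstdev(coeffs)
    if len(coeffs) >= 2:
        q1, _, q3 = statistics.quantiles(coeffs, n=4)
        iqr = q3 - q1
    else:
        iqr = 0.0
    if std > 0:
        skew = statistics.fmean([(c - mean) ** 3 for c in coeffs]) / std ** 3
        kurt = statistics.fmean([(c - mean) ** 4 for c in coeffs]) / std ** 4
    else:
        # Constant band: the moment ratios are undefined, so report 0.0.
        skew = kurt = 0.0
    return [mean, median, std, iqr, skew, kurt]


def extract_features(seq, levels=4):
    """Concatenate the six statistics of every sub-band into one vector."""
    bands = haar_dwt(map_sequence(seq), levels)
    return [value for band in bands for value in band_features(band)]
```

With four levels this yields 5 sub-bands and therefore 30 features per sequence. Such vectors, labeled cancerous or non-cancerous, could then be passed to a support vector machine (for example scikit-learn's `sklearn.svm.SVC`); the classifier settings are likewise not given in the abstract.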
Pages: 233-237
Page count: 5