A Partial Least Squares Based Procedure for Upstream Sequence Classification in Prokaryotes

Cited by: 9
Authors
Mehmood, Tahir [1]
Bohlin, Jon [2]
Snipen, Lars [1]
Affiliations
[1] Norwegian Univ Life Sci, Inst Biostat, Dept Chem Biotechnol & Food Sci, N-1430 As, Akershous, Norway
[2] Norwegian Inst Publ Hlth, Div Epidemiol, N-0403 Oslo, Norway
Keywords
Partial least squares; classification; prokaryotes; genomes; motifs; algorithm; selection
DOI
10.1109/TCBB.2014.2366146
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
The upstream region of coding genes is important for several reasons, for instance in locating transcription factor binding sites and initiation start sites in genomic DNA. Motivated by a recent study in which a multivariate approach was successfully applied to coding sequence modeling, we introduce a partial least squares (PLS) based procedure for classifying true upstream prokaryotic sequences against background upstream sequences. The analysis considered the upstream sequences of coding genes conserved across genomes; for each prokaryotic species studied, these conserved genes were identified using the pan-genome concept. The PLS procedure uses a position-specific scoring matrix (PSSM) to characterize the upstream region. Results of the PLS-based method were compared with random forest (RF), using its Gini importance, and with support vector machines (SVM), a widely used method for sequence classification. Classification performance was evaluated by cross-validation, and the proposed approach identified prokaryotic upstream regions significantly better than RF (p < 0.01) and SVM (p < 0.01). Furthermore, the proposed method produced results consistent with known biological characteristics of the upstream region.
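The abstract does not detail how the PSSM enters the PLS model. The sketch below assumes one plausible reading, offered only as an illustration: aligned, equal-length upstream windows are one-hot encoded per (position, nucleotide), a PLS1 model fitted with the NIPALS algorithm regresses a +1/-1 class label (true upstream vs. background) on these features, and the regression coefficients, reshaped to position x nucleotide, form a PSSM-shaped weight matrix. The function names (`one_hot`, `fit_pls1`, `pssm_like_weights`, `toy_data`), the planted motif, the window length and the number of components are all hypothetical; the paper's encoding, component selection and cross-validation scheme may differ.

```python
"""Minimal PLS-DA sketch for upstream-vs-background classification.

Hypothetical assumptions (not taken from the paper): equal-length aligned
upstream windows, one-hot features per (position, base), label +1 for true
upstream and -1 for background, PLS1 fitted with the NIPALS algorithm.
"""
import random

BASES = "ACGT"


def one_hot(seq):
    """Encode DNA as a flat 0/1 vector of length 4 * len(seq); non-ACGT -> zeros."""
    return [1.0 if base == b else 0.0 for base in seq.upper() for b in BASES]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def fit_pls1(X, y, n_comp=2):
    """Fit PLS1 by NIPALS; return X means, y mean and (w, p, q) per component."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(m)] for row in X]
    yc = [v - y_mean for v in y]
    comps = []
    for _ in range(n_comp):
        # Weight vector: direction of maximal covariance between X and y
        w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(m)]
        norm = dot(w, w) ** 0.5
        if norm == 0.0:  # no remaining covariance with y; stop early
            break
        w = [v / norm for v in w]
        t = [dot(row, w) for row in Xc]  # component scores
        tt = dot(t, t)
        p = [sum(Xc[i][j] * t[i] for i in range(n)) / tt for j in range(m)]  # X loadings
        q = dot(yc, t) / tt  # y loading
        # Deflate X and y by the component just extracted
        Xc = [[Xc[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        yc = [yc[i] - q * t[i] for i in range(n)]
        comps.append((w, p, q))
    return x_mean, y_mean, comps


def predict_centered(dx, comps):
    """Predicted change in y for a centred input, applying components in order."""
    dx = list(dx)
    yhat = 0.0
    for w, p, q in comps:
        t = dot(dx, w)
        yhat += q * t
        dx = [a - t * b for a, b in zip(dx, p)]
    return yhat


def predict(x, model):
    x_mean, y_mean, comps = model
    return y_mean + predict_centered([a - b for a, b in zip(x, x_mean)], comps)


def pssm_like_weights(model, length):
    """Regression coefficient per (position, base), i.e. a PSSM-shaped matrix."""
    m = 4 * length
    # The model is linear in the centred input, so unit vectors recover coefficients
    coef = [predict_centered([1.0 if k == j else 0.0 for k in range(m)], model[2])
            for j in range(m)]
    return [dict(zip(BASES, coef[4 * i:4 * i + 4])) for i in range(length)]


def toy_data(n_per_class=40, length=12, motif="AGGAGG", start=3, seed=0):
    """Hypothetical toy data: positives carry a planted motif, negatives are random."""
    rng = random.Random(seed)
    seqs, labels = [], []
    for label in (1.0, -1.0):
        for _ in range(n_per_class):
            s = [rng.choice(BASES) for _ in range(length)]
            if label > 0:
                s[start:start + len(motif)] = motif
            seqs.append("".join(s))
            labels.append(label)
    return seqs, labels


if __name__ == "__main__":
    seqs, y = toy_data()
    X = [one_hot(s) for s in seqs]
    model = fit_pls1(X, y, n_comp=2)
    hits = sum((predict(x, model) > 0) == (label > 0) for x, label in zip(X, y))
    print(f"training accuracy: {hits / len(y):.2f}")
    weights = pssm_like_weights(model, 12)
    print({pos: max(w, key=w.get) for pos, w in enumerate(weights)})
```

A new window would be classified with `predict(one_hot(seq), model) > 0`. The toy run scores its own training data only; the evaluation described in the abstract used cross-validation, which would refit the model on each training fold and score the held-out fold.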
Pages: 560-567
Number of pages: 8
Related papers
50 in total
  • [1] Classification schemes based on Partial Least Squares for face identification
    Carlos, Gerson de Paulo
    Pedrini, Helio
    Schwartz, William Robson
    Journal of Visual Communication and Image Representation, 2015, 32: 170-179
  • [2] Protein family classification with partial least squares
    Opiyo, Stephen O.
    Moriyama, Etsuko N.
    Journal of Proteome Research, 2007, 6(2): 846-853
  • [3] Classification using generalized partial least squares
    Ding, BY
    Gentleman, R
    Journal of Computational and Graphical Statistics, 2005, 14(2): 280-298
  • [4] Local semantic indexing based on partial least squares for text classification
    Zeng, Xueqiang
    Li, Guozheng
    Wang, Mingwen
    Wu, Gengfeng
    Journal of Computational Information Systems, 2008, 4(3): 1145-1152
  • [5] Personal Credit Rating Based on Partial Least Squares Regression Classification
    Dai Ting-ting
    Shan Chang-ji
    Dong Yan-shou
    Bian Yi-duo
    2018 2nd International Workshop on Renewable Energy and Development (IWRED 2018), 2018, 153
  • [6] Multimodal Classification of Mild Cognitive Impairment Based on Partial Least Squares
    Wang, Pingyue
    Chen, Kewei
    Yao, Li
    Hu, Bin
    Wu, Xia
    Zhang, Jiacai
    Ye, Qing
    Guo, Xiaojuan
    Journal of Alzheimer's Disease, 2016, 54(1): 359-371
  • [7] Partial Least-Squares and Classification and Regression Trees
    Yeh, CH
    Spiegelman, CH
    Chemometrics and Intelligent Laboratory Systems, 1994, 22(1): 17-23
  • [8] Stacked Partial Least Squares Regression for Image Classification
    Hasegawa, Ryoma
    Hotta, Kazuhiro
    Proceedings 3rd IAPR Asian Conference on Pattern Recognition (ACPR 2015), 2015: 765-769
  • [9] Tensor partial least squares for hyperspectral image classification
    Okwuashi, Onuwa
    Ndehedehe, Christopher E.
    Olayinka, Dupe Nihinlola
    Geocarto International, 2022, 37(27): 17487-17502
  • [10] Partial least squares based dimension reduction with gene selection for tumor classification
    Li, Guo-Zheng
    Zeng, Xue-Qiang
    Yang, Jack Y.
    Yang, Mary Qu
    Proceedings of the 7th IEEE International Symposium on Bioinformatics and Bioengineering, Vols I and II, 2007: 1439+