DNA sequence similarity search through content-based retrieval technique

Cited by: 0
Authors
Yeh, CH [1]
Sung, PY [1]
Chang, HT [1]
Kuo, CJ [1]
Affiliations
[1] Univ So Calif, Dept Elect Engn, Los Angeles, CA 90007 USA
Keywords
DNA; Peano scan; Fourier transformation; Principal Component Analysis; indexing structure; clustering algorithm; similarity retrieval
DOI
10.1117/12.486714
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Deoxyribonucleic acid (DNA) sequences are difficult to compare for similarity because of their length and complexity. The challenge lies in applying digital signal processing (DSP) to problems in DNA sequence analysis. Here, we transform a one-dimensional (1D) DNA sequence into a two-dimensional (2D) pattern by using the Peano scan algorithm. Four complex values are assigned to the characters "A", "C", "T", and "G", respectively. The Fourier transform is then applied to obtain the far-field amplitude distribution of the 2D pattern. At this point, the 1D DNA sequence has become a 2D image pattern. Features are extracted from the 2D image pattern with Principal Component Analysis (PCA), so that a DNA sequence feature database can be established. However, comparing features can take a long time when the database is large, since the features are often multi-dimensional. This problem is addressed by building an indexing structure that acts as a filter, discarding non-relevant items and selecting a subset of candidate DNA sequences. Clustering algorithms organize the multi-dimensional feature data into the indexing structure for efficient retrieval. Accordingly, the query sequence needs to be compared only against the candidate sequences rather than against every sequence in the database. In effect, our algorithm provides a pre-processing method that accelerates the DNA sequence search process. Finally, experimental results demonstrate the efficiency of the proposed algorithm for DNA sequence similarity retrieval.
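The first stages described in the abstract (space-filling scan, complex-valued encoding, 2D Fourier transform) can be illustrated with a minimal sketch. Several things here are assumptions rather than the authors' method: the abstract does not say which scan variant is used, so a Hilbert curve stands in for the Peano scan; the four complex values in `BASE_VALUES` are illustrative only; and the function names are hypothetical. The PCA and clustering stages are not shown. A naive DFT is used to keep the sketch dependency-free; in practice a library routine such as `numpy.fft.fft2` would be the usual choice.

```python
import cmath

# Assumed mapping: the abstract does not state the four complex values.
BASE_VALUES = {"A": 1 + 0j, "C": 0 + 1j, "T": -1 + 0j, "G": 0 - 1j}


def hilbert_d2xy(order, d):
    """Map position d along a Hilbert curve to (x, y) on a 2**order grid.

    The Hilbert curve is a stand-in for the Peano scan named in the abstract.
    """
    x = y = 0
    t = d
    s = 1
    n = 1 << order
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            # Rotate/flip the current quadrant so sub-curves join end to end.
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def sequence_to_grid(seq, order):
    """Lay a 1D sequence onto a 2D complex grid along the scan path.

    Cells beyond the end of the sequence stay at zero (an assumed padding rule).
    """
    n = 1 << order
    if len(seq) > n * n:
        raise ValueError("sequence does not fit on the grid")
    grid = [[0j] * n for _ in range(n)]
    for d, base in enumerate(seq):
        x, y = hilbert_d2xy(order, d)
        grid[y][x] = BASE_VALUES[base.upper()]
    return grid


def dft2_magnitude(grid):
    """Return the magnitude of the 2D discrete Fourier transform (naive O(n^4))."""
    n = len(grid)
    w = [cmath.exp(-2j * cmath.pi * k / n) for k in range(n)]
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0j
            for y in range(n):
                for x in range(n):
                    total += grid[y][x] * w[(u * y + v * x) % n]
            out[u][v] = abs(total)
    return out


# Example: a 16-base sequence on a 4x4 grid.
spectrum = dft2_magnitude(sequence_to_grid("ACGTACGTACGTACGT", order=2))
```

A space-filling scan keeps bases that are neighbours in the sequence adjacent on the grid, which is the usual motivation for preferring it over a plain row-by-row layout.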
Pages: 635-645
Page count: 11