Two Novel Techniques for Space Compaction on Biological Sequences

Cited by: 1
Authors
Volis, George [1 ]
Makris, Christos [1 ]
Kanavos, Andreas [1 ]
Affiliations
[1] Univ Patras, Dept Comp Engn & Informat, Patras 26504, Greece
Keywords
Searching and Browsing; Web Information Filtering and Retrieval; Text Mining; Indexing Structures; Inverted Files; Index Compression; n-Gram Indexing; Sequence Analysis and Assembly; COMPUTATION
DOI
10.5220/0005801101050112
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
The number and size of genomic databases have grown rapidly in recent years, and so has the number of Internet-accessible databases. There is therefore a need for satisfactory methods for managing this growing body of information, and a lot of effort has been put in this direction. Contributing to this effort, this paper presents two algorithms that reduce the amount of space needed to store genomic information. The first algorithm is based on the classic n-grams/2L technique for indexing a DNA sequence and converts the inverted index of that technique to a more compressed format. Researchers have revealed the existence of repeated and palindromic patterns in the DNA of living organisms; this observation is the main motivation for the technique, which proposes an alternative data structure for handling such sequences. Our experimental results show that the algorithm achieves a more efficient index than the n-grams/2L algorithm and can be adopted by any algorithm that is based on n-grams/2L. The second algorithm is based on the n-grams technique. Treating the four symbols of the DNA alphabet as the vertices of a square, it represents a DNA sequence as a relation between the vertices, sides, and diagonals of that square. The experimental results show that this second idea compresses our index structure even more successfully.
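For readers unfamiliar with the baseline that both algorithms build on, the following is a minimal sketch of a plain n-gram inverted index over a DNA string. It is not the n-grams/2L structure and not either of the compression schemes described in the abstract; the function names `build_ngram_index` and `find_occurrences` and the sample sequence are assumptions made purely for illustration.

```python
from collections import defaultdict


def build_ngram_index(sequence: str, n: int) -> dict[str, list[int]]:
    """Map every length-n substring of `sequence` to its start offsets.

    Illustrative only: a plain (uncompressed) n-gram inverted index.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    index: defaultdict[str, list[int]] = defaultdict(list)
    # Slide a window of width n over the sequence, one symbol at a time.
    for start in range(len(sequence) - n + 1):
        index[sequence[start:start + n]].append(start)
    return dict(index)


def find_occurrences(index: dict[str, list[int]], ngram: str) -> list[int]:
    """Return the start offsets recorded for `ngram` (empty if absent)."""
    return index.get(ngram, [])


# Hypothetical sample sequence, chosen so that one n-gram repeats.
index = build_ngram_index("ACGTACGA", 3)
print(find_occurrences(index, "ACG"))
```

Repeated n-grams such as `ACG` here each contribute a posting, which gives a sense of why repetitive DNA leaves room for a more compact index representation.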
Pages: 105-112
Page count: 8
Related papers
50 items in total
  • [21] Functional Compaction for Functional Test Sequences
    Pomeranz, Irith
    IEEE ACCESS, 2024, 12 : 98130 - 98140
  • [23] Two novel and useful suturing techniques
    Niazi, ZBM
    PLASTIC AND RECONSTRUCTIVE SURGERY, 1997, 100 (06) : 1617 - 1618
  • [24] Comparing two long biological sequences using a DSM system
    Melo, RCF
    Walter, MET
    Melo, ACMA
    Batista, R
    Nardelli, M
    Martins, T
    Fonseca, T
    EURO-PAR 2003 PARALLEL PROCESSING, PROCEEDINGS, 2003, 2790 : 517 - 524
  • [25] Novel techniques of graphical representation and analysis of DNA sequences - A review
    Roy, A
    Raychaudhury, C
    Nandy, A
    JOURNAL OF BIOSCIENCES, 1998, 23 (01) : 55 - 71
  • [26] A Novel Image Cryptosystem Inspired by the Generation of Biological Protein Sequences
    Nassef, Mohammad
    Alkinani, Monagi H.
    Shafik, Ahmed Mahmoud
    IEEE ACCESS, 2023, 11 : 29101 - 29115
  • [28] A novel algorithm for detecting multiple covariance and clustering of biological sequences
    Shen, Wei
    Li, Yan
    SCIENTIFIC REPORTS, 2016, 6
  • [29] GenericBioMatch: A novel generic pattern match algorithm for biological sequences
    Pan, YL
    Famili, AF
    PROCEEDINGS OF THE 2003 IEEE BIOINFORMATICS CONFERENCE, 2003, : 562 - 563