Historic Moments Discovery in Sequence Data

Cited by: 1
Authors
Bai, Ran [1]
Hon, Wing Kai [2]
Lo, Eric [3]
He, Zhian [4]
Zhu, Kenny [5]
Affiliations
[1] Hong Kong Polytech Univ, Dept Comp, Hong Kong, Peoples R China
[2] Natl Tsing Hua Univ, Dept Comp Sci, Hsinchu, Taiwan
[3] Chinese Univ Hong Kong, Dept Comp Sci & Engn, Hong Kong, Peoples R China
[4] Univ Hong Kong, Dept Comp Sci, Hong Kong, Peoples R China
[5] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai, Peoples R China
Source
ACM Transactions on Database Systems | 2019, Vol. 44, No. 1
Keywords
Historic moments; space optimal; prominent streaks; sequence data; SKYLINE
DOI
10.1145/3276975
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Many emerging applications are based on finding interesting subsequences in sequence data. One such task that has received much attention is finding "prominent streaks," a set of the longest contiguous subsequences whose values are all above (or below) a certain threshold. Motivated by real applications, we observe that prominent streaks alone are not insightful enough and need to be accompanied by what we call "historic moments." In this article, we present an algorithm to efficiently compute historic moments from sequence data. The algorithm is incremental and space optimal: when new data arrive, it efficiently refreshes the results while keeping minimal information. Case studies show that historic moments can significantly improve the insights offered by prominent streaks alone. Furthermore, experiments show that our algorithm can outperform the baseline in both time and space.
Pages: 33