Models and Information-Theoretic Bounds for Nanopore Sequencing

Cited by: 26
Authors
Mao, Wei [1 ,2 ]
Diggavi, Suhas N. [1 ]
Kannan, Sreeram [3 ]
Affiliations
[1] Univ Calif Los Angeles, Dept Elect Engn, Los Angeles, CA 90095 USA
[2] Intel Corp, Santa Clara, CA 95054 USA
[3] Univ Washington, Dept Elect Engn, Seattle, WA 98195 USA
Funding
National Science Foundation (USA);
Keywords
Deoxyribonucleic acid (DNA) sequencing; bioinformatics; base calling; channel with synchronization errors; deletion channel; finite state channels; CHANNELS; READS; DNA;
DOI
10.1109/TIT.2018.2809001
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Nanopore sequencing is an emerging technology for sequencing deoxyribonucleic acid (DNA) that can read long fragments of DNA (~50,000 bases), in contrast to most current short-read sequencing technologies, which can read only hundreds of bases. While nanopore sequencers can acquire long reads, their high error rates (20%-30%) pose a technical challenge. In a nanopore sequencer, a DNA strand is migrated through a nanopore, and current variations are measured. The DNA sequence is inferred from this observed current pattern using an algorithm called a base-caller. In this paper, we propose a mathematical model for the "channel" from the input DNA sequence to the observed current, and calculate bounds on the information extraction capacity of the nanopore sequencer. This model incorporates impairments such as (non-linear) inter-symbol interference, deletions, and random response. These information bounds have a two-fold application: 1) The decoding rate with a uniform input distribution can be used to calculate the average size of the plausible list of DNA sequences given an observed current trace. This bound can be used to benchmark existing base-calling algorithms and can serve as a performance objective for designing better nanopores. 2) When the nanopore sequencer is used as a reader in a DNA storage system, the storage capacity is quantified by our bounds.
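To make the three impairments named in the abstract concrete, the following is a minimal toy simulation in Python. It is an illustrative sketch only, not the model proposed in the paper: the function name `toy_nanopore_channel`, the k-mer level table, and the parameters `k`, `p_del`, and `noise_sd` are all hypothetical choices made here. Inter-symbol interference is imitated by letting each current sample depend non-linearly on a window of `k` adjacent bases, deletions by dropping samples at random, and random response by adding Gaussian noise.

```python
import random

BASES = "ACGT"


def kmer_level(kmer):
    """Hypothetical mean current level for a k-mer (arbitrary non-linear map)."""
    idx = 0
    for base in kmer:
        idx = idx * 4 + BASES.index(base)
    # Arbitrary scrambling so that nearby k-mers get unrelated levels.
    return ((idx * 37) % 64) / 8.0


def toy_nanopore_channel(seq, k=3, p_del=0.1, noise_sd=0.5, seed=0):
    """Map a DNA string to a list of noisy current samples (toy model).

    Assumptions (not taken from the paper):
      - each sample depends on k consecutive bases (inter-symbol interference),
      - each k-mer is skipped independently with probability p_del (deletion),
      - each emitted sample gets additive Gaussian noise (random response).
    """
    rng = random.Random(seed)
    samples = []
    for i in range(len(seq) - k + 1):
        if rng.random() < p_del:
            continue  # deletion: this k-mer produces no sample
        samples.append(kmer_level(seq[i:i + k]) + rng.gauss(0.0, noise_sd))
    return samples


if __name__ == "__main__":
    trace = toy_nanopore_channel("ACGTACGTAC", k=3, p_del=0.2, noise_sd=0.3)
    print(len(trace), "samples from 8 k-mers")
```

Because of the deletions, the output trace is generally shorter than the number of k-mers and the receiver does not know which positions were dropped, which is what makes this a channel with synchronization errors rather than a plain noisy channel.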
Pages: 3216-3236
Page count: 21
Related papers
50 in total
  • [1] Models and information-theoretic bounds for nanopore sequencing
    Mao, Wei
    Diggavi, Suhas
    Kannan, Sreeram
    [J]. 2017 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY (ISIT), 2017: 2458 - 2462
  • [2] Structure Learning of Similar Ising Models: Information-theoretic Bounds
    Sihag, Saurabh
    Tajer, Ali
    [J]. 2019 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY (ISIT), 2019: 1307 - 1311
  • [3] Information-theoretic lower bounds for compressive sensing with generative models
    Liu, Zhaoqiang
    Scarlett, Jonathan
    [J]. IEEE Journal on Selected Areas in Information Theory, 2020, 1 (01): 292 - 303
  • [4] Information-Theoretic Bounds for Integral Estimation
    Adams, Donald Q.
    Barik, Adarsh
    Honorio, Jean
    [J]. 2021 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY (ISIT), 2021: 742 - 747
  • [5] Intrinsic Information-Theoretic Models
    Bernal-Casas, D.
    Oller, J. M.
    [J]. ENTROPY, 2024, 26 (05)
  • [6] Strengthened Information-theoretic Bounds on the Generalization Error
    Issa, Ibrahim
    Esposito, Amedeo Roberto
    Gastpar, Michael
    [J]. 2019 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY (ISIT), 2019: 582 - 586
  • [7] Information-Theoretic Bounds for Adaptive Sparse Recovery
    Aksoylar, Cem
    Saligrama, Venkatesh
    [J]. 2014 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY (ISIT), 2014: 1311 - 1315
  • [8] Information-theoretic bounds on target recognition performance
    Jain, A
    Moulin, P
    Miller, MI
    Ramchandran, K
    [J]. AUTOMATIC TARGET RECOGNITION X, 2000, 4050 : 347 - 358
  • [9] Information-Theoretic Confidence Bounds for Reinforcement Learning
    Lu, Xiuyuan
    Van Roy, Benjamin
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [10] Information-theoretic Bounds for Differentially Private Mechanisms
    Barthe, Gilles
    Koepf, Boris
    [J]. 2011 IEEE 24TH COMPUTER SECURITY FOUNDATIONS SYMPOSIUM (CSF), 2011: 191 - 204