WindowMasker: window-based masker for sequenced genomes

Cited by: 190
Authors
Morgulis, A [1]
Gertz, EM [1]
Schäffer, AA [1]
Agarwala, R [1]
Affiliations
[1] National Center for Biotechnology Information, National Institutes of Health, Department of Health and Human Services, Bethesda, MD 20894 USA
Funding
National Institutes of Health (USA)
DOI
10.1093/bioinformatics/bti774
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Matches to repetitive sequences are usually undesirable in the output of DNA database searches. Repetitive sequences need not be matched to a query if they can be masked in the database. RepeatMasker/Maskeraid (RM), currently the most widely used software for DNA sequence masking, is slow and requires a library of repetitive template sequences, such as a manually curated RepBase library, that may not exist for newly sequenced genomes.

Results: We have developed a software tool called WindowMasker (WM) that identifies and masks highly repetitive DNA sequences in a genome, using only the sequence of the genome itself. WM is orders of magnitude faster than RM because WM uses a few linear-time scans of the genome sequence, rather than local alignment methods that compare each library sequence with each piece of the genome. We validate WM by comparing BLAST outputs from large sets of queries applied to two versions of the same genome, one masked by WM, and the other masked by RM. Even for genomes such as the human genome, where a good RepBase library is available, searching the database as masked with WM yields more matches that are apparently non-repetitive and fewer matches to repetitive sequences. We show that these results hold for transcribed regions as well. WM also performs well on genomes for which much of the sequence was in draft form at the time of the analysis.
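The idea of masking a genome with "a few linear-time scans" and no repeat library can be illustrated with a toy sketch. It is not the WindowMasker implementation: the function names, the k-mer length, the window size, the threshold, and the mean-count scoring rule are all assumptions chosen for illustration, and ambiguous bases such as N get no special handling. One pass counts every k-mer in the sequence; a second pass slides a window over those counts and soft-masks (lowercases) the bases in any window whose mean count reaches the threshold.

```python
from collections import Counter


def count_kmers(genome: str, k: int) -> Counter:
    """First scan: count every overlapping k-mer in the sequence."""
    genome = genome.upper()
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))


def mask_repeats(genome: str, k: int = 4, window: int = 3,
                 threshold: float = 3.0) -> str:
    """Second scan: lowercase windows whose mean k-mer count >= threshold.

    All parameter defaults are hypothetical, picked for short toy inputs.
    """
    seq = genome.upper()
    counts = count_kmers(seq, k)
    n_kmers = len(seq) - k + 1
    if n_kmers < window:
        return seq  # too short to score even one window

    # Genome-wide count of the k-mer starting at each position.
    kmer_counts = [counts[seq[i:i + k]] for i in range(n_kmers)]

    masked = [False] * len(seq)
    span = window + k - 1           # bases covered by one window
    running = sum(kmer_counts[:window])
    mask_until = 0                  # exclusive end of the masked region so far

    for start in range(n_kmers - window + 1):
        if start > 0:
            # Slide the window by one k-mer: add the new, drop the old.
            running += kmer_counts[start + window - 1] - kmer_counts[start - 1]
        if running / window >= threshold:
            # Mark only bases not already masked, so each base is set once.
            for j in range(max(start, mask_until), start + span):
                masked[j] = True
            mask_until = start + span

    return "".join(c.lower() if m else c for c, m in zip(seq, masked))


if __name__ == "__main__":
    print(mask_repeats("ACGT" * 5))
    print(mask_repeats("ACGTTGCA"))
```

Because the window score is updated incrementally and each base is marked at most once, the work grows linearly with genome length for a fixed k, which is the property the abstract contrasts with library-based local alignment.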
Pages: 134-141
Page count: 8