Bwasw-Cloud: Efficient Sequence Alignment Algorithm for Two Big Data with MapReduce

Cited by: 0
Authors
Sun, Mingming [1]
Zhou, Xuehai [1]
Yang, Feng [1]
Lu, Kun [1]
Dai, Dong [2]
Affiliations
[1] Univ Sci & Technol China, Comp Sci, Hefei 230026, Peoples R China
[2] Texas Tech Univ, Comp Sci, Lubbock, TX 79409 USA
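Funding
China Postdoctoral Science Foundation; National Science Foundation (USA);
Keywords
DOI
Not available
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract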
Recent next-generation sequencing machines generate sequences at an unprecedented rate, and the sequences they produce, called reads, are no longer short. The reference sequences against which reads are aligned are also increasingly large. Efficiently mapping a large number of long sequences to big reference sequences poses a new challenge for sequence alignment: alignment algorithms must now match two big data sets against each other. To address this problem, we propose a new parallel sequence alignment algorithm called Bwasw-Cloud, optimized for aligning long reads against large sequence data (e.g. the human genome). It is modeled after the widely used BWA-SW algorithm and uses Hadoop, the open-source implementation of MapReduce. The results show that Bwasw-Cloud can effectively and quickly match two big data sets on a common cluster.
Pages: 213-218
Page count: 6