Locality-sensitive hashing enables efficient and scalable signal classification in high-throughput mass spectrometry raw data

Authors
Bob, Konstantin [1]
Teschner, David [1]
Kemmer, Thomas [1]
Gomez-Zepeda, David [2, 3]
Tenzer, Stefan [2, 3]
Schmidt, Bertil [1]
Hildebrandt, Andreas [1]
Affiliations
[1] Johannes Gutenberg Univ Mainz, Inst Comp Sci, D-55128 Mainz, Germany
[2] Johannes Gutenberg Univ Mainz, Inst Immunol, Univ Med Ctr, D-55128 Mainz, Germany
[3] Helmholtz Inst Translat Oncol HITRON Mainz, Immunoprote Unit, D-55131 Mainz, Germany
Keywords
Mass spectrometry; Locality-sensitive hashing; Signal processing; Peptide identification; Proteomics; Range
DOI
10.1186/s12859-022-04833-5
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Mass spectrometry is an important experimental technique in proteomics. However, the analysis of certain mass spectrometry data faces two combined challenges: first, even a single experiment produces a large amount of multi-dimensional raw data; second, the signals of interest are not single peaks but patterns of peaks that extend along the different dimensions. The rapidly growing amount of mass spectrometry data increases the demand for scalable solutions. Furthermore, existing approaches to signal detection usually rely on strong assumptions about the properties of the signals.
Results: This study shows that locality-sensitive hashing enables signal classification in mass spectrometry raw data at scale. An appropriate choice of algorithm parameters makes it possible to balance false-positive and false-negative rates. On synthetic data, the approach outperformed intensity thresholding. Real data could be reduced substantially without losing relevant information. The implementation scaled up to 32 threads and supports GPU acceleration.
Conclusions: Locality-sensitive hashing is a desirable approach for signal classification in mass spectrometry raw data.
Availability: Generated data and code are available at https://github.com/hildebrandtlab/mzBucket. Raw data is available at https://zenodo.org/record/5036526.
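The statement in the Results that parameter choice balances false-positive and false-negative rates can be illustrated with the textbook AND/OR amplification used in locality-sensitive hashing. The sketch below is not taken from the paper or from mzBucket: the function names, the parameters `k` (hash bits concatenated per table) and `l` (number of tables), and the per-bit agreement probabilities `p_signal` and `p_noise` are all assumptions chosen for illustration.

```python
def collision_probability(p: float, k: int, l: int) -> float:
    """Chance that two items collide in at least one of l tables,
    where each table key concatenates k hash bits and the two items
    agree on any single bit with probability p (assumed independent)."""
    return 1.0 - (1.0 - p ** k) ** l


def error_rates(p_signal: float, p_noise: float, k: int, l: int):
    """Return (false_negative, false_positive) for hypothetical
    per-bit agreement probabilities of signal and noise pairs."""
    false_negative = 1.0 - collision_probability(p_signal, k, l)
    false_positive = collision_probability(p_noise, k, l)
    return false_negative, false_positive


if __name__ == "__main__":
    # Illustrative values only; they are not measurements from the paper.
    for k, l in [(4, 4), (8, 4), (8, 16)]:
        fn, fp = error_rates(p_signal=0.9, p_noise=0.5, k=k, l=l)
        print(f"k={k:2d} l={l:2d}  false-negative={fn:.4f}  false-positive={fp:.4f}")
```

Under these assumptions, a larger `k` makes collisions rarer (fewer false positives, more false negatives), while a larger `l` has the opposite effect, so the two parameters can be tuned against each other.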
Pages: 16
Journal: BMC Bioinformatics, 23