Analyzing large-scale proteomics projects with latent semantic indexing

Cited by: 29
Authors
Klie, Sebastian [1]
Martens, Lennart [2]
Vizcaino, Juan Antonio [2]
Cote, Richard [2]
Jones, Phil [2]
Apweiler, Rolf [2]
Hinneburg, Alexander [1]
Hermjakob, Henning [2]
Affiliations
[1] Univ Halle Wittenberg, Halle An Der Saale, Germany
[2] European Bioinformat Inst, EMBL Outstn, Cambridge, England
Keywords
bioinformatics; data mining; proteomics; latent semantic analysis
DOI
10.1021/pr070461k
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Since the advent of public data repositories for proteomics data, readily accessible results from high-throughput experiments have been accumulating steadily. Several large-scale projects in particular have contributed substantially to the amount of identifications available to the community. Despite the considerable body of information amassed, very few successful analyses have been performed and published on this data, leveling off the ultimate value of these projects far below their potential. A prominent reason published proteomics data is seldom reanalyzed lies in the heterogeneous nature of the original sample collection and the subsequent data recording and processing. To illustrate that at least part of this heterogeneity can be compensated for, we here apply a latent semantic analysis to the data contributed by the Human Proteome Organization's Plasma Proteome Project (HUPO PPP). Interestingly, despite the broad spectrum of instruments and methodologies applied in the HUPO PPP, our analysis reveals several obvious patterns that can be used to formulate concrete recommendations for optimizing proteomics project planning as well as the choice of technologies used in future experiments. It is clear from these results that the analysis of large bodies of publicly available proteomics data by noise-tolerant algorithms such as the latent semantic analysis holds great promise and is currently underexploited.
Pages: 182-191
Page count: 10