High-Throughput Signal Detection in Proteomics Data Sets

Project Description

Mass spectrometry has become the de facto standard for experimental proteomics and is becoming increasingly important for metabolomics. But even today, with highly sophisticated mass spectrometers available, signal interpretation remains a great challenge. Large-scale proteomics applications, for instance, routinely generate terabytes of data that must be processed. And since signal-to-noise ratios can become very low, in particular for complex mixtures, only a small proportion of the theoretically available information can be extracted today: in a typical study on a relatively simple model organism (C. glutamicum), only 20,000 of 75,000 recorded MS/MS spectra (roughly 27%) could be successfully interpreted, and only 7,500 of the 95,000 peptides predicted for the organism (about 8%) could be uniquely identified.

In this project, we have worked on a wavelet-based signal processing strategy that detects whole isotope patterns simultaneously instead of isolated peaks, leading to very sensitive and stable feature detection. To enable routine application of the method, we have developed a vectorized algorithm for this approach that runs on modern graphics processing units, with a speed-up factor of approximately 200. (This work was also presented in the highlights track of the International Conference on Bioinformatics (InCoB) 2009 in Singapore.) Through Prof. Tholey (now at Kiel University), the project was able to generate its own gold-standard data sets and had access to experimental experts, so that automated signal processing results could be compared with manually generated expert annotations.
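
To illustrate the general idea of scoring a whole isotope pattern rather than a single peak, here is a minimal sketch in Python. It is not the project's algorithm: the zero-mean comb template, the Gaussian peak shape, the geometric intensity decay, and all names (isotope_template, detect_monoisotopic_mz, synthetic_spectrum) are assumptions made purely for illustration, and the spectrum is assumed to be sampled on a uniform m/z grid.

```python
import numpy as np

# Approximate m/z distance between neighbouring isotope peaks at charge 1 (in Da).
ISOTOPE_SPACING = 1.00335


def isotope_template(charge, step, n_peaks=4, sigma=0.03, decay=0.6):
    """Build a zero-mean, unit-norm comb of Gaussian peaks (hypothetical shape).

    Peaks are ISOTOPE_SPACING / charge apart; the k-th peak has height decay**k.
    Returns the template and the grid index of its first (monoisotopic) peak.
    """
    spacing = ISOTOPE_SPACING / charge
    half_width = 4 * sigma
    grid = np.arange(-half_width, (n_peaks - 1) * spacing + half_width, step)
    template = np.zeros_like(grid)
    for k in range(n_peaks):
        template += decay**k * np.exp(-0.5 * ((grid - k * spacing) / sigma) ** 2)
    template -= template.mean()  # zero mean, so a flat baseline scores zero
    template /= np.linalg.norm(template)
    offset = int(round(half_width / step))
    return template, offset


def detect_monoisotopic_mz(mz, intensities, charge):
    """Return the m/z at which the whole-pattern template matches best."""
    step = mz[1] - mz[0]  # assumes a uniform m/z grid
    template, offset = isotope_template(charge, step)
    # scores[i] is the inner product of the template with the window starting at i.
    scores = np.correlate(intensities, template, mode="valid")
    return mz[int(np.argmax(scores)) + offset]


def synthetic_spectrum(mono_mz, charge, step=0.01, noise=0.05, seed=0):
    """Generate one noisy isotope pattern for demonstration purposes only."""
    rng = np.random.default_rng(seed)
    mz = 500.0 + step * np.arange(1000)
    intensities = rng.normal(0.0, noise, mz.size)
    for k in range(4):
        center = mono_mz + k * ISOTOPE_SPACING / charge
        intensities += 0.6**k * np.exp(-0.5 * ((mz - center) / 0.03) ** 2)
    return mz, intensities


if __name__ == "__main__":
    mz, intensities = synthetic_spectrum(mono_mz=503.0, charge=2)
    print(detect_monoisotopic_mz(mz, intensities, charge=2))
```

Because the score at each m/z offset is an independent inner product, this kind of computation maps naturally onto data-parallel hardware, which is one reason a vectorized GPU formulation is attractive. A realistic implementation would additionally have to scan over charge states and handle non-uniform sampling, neither of which is attempted here.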

In a collaboration between the groups headed by Prof. Hein and Prof. Hildebrandt, the project has developed a further processing technique that is well suited to separating highly overlapping features (publication in preparation). The methods developed in this project have also been applied to metabolomics GC-MS data generated in Prof. Heinzle's group, where they led to greatly improved detection rates (publication in preparation). In an important external collaboration, the methods are also used at the Aebersold lab at ETH Zürich (publication in preparation).