For many diseases, early and accurate diagnosis enables optimized, specific and timely management decisions. The diagnostic process requires measurable indicators of a specific biological or clinical state, so-called biomarkers. Different types of biomarkers are being tested, including RNA, DNA, metabolites and proteins. Among these, proteins are more dynamic, more diverse and more directly reflective of cellular physiology than nucleic acid-based markers, and therefore offer high potential as biomarkers for routine application. We are developing a platform for protein biomarker discovery that combines specific sample preparation approaches, different mass spectrometry methods and computational analysis.
Glycosylated proteins (glycoproteins) represent a subproteome that is particularly relevant for clinical research because they are usually found either secreted by tissues, making them good candidates for detection in easily accessible bodily fluids, or at the cell surface, where they represent potential drug targets. Nearly 80% of the protein biomarkers and drug targets currently used in the clinic are glycosylated. In the Aebersold lab, we use isolation techniques for N-glycosites, such as the solid-phase extraction of N-linked glycopeptides (SPEG) method, to isolate the glycoproteomes of clinical plasma or tissue samples for cancer biomarker discovery studies. Diseases of current interest include prostate, ovarian, pancreatic and colorectal cancer.
Various mass spectrometry technologies are being used for biomarker discovery, such as shotgun proteomics, SRM and SWATH-MS. We recently developed the most extensive SWATH-ready glycopeptide spectral library available to date. It consists of 5422 definitive N-glycosite assays that identify approximately 2500 glycoproteins, covering ~50% of the annotated human N-glycoproteome, and serves as an important community resource for perpetual use. Importantly, we are now working on establishing a novel, iterative biomarker discovery pipeline based on the SWATH-MS technique.
Population-scale proteomic analysis is essential for biomarker studies and, more broadly, for personalized and precision medicine. A barcode is a representation of data that can be used to rapidly identify a unique object. Biological samples from different species are being catalogued using DNA barcodes; however, DNA is not the ideal material for distinguishing samples from the same species.
We introduce here the concept of “proteome barcoding” as a new mass spectrometry-based methodology for producing representative proteomic BIG DATA for population-scale biological samples. The methodology has three key features: 1) fast digitization of the proteome; 2) minimal sample consumption; and 3) reproducible and comprehensive BIG proteomic DATA.
To implement “proteome barcoding”, we developed a method for the fast mass spectrometric conversion of small amounts of sample into a single, permanent digital file representing the quantitative proteome of the sample. The proteome maps generated in this way can then be perpetually analyzed, compared and mined in silico. The method combines pressure cycling technology (PCT) and SWATH mass spectrometry. The resulting data are analyzed using software tools including OpenSWATH.
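As a rough illustration of what comparing two digital proteome maps in silico could look like, the sketch below treats each map as a simple table of protein identifiers and quantitative intensities and computes the Pearson correlation of log2 intensities over the proteins both maps share. This is a minimal sketch under stated assumptions, not the lab's analysis pipeline and not the OpenSWATH output format: the dictionary representation, the function names and the toy values are all hypothetical.

```python
import math


def log2_intensities(proteome_map):
    """Return log2 intensities, skipping proteins with non-positive values."""
    return {prot: math.log2(val) for prot, val in proteome_map.items() if val > 0}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def compare_maps(map_a, map_b):
    """Compare two proteome maps on the proteins quantified in both.

    Returns the sorted list of shared proteins and the Pearson correlation
    of their log2 intensities.
    """
    log_a = log2_intensities(map_a)
    log_b = log2_intensities(map_b)
    shared = sorted(set(log_a) & set(log_b))
    r = pearson([log_a[p] for p in shared], [log_b[p] for p in shared])
    return shared, r


# Toy values for illustration only (hypothetical identifiers and intensities).
sample_1 = {"PROT_A": 2.0, "PROT_B": 4.0, "PROT_C": 8.0}
sample_2 = {"PROT_A": 4.0, "PROT_B": 8.0, "PROT_C": 16.0, "PROT_D": 1.0}

shared_proteins, correlation = compare_maps(sample_1, sample_2)
print(shared_proteins, correlation)
```

Because each sample is stored as a permanent digital record, a comparison of this kind can be repeated at any later time, for example against newly acquired samples, without consuming additional material.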
We are developing variants of PCT-SWATH for digitizing subproteomes, including the phosphoproteome and the membrane proteome, and are applying these methods to digitize large numbers of clinical samples and human cells in order to discover and validate protein biomarkers.