Aebersold lab: Master Thesis in Population Genetics—Analyzing Proteomics in a Diverse Population to Uncover Mechanisms of Aging
The process of aging can be better understood through a systems biology approach: examining how entire cellular pathways change over time. This project involves generating and then analyzing transcriptomic, proteomic, and metabolomic data from liver tissue of 660 individual mice of different ages, which also differ in genetic background and diet. Project keywords: genetics, aging, multi-omics, metabolism, bioinformatics, statistics, systems biology, big data, personalized medicine.
Nearly 5 years ago, we initiated an aging study in a genetically diverse collection of ~80 inbred mouse strains. Each strain comprises 20 genetically identical sisters, which are split into two dietary cohorts: a low-fat "chow" diet (6% of calories from fat) and a high-fat diet (60% of calories from fat). For 12 of these 20 individuals, we monitored body weight over the full course of their natural lifespan, which ranged from around 1 year to 2.5 years depending on the genetic background and the diet. The remaining 8 individuals were harvested at up to four different time points (6, 12, 18, and, if possible, 24 months of age), and we will perform detailed analyses of their tissues to observe how tissues change over time at the molecular level. This project has two stages: (1) a molecular stage of 3-4 months, and (2) a bioinformatic stage of "X" months (until the end of your thesis; this is a minimum of 3 months, but analysis will continue for years due to the wealth of data generated).
In January, we will start to analyze tissues from the ~660 collected individuals, starting with the liver. We will process samples for proteomics and metabolomics (in collaboration with the laboratory of Nicola Zamboni downstairs), while collaborating groups in Lausanne and Memphis, Tennessee, will prepare the transcriptomics data. Proteomics samples take a long time to run (roughly 90 minutes per sample), so we will run them in distinct batches and check data quality as each batch is generated. By April, we expect to have all "layers" of data (transcriptome, proteome, metabolome), at which point we will move primarily to the bioinformatic stage of the project. For reference, the data set will consist of roughly 17 million data points (~660 individuals * 20,000 transcripts + 660 * 4,000 proteins + 660 * 1,500 metabolites). We will start with genetic mapping via QTL analysis, that is, causally linking the ~5 million known natural genetic variants across the 80 different strains with differences in the levels of transcripts, proteins, and metabolites. We will then examine pathways of aging: which sets of genes and metabolites are differentially regulated with age? Which ones do we think are causing aging, and which ones do we think are a consequence of aging? We will then formulate hypotheses that will be tested in collaborating laboratories that focus on molecular pathway analysis. A paper on this project is expected to be written and submitted by early fall 2017, as the preliminary culmination of a six-year project.
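To give a feel for the bioinformatic stage, the following is a minimal sketch, not the lab's actual pipeline. It first reproduces the data-point count quoted above, then runs a toy single-marker QTL scan on simulated data. Everything in the simulation is an assumption made for illustration: the function names, the 50 markers, the position and effect size of the "causal" marker, and the coding of genotypes as 0/1 (reasonable for inbred strains, which carry one of two alleles at a biallelic variant). The LOD score uses the standard marker-regression formula, LOD = -(n/2) * log10(1 - r^2).

```python
import math
import random


def data_points(n_samples, features_per_layer):
    """Measurements per omics layer: number of samples times number of features."""
    return {layer: n_samples * n for layer, n in features_per_layer.items()}


def pearson_r(x, y):
    """Pearson correlation; returns 0.0 if either input has no variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def lod_score(genotypes, phenotype):
    """LOD score from regressing one phenotype on one marker."""
    n = len(phenotype)
    r2 = min(pearson_r(genotypes, phenotype) ** 2, 1 - 1e-12)  # avoid log10(0)
    return -(n / 2) * math.log10(1 - r2)


def scan(markers, phenotype):
    """One LOD score per marker (a single-marker genome scan)."""
    return [lod_score(g, phenotype) for g in markers]


def simulate(n_strains=80, n_markers=50, causal=17, effect=1.5, seed=1):
    """Hypothetical data: one marker shifts the trait, the rest are noise."""
    rng = random.Random(seed)
    markers = [[rng.randint(0, 1) for _ in range(n_strains)]
               for _ in range(n_markers)]
    phenotype = [effect * markers[causal][s] + rng.gauss(0.0, 1.0)
                 for s in range(n_strains)]
    return markers, phenotype


if __name__ == "__main__":
    counts = data_points(
        660, {"transcripts": 20_000, "proteins": 4_000, "metabolites": 1_500}
    )
    print(counts, "total:", sum(counts.values()))

    markers, phenotype = simulate()
    lods = scan(markers, phenotype)
    best = max(range(len(lods)), key=lods.__getitem__)
    print(f"highest LOD {lods[best]:.2f} at marker {best}")
```

A real analysis would involve far more than this sketch: millions of variants, covariates such as diet and age, correction for multiple testing and for relatedness between strains, and dedicated software (for example the R/qtl2 package) rather than hand-written loops.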
The start date is flexible: any time between January 3 and March 3 is ideal, though the earlier the better. Starting after April is possible for a student who wants to work exclusively on the bioinformatics side, but this means you will miss the data generation side of the story. A basic background in R (or any basic scripting language), Adobe Illustrator (or any vector graphics program), Excel, and any "omics" technology (e.g. sequencing or mass spectrometry) is useful, though not required. One or two students may be selected for this project, particularly if one applicant is strong in wet-lab biology and another is strong in bioinformatics and computer science.
If interested, please send your CV and a half page motivation letter to Dr. Evan Williams.
For further background information, please check the following two papers:
(1) A review on systems biology and population analysis — this is essential background knowledge for the study, but we will not directly do any research along these lines:
(2) Two similar studies on multi-omics analysis for the generation and testing of novel hypotheses: