Claassen Group: Causality from single-cell data

Causal inference from single-cell data. Inferring cause and effect, rather than mere correlative relationships between biological variables, is a challenge of interest in many biological applications. This project analyzes intervention experiments with single-cell readout to infer causal signaling relationships.

22.04.2015 by Pascal Kägi


Inferring cause and effect, rather than mere correlative relationships between biological variables, is a challenge of interest in many biological applications. Interventions, in which the distribution of one or more variables is deliberately changed, make such inferences possible. For example, suppose A and B are correlated variables, such as the activity level of a kinase and the phosphorylation levels of its targets. With only this information, we cannot say whether A causes B or B causes A. Now suppose we force A to take particular values (i.e., we make an intervention). If B is still correlated with A, this is evidence that A causes B. If, however, the distribution of B remains unchanged, this is evidence that there is no causal link from A to B. Recent work in the field of causality [1-4] allows such inferences to be made across relatively large numbers of variables and interventions simultaneously. Our collaborator is developing an R package that implements several of these techniques together, making it easy to choose between them.
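The A/B argument above can be illustrated with a small simulation. The following is only a sketch under assumed toy models: the linear relationships, the coefficient 2.0, the noise levels and the function names `pearson` and `simulate` are all hypothetical and are not taken from the project or from the R package. It compares the correlation between A and B with and without an intervention on A, once in a world where A causes B and once where B causes A.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(a_causes_b, intervene_on_a, n=5000, seed=0):
    """Draw n samples of (A, B) from a toy linear model (an assumption).

    If intervene_on_a is True, A is forced to externally chosen values,
    ignoring whatever would normally determine it.
    """
    rng = random.Random(seed)
    a_vals, b_vals = [], []
    for _ in range(n):
        if a_causes_b:
            # A -> B: B is computed from A, whether A is natural or forced.
            a = rng.uniform(-2, 2) if intervene_on_a else rng.gauss(0, 1)
            b = 2.0 * a + rng.gauss(0, 1)
        else:
            # B -> A: B follows its own distribution; forcing A cuts the link.
            b = rng.gauss(0, 1)
            natural_a = 2.0 * b + rng.gauss(0, 1)
            a = rng.uniform(-2, 2) if intervene_on_a else natural_a
        a_vals.append(a)
        b_vals.append(b)
    return a_vals, b_vals


for a_causes_b in (True, False):
    for intervene in (False, True):
        a, b = simulate(a_causes_b, intervene)
        label = "A->B" if a_causes_b else "B->A"
        mode = "interventional" if intervene else "observational"
        print(f"{label} {mode}: correlation = {pearson(a, b):.2f}")
```

In both toy worlds the observational correlation is strong, so it cannot separate the two hypotheses. Under the intervention the correlation persists only when A causes B, which is the asymmetry the project aims to exploit.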

Bodenmiller et al. [5] analyzed peripheral blood mononuclear cells in the presence of stimuli (such as GM-CSF) and small-molecule inhibitors. This dataset is well suited to such an inference approach: high-throughput single-cell measurements provide sufficient data points for the inference, while the structure of the experiment includes many natural interventions.

For the project, the student would:

process the data to make it suitable for use in the R package;
review literature to decide the likely effect on the network of each intervention;
apply each applicable technique in the package to the data (assessing whether the assumptions made by each technique are valid); and
analyze and compare the results.
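As a rough illustration of the first two steps, the sketch below arranges per-cell measurements into a data matrix, a list of distinct intervention target sets, and a per-cell index into that list. This layout is an assumption about what an interventional structure-learning method might expect; the input format of the R package is not specified here. The condition names, marker names and the `CONDITION_TARGETS` mapping are hypothetical placeholders; in the project, the mapping would come from the literature review.

```python
# Hypothetical mapping from experimental condition to the measured
# variables it is assumed to intervene on (placeholder names only).
CONDITION_TARGETS = {
    "control": (),
    "inhibitor_X": ("protein_1",),
    "inhibitor_Y": ("protein_2", "protein_3"),
}


def build_inputs(cells, markers, condition_targets):
    """Convert per-cell records into matrix form.

    cells: list of (condition, {marker: value}) pairs, one per cell.
    markers: ordered list of marker names defining the matrix columns.
    Returns (data, targets, target_index), where targets lists each
    distinct set of intervened columns and target_index gives, for every
    row of data, the position of its target set in targets.
    """
    column = {m: i for i, m in enumerate(markers)}
    targets = []      # distinct target sets, as tuples of column indices
    target_pos = {}   # target set -> position in targets
    data, target_index = [], []
    for condition, values in cells:
        key = tuple(sorted(column[m] for m in condition_targets[condition]))
        if key not in target_pos:
            target_pos[key] = len(targets)
            targets.append(key)
        data.append([values[m] for m in markers])
        target_index.append(target_pos[key])
    return data, targets, target_index


# Usage with made-up numbers, purely to show the shapes involved.
markers = ["protein_1", "protein_2", "protein_3"]
cells = [
    ("control", {"protein_1": 1.0, "protein_2": 0.5, "protein_3": 0.2}),
    ("inhibitor_X", {"protein_1": 0.1, "protein_2": 0.4, "protein_3": 0.3}),
    ("inhibitor_Y", {"protein_1": 0.9, "protein_2": 0.0, "protein_3": 0.1}),
    ("control", {"protein_1": 1.1, "protein_2": 0.6, "protein_3": 0.2}),
]
data, targets, target_index = build_inputs(cells, markers, CONDITION_TARGETS)
print(targets)
print(target_index)
```

Keeping the target sets separate from the data matrix means the assumed effect of each inhibitor can be revised after the literature review without touching the measurements themselves.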

Students with stronger theoretical backgrounds would also develop transfer learning methods to exploit the repeated structure of the interventions and the expected commonality of signaling networks across cell types. The project would be purely computational, and would suit a student with a background in mathematics, statistics or machine learning. For further information, please contact Professor Manfred Claassen or Will Macnair.

[1] Hauser, A., & Bühlmann, P. (2012). Characterization and greedy learning of interventional Markov equivalence classes of directed acyclic graphs. The Journal of Machine Learning Research, 13(1), 2409-2464.

[2] Shimizu, S., Hoyer, P. O., Hyvärinen, A., & Kerminen, A. (2006). A linear non-Gaussian acyclic model for causal discovery. The Journal of Machine Learning Research, 7, 2003-2030.

[3] Kalisch, M., & Bühlmann, P. (2007). Estimating high-dimensional directed acyclic graphs with the PC-algorithm. The Journal of Machine Learning Research, 8, 613-636.

[4] Kalisch, M., Maechler, M., Colombo, D., Maathuis, M. H., & Bühlmann, P. (2012). Causal inference using graphical models with the R package pcalg. Journal of Statistical Software, 47(11), 1-26.

[5] Bodenmiller, B., Zunder, E. R., Finck, R., Chen, T. J., Savig, E. S., Bruggner, R. V., ... & Nolan, G. P. (2012). Multiplexed mass cytometry profiling of cellular states perturbed by small-molecule regulators. Nature biotechnology, 30(9), 858-867.