The last 25 years have produced a revolution in statistical and computational tools for causal inference and discovery in biomedicine. In this tutorial, we will focus on causal discovery in large data sets drawn from clinical and translational research.

We will explain the basics of graphical causal models using examples from biomedicine, clinical research, and other fields, and introduce several search algorithms for learning causal structure from background knowledge and data. We will use the freely available Tetrad program to teach these ideas through hands-on exercises with simulated and real data sets. We will cover how to represent and model causal systems, and the assumptions that connect causal hypotheses to the observable constraints that make causal discovery possible. We will discuss why multiple regression and related techniques are unreliable for causal discovery and demonstrate superior alternatives. We will discuss the problem of causal discovery in the presence of unmeasured confounders (latent variables) and present algorithms that can reliably extract causal information even when the measured variables omit hidden common causes. We will spend the final third of the tutorial analyzing cancer data and electronic health record (EHR) data for causal relationships.
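As a small illustration of the regression point above, the following Python sketch simulates one way multiple regression can mislead. It is not taken from the tutorial materials and does not use Tetrad. The variable names, the sample size, and the helper `ols_coefficients` are assumptions made for this example. Here `x` and `y` are generated independently, and `z` is a common effect of both. Regressing `y` on `x` alone gives a slope near zero, but adding `z` as a covariate produces a clearly nonzero coefficient on `x`, even though `x` has no causal influence on `y`.

```python
import numpy as np


def ols_coefficients(predictors, outcome):
    """Return OLS slopes (intercept dropped) of outcome on the given predictors.

    Hypothetical helper for this sketch; predictors is a list of 1-D arrays.
    """
    design = np.column_stack([np.ones(len(outcome))] + list(predictors))
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coef[1:]


rng = np.random.default_rng(0)
n = 20_000  # assumed sample size, large enough for stable estimates

# Assumed true structure: x -> z <- y, with x and y causally unrelated.
x = rng.normal(size=n)
y = rng.normal(size=n)
z = x + y + rng.normal(size=n)

# Regressing y on x alone: the slope should be close to zero.
slope_x_only = ols_coefficients([x], y)[0]

# Regressing y on x and z: conditioning on the common effect z
# induces a spurious dependence between x and y.
slope_x_given_z, slope_z = ols_coefficients([x, z], y)

print("slope of x, regressing y on x:      ", round(float(slope_x_only), 3))
print("slope of x, regressing y on x and z:", round(float(slope_x_given_z), 3))
print("slope of z, regressing y on x and z:", round(float(slope_z), 3))
```

Reading the second regression causally would suggest that `x` affects `y`, which is false by construction. Constraint-based search algorithms instead use the pattern of independence (`x` and `y` are independent marginally but dependent given `z`) as evidence that `z` is a common effect.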

Learning Objective 1: Develop basic literacy in graphical causal models
Learning Objective 2: Develop a basic understanding of causal search algorithms
Learning Objective 3: Develop a working knowledge of Tetrad
Learning Objective 4: Develop an elementary understanding of two diverse applications of causal discovery in biomedical and clinical research:
- Cancer data analysis (The Cancer Genome Atlas)
- EHR data analysis


Richard Scheines (Presenter)
Carnegie Mellon University

Gregory Cooper (Presenter)
University of Pittsburgh
