Three-day course

Statistical methods for risk prediction and prognostic models

Date: 9-11 November 2021
Location: online
Cost: £399 (Student), £499 (Academic/Public sector), £599 (Commercial/Industry).
Tutors: Dr Kym Snell (Keele University), Professor Richard Riley (Keele University), Dr Joie Ensor (Keele University), Lucinda Archer (Keele University), Professor Gary Collins (University of Oxford), Dr Laura Bonnett (University of Liverpool)
Registration: Please note that registration will close on Sunday 31 October or earlier if all available places are filled.

There is growing interest in risk prediction and prognostic models, as they allow us to better estimate future risk ('prognosis'). Patients and their carers often want to know what the risk of developing an adverse health outcome might be over time, and prognostic models allow patients and their families to put a clinical diagnosis in context, helping to inform clinical decisions and treatment strategies.

This 3-day online course provides a thorough foundation in the statistical methods most commonly needed to develop and/or validate prognostic and prediction models in clinical research. Through a mixture of recorded lectures, computer practical exercises in Stata and R, and live question and answer sessions, you will develop an understanding and appreciation of the underlying statistical concepts and learn to apply the methods to real datasets containing either binary or time-to-event outcomes.

The course is aimed at individuals who want to learn how to develop and validate risk prediction and prognostic models, specifically for binary or time-to-event clinical outcomes. We recommend that participants have a background in statistics. An understanding of key statistical principles and measures (such as effect estimates, confidence intervals and p-values) and the ability to apply and interpret regression models are essential. We also recommend that participants are familiar with Stata or R, although the practical exercises will not require individuals to write their own code.

Participants will need either R or Stata (version 12 or above) installed.

The course is intended to be completed over 3 days and focuses on model development (day 1), internal validation (day 2), and external validation and novel topics (day 3). Our focus is on multivariable models for individualised prediction of future outcomes (prognosis), although many of the concepts described also apply to models for predicting existing disease (diagnosis).

Participants are encouraged to watch a pre-course video prior to the week of the course. This introductory video provides an overview of the rationale and phases of prediction model research, and will provide a platform for the subsequent course content.

Day 1 covers key topics for model development including: identifying candidate predictors, handling of missing data, modelling continuous predictors using fractional polynomials or restricted cubic splines for non-linear functions, and variable selection procedures.

Day 2 focuses on how models are optimised for the data from which they were derived, and thus often do not generalise to other datasets. Internal validation strategies are outlined to identify and adjust for overfitting. In particular, cross-validation and bootstrapping are covered to estimate the optimism and shrink the model coefficients accordingly post-estimation. Other penalisation approaches such as the lasso, ridge regression and elastic net are also detailed. Statistical measures of model performance are introduced for discrimination (such as the C-statistic and D-statistic) and calibration (calibration-in-the-large, calibration slope, and calibration plots and curves). Further sessions cover sample size considerations for model development and validation, with emphasis on new approaches that are better than traditional rules of thumb.
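As a purely illustrative sketch (not taken from the course materials, which use Stata and R), the discrimination and overall calibration measures mentioned above can be computed for a binary outcome as follows. The function names and toy data are hypothetical, and the observed/expected ratio is shown as a simple stand-in related to calibration-in-the-large rather than the intercept-based definition.

```python
def c_statistic(outcomes, predicted_risks):
    """C-statistic for a binary outcome.

    Proportion of (event, non-event) pairs in which the individual with
    the event has the higher predicted risk; ties count as half.
    """
    event_risks = [p for y, p in zip(outcomes, predicted_risks) if y == 1]
    non_event_risks = [p for y, p in zip(outcomes, predicted_risks) if y == 0]
    if not event_risks or not non_event_risks:
        raise ValueError("Need at least one event and one non-event")

    concordant = 0.0
    for risk_event in event_risks:
        for risk_non_event in non_event_risks:
            if risk_event > risk_non_event:
                concordant += 1.0
            elif risk_event == risk_non_event:
                concordant += 0.5
    return concordant / (len(event_risks) * len(non_event_risks))


def observed_expected_ratio(outcomes, predicted_risks):
    """Observed number of events divided by the expected number.

    A value of 1 suggests predictions are correct on average; this is a
    simple overall check related to calibration-in-the-large.
    """
    return sum(outcomes) / sum(predicted_risks)


# Hypothetical toy data: 1 = event occurred, 0 = no event.
outcomes = [0, 0, 1, 0, 1, 1]
predicted_risks = [0.10, 0.30, 0.35, 0.40, 0.70, 0.80]

print("C-statistic:", round(c_statistic(outcomes, predicted_risks), 3))
print("O/E ratio:", round(observed_expected_ratio(outcomes, predicted_risks), 3))
```

In practice these measures would be estimated with confidence intervals and, for internal validation, corrected for optimism using bootstrapping or cross-validation.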

Day 3 focuses on the need for model performance to be evaluated in new data to assess its generalisability, namely external validation. A framework for different types of external validation studies is provided, and the potential importance of model updating strategies (such as re-calibration techniques) is considered. Novel topics are then covered, including: the development and validation of models using large datasets (e.g. from e-health records) or multiple studies; the use of meta-analysis methods for summarising the performance of models across multiple studies or clusters; the use of net benefit and decision curve analysis to understand the potential role of a model for clinical decision making; and different formats for presenting prediction models. The final session of the day discusses the importance of the TRIPOD reporting guideline when publishing prediction model research.
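To give a flavour of net benefit, the sketch below (again illustrative only, with hypothetical names and toy data) uses the standard definition: true positives per person minus false positives per person, weighted by the odds of the risk threshold. It assumes that individuals with a predicted risk at or above the threshold are classed as high risk.

```python
def net_benefit(outcomes, predicted_risks, threshold):
    """Net benefit of a model at a given risk threshold (0 < threshold < 1).

    NB = TP/n - (FP/n) * threshold / (1 - threshold)
    """
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(outcomes)
    true_pos = sum(1 for y, p in zip(outcomes, predicted_risks)
                   if p >= threshold and y == 1)
    false_pos = sum(1 for y, p in zip(outcomes, predicted_risks)
                    if p >= threshold and y == 0)
    return true_pos / n - (false_pos / n) * threshold / (1 - threshold)


def net_benefit_treat_all(outcomes, threshold):
    """Net benefit of classing everyone as high risk (the 'treat all' strategy)."""
    return net_benefit(outcomes, [1.0] * len(outcomes), threshold)


# Hypothetical toy data: 1 = event occurred, 0 = no event.
nb_outcomes = [0, 0, 1, 0, 1, 1]
nb_risks = [0.10, 0.30, 0.35, 0.40, 0.70, 0.80]

for pt in (0.2, 0.3, 0.5):
    print(f"threshold={pt:.1f}",
          f"model={net_benefit(nb_outcomes, nb_risks, pt):.3f}",
          f"treat all={net_benefit_treat_all(nb_outcomes, pt):.3f}",
          "treat none=0.000")
```

A decision curve plots these quantities across a range of clinically relevant thresholds, so the model can be compared with the treat-all and treat-none strategies.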

Stata and R practical exercises are included on all three days, and participants will be able to choose whether to focus on logistic regression examples (for binary outcomes) or Cox / flexible parametric survival examples (for time-to-event outcomes), to tailor these exercises to their own purpose.

For queries related to registration, please contact the events team at

For queries related to the course content, please contact Dr Kym Snell