Three-day course

Statistical methods for risk prediction and prognostic models

Date: Postponed until 2021
Location: Dorothy Hodgkin Building, Keele University, ST5 5BG
Cost: £550 (Student), £695 (Public Sector), £895 (Commercial/Industry). Fees include course registration, a pub meal on the first evening and dinner at Keele Hall on the second evening. Please note accommodation is not included for this course (see e-store contact information for nearby B&Bs)
Tutors: Dr Kym Snell (Keele University), Professor Richard Riley (Keele University), Dr Joie Ensor (Keele University), Professor Gary Collins (University of Oxford), Dr Laura Bonnett (University of Liverpool)

This short course will be run virtually in 2021. This page will be updated with details when they are available.

This is a 3-day course which provides a thorough foundation in statistical methods for developing and/or validating prognostic models in clinical research. Through a mixture of lectures and practical sessions using Stata and R, you will develop an understanding and appreciation of the underlying statistical concepts, and learn to apply the methods to real datasets containing either binary or time-to-event outcomes.

There is a growing interest in risk prediction and prognostic models, as they allow us to better estimate future risk ('prognosis'). Patients and their carers often want to know what the risk of developing an adverse health outcome might be over time, and prognostic models allow patients and their families to put a clinical diagnosis in context, helping to inform clinical decisions and treatment strategies.

The course is aimed at individuals who want to learn how to develop and validate risk prediction and prognostic models, specifically for binary or time-to-event clinical outcomes. We recommend that participants have a background in statistics. An understanding of key statistical principles and measures (such as effect estimates, confidence intervals and p-values) and the ability to apply and interpret regression models are essential. We also recommend that participants are familiar with Stata or R, although the practicals will not require individuals to write their own code.

Participants will need to bring a laptop with R or Stata version 12 or above installed. It may be possible to borrow a laptop on the day, but this must be agreed in advance.

The course is delivered over 3 days and focuses on model development (day 1), internal validation (day 2), and external validation and novel topics (day 3). Our focus is on multivariable models for individualised prediction of future outcomes (prognosis), although many of the concepts described also apply to models for predicting existing disease (diagnosis).

Day 1 begins with an overview of the rationale and phases of prediction model research. It then outlines model specification, focusing on logistic regression for binary outcomes and Cox regression or flexible parametric survival models for time-to-event outcomes. Model development topics are then covered, including: identifying candidate predictors, handling missing data, modelling continuous predictors using fractional polynomials or restricted cubic splines to capture non-linear relationships, and variable selection procedures.
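To give a flavour of the kind of model fitting covered on day 1, here is a minimal illustrative sketch, written in Python purely for illustration (the course practicals themselves use Stata and R). It fits a logistic regression for a binary outcome by Newton-Raphson, using simulated data with a single hypothetical standardised continuous predictor:

```python
import numpy as np

def fit_logistic(x, y, n_iter=25):
    """Logistic regression via Newton-Raphson (IRLS); returns [intercept, slope]."""
    Xd = np.column_stack([np.ones(len(y)), x])       # design matrix with intercept
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))         # predicted probabilities
        grad = Xd.T @ (y - p)                        # score vector
        hess = Xd.T @ (Xd * (p * (1 - p))[:, None])  # observed information
        beta += np.linalg.solve(hess, grad)          # Newton-Raphson update
    return beta

# Simulated data (hypothetical): true log-odds = 0 + 1.0 * x, x standardised
rng = np.random.default_rng(42)
x = rng.normal(size=2000)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-x)))

beta = fit_logistic(x, y)  # estimates should lie close to [0, 1]
```

In practice a continuous predictor would rarely enter the model untransformed; the fractional polynomial and spline approaches taught on the course replace the single column of x with a flexible basis, but the fitting algorithm is unchanged.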

Day 2 focuses on how models are optimised to the data from which they were derived, and thus often do not generalise to other datasets. Internal validation strategies are outlined to identify and adjust for overfitting. In particular, bootstrapping is covered to estimate the optimism and shrink the model coefficients accordingly; related approaches such as the lasso are also discussed. Statistical measures of model performance are introduced for discrimination (such as the C-statistic and D-statistic) and calibration (calibration-in-the-large, calibration plots and the calibration slope). In the afternoon, we discuss the importance of the TRIPOD reporting guideline when publishing prediction model research, and finish with a session on sample size considerations for model development and validation.
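As an illustration of the internal validation ideas above, the following sketch (again Python for illustration only, not course material) computes an apparent C-statistic for a deliberately overfitted linear score, then estimates its optimism by bootstrapping. The data, model and number of bootstrap samples are all hypothetical choices:

```python
import numpy as np

def c_statistic(score, y):
    """Concordance (C-statistic): P(score_case > score_control), ties count 1/2."""
    cases, controls = score[y == 1], score[y == 0]
    diff = cases[:, None] - controls[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(cases) * len(controls))

rng = np.random.default_rng(7)
n, k = 100, 10
X = rng.normal(size=(n, k))                       # 10 candidate predictors...
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X[:, 0])))  # ...only the first is real

# "Model": a linear score fitted by least squares (ranking is all C needs)
beta = np.linalg.lstsq(X, y, rcond=None)[0]
apparent_c = c_statistic(X @ beta, y)

# Bootstrap estimate of optimism: refit in each resample, compare the
# performance in the resample with the performance back in the original data
opt = []
for _ in range(200):
    idx = rng.integers(0, n, n)
    Xb, yb = X[idx], y[idx]
    if yb.min() == yb.max():                      # skip degenerate resamples
        continue
    bb = np.linalg.lstsq(Xb, yb, rcond=None)[0]
    opt.append(c_statistic(Xb @ bb, yb) - c_statistic(X @ bb, y))

optimism = float(np.mean(opt))
corrected_c = apparent_c - optimism               # optimism-adjusted C-statistic
```

With 10 fitted coefficients and only 100 patients, the apparent C-statistic is noticeably optimistic, and the bootstrap correction pulls it back towards the model's likely performance in new data.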

Day 3 focuses on the need for model performance to be evaluated in new data to assess its generalisability, namely external validation. A framework for different types of external validation studies is provided, and the potential importance of model updating strategies (such as re-calibration techniques) is considered. Novel topics are then covered, including: the development and validation of models using large datasets (e.g. from e-health records) or multiple studies; the use of meta-analysis methods for summarising the performance of models across multiple studies or clusters; and the role of net benefit and decision curve analysis in understanding the potential role of a model for clinical decision making. We conclude with practical guidance on the different ways in which prediction and prognostic models can be presented.
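To illustrate the net benefit calculation underpinning decision curve analysis, here is a minimal sketch (Python, illustrative only; the simulated risks and the 0.3 threshold are hypothetical choices) comparing a risk model against the 'treat all' and 'treat none' strategies at a chosen risk threshold:

```python
import numpy as np

def net_benefit(p, y, t):
    """Net benefit of treating patients whose predicted risk is >= threshold t."""
    treat = p >= t
    n = len(y)
    tp = np.sum(treat & (y == 1)) / n    # true-positive proportion
    fp = np.sum(treat & (y == 0)) / n    # false-positive proportion
    return tp - fp * t / (1 - t)         # false positives weighted by threshold odds

rng = np.random.default_rng(3)
p = rng.uniform(size=5000)               # perfectly calibrated risks (hypothetical)
y = rng.binomial(1, p)

t = 0.3                                  # example decision threshold
nb_model = net_benefit(p, y, t)
nb_all = y.mean() - (1 - y.mean()) * t / (1 - t)   # 'treat everyone' strategy
nb_none = 0.0                                      # 'treat no one' strategy
```

A decision curve simply repeats this comparison across a range of thresholds; a model is potentially useful for clinical decision making at thresholds where its net benefit exceeds both default strategies.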

Computer practicals are included on all three days, and participants can choose whether to focus on logistic regression examples (for binary outcomes) or Cox / flexible parametric survival examples (for time-to-event outcomes), to tailor the practicals to their own purpose.