Linking cohort study data to administrative records: The challenges of consent and coverage

About the project

Cohort and other longitudinal studies can be enriched by linking the study participants’ data to their administrative records for education, use of health services, national insurance contributions, tax payments and welfare benefits. This type of data linkage can provide a wealth of information at a limited extra cost.

However, efforts to link data face three problems:

  1. Non-consent: Participants may refuse to allow survey teams to access their administrative records and link them to survey data.
  2. Non-coverage: Even when participants give consent, it may not be possible to find them in the administrative register.
  3. Missing values: Even when a participant gives consent and can be found on the administrative register, their records could be missing information on certain variables.

These problems can bias analyses of linked data. Using linked data from the Millennium Cohort Study (MCS), this project examined patterns of non-consent and non-coverage, and identified weighting and imputation techniques that can adjust for the resulting biases.
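To illustrate the general idea behind weighting adjustments, here is a minimal sketch of inverse-probability weighting within weighting cells. It is not the project's own method: the function name `cell_weights`, the field names `group` and `linked`, and the toy records are all hypothetical and chosen only for illustration. Each successfully linked participant is weighted by the inverse of the linkage rate in their cell, so that the linked cases stand in for everyone in that cell.

```python
from collections import defaultdict


def cell_weights(records, cell_key, linked_key):
    """Return one weight per record: 1 / (linkage rate in the record's cell)
    for linked records, and 0.0 for records that could not be linked."""
    totals = defaultdict(int)   # all participants per cell
    linked = defaultdict(int)   # successfully linked participants per cell
    for r in records:
        totals[r[cell_key]] += 1
        if r[linked_key]:
            linked[r[cell_key]] += 1

    weights = []
    for r in records:
        if r[linked_key]:
            rate = linked[r[cell_key]] / totals[r[cell_key]]
            weights.append(1.0 / rate)
        else:
            weights.append(0.0)
    return weights


# Hypothetical toy data: "linked" is True only when the participant
# consented AND was found on the administrative register.
sample = [
    {"group": "A", "linked": True},
    {"group": "A", "linked": True},
    {"group": "A", "linked": False},
    {"group": "A", "linked": False},
    {"group": "B", "linked": True},
    {"group": "B", "linked": True},
    {"group": "B", "linked": True},
    {"group": "B", "linked": False},
]

weights = cell_weights(sample, "group", "linked")
```

In this toy example half of group A is linked, so each linked member of group A receives a weight of 2, and the weights sum to the full sample size. In practice the linkage probability would more likely be estimated from a response model with several predictors rather than from simple cells; that choice is outside the scope of this sketch.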

Consent and coverage in data linkage remain under-researched in survey methodology, yet finding solutions to these challenges has become increasingly urgent as demand for linked data sets rises.

This project was funded by the National Centre for Research Methods and the Economic and Social Research Council. It ran from April 2013 to September 2014.

Key research objectives:

  1. To uncover patterns of consent and coverage by (i) domain (education, health, and tax-benefit records), (ii) person (parent, cohort member and siblings), and (iii) time (cohort sweep), and the associations between them
  2. To establish the magnitude of biases resulting from non-consent and non-coverage in data available for further analysis
  3. To construct – and make available to users – weights to adjust for non-coverage
  4. To investigate the impact of different weighting and imputation techniques on biases, in order to inform debate on how and under which conditions these should be used
       
CLS contact:

Tarek Mostafa, Research Officer

Tarek is the Principal Investigator for the linked data project. His role at CLS focuses mainly on methodological research, in particular the treatment of non-response in longitudinal studies, the construction of non-response weights and imputation techniques.

Other research team members:

Lucinda Platt, Co-Investigator

John Micklewright, Co-Investigator