Using new technologies to enhance the value of qualitative data in longitudinal studies: an application to health and wellbeing, and ageing

Qualitative data, such as essays and free response questions in surveys, are rich sources of psychological, social and behavioural information.

Yet such information has traditionally been impossible to leverage at a large scale.

Recent advances in computational linguistics and machine learning have produced automatic content analysis tools, which are beginning to be used in a variety of settings, including the analysis of text from social media platforms such as Facebook and Twitter.

We will apply these tools, for the first time, to open responses collected longitudinally within a large national birth cohort study.

The data include self-reported essays, written at age 11 and age 50, in response to the following questions:

  • At age 11: "Imagine you are now 25 years old. Write about the life you are leading, your interests, your home life and your work at the age of 25"
  • At age 50: "Imagine that you are now 60 years old...please write a few lines about the life you are leading (your interests, your home life, your health and wellbeing and any work you may be doing)".

The responses (13,669 at age 11; 7,383 at age 50) provide a largely untapped source of psychological and behavioural information that can be linked longitudinally to outcomes for the same individuals.

The project involves three major steps. The first step will be to digitally transcribe 13,000 age 11 essays contained within the National Child Development Study (NCDS).

Next, automatic content analysis tools will be applied to the transcribed essays to allow quantitative analysis of the words and concepts expressed in the essays at ages 11 and 50.
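
As a rough illustration of what quantitative analysis of words and concepts can look like, the minimal Python sketch below counts how often words from a few concept categories appear in a piece of text. It is not the project's actual tooling: the category names, the word lists, the sample sentence and the function names (`tokenize`, `category_rates`) are all invented for this example.

```python
import re
from collections import Counter

# Hypothetical concept categories and word lists, invented for illustration.
# A real content analysis tool would use a much larger, validated lexicon.
LEXICON = {
    "family": {"wife", "husband", "children", "home", "married"},
    "work": {"job", "work", "office", "teacher", "nurse"},
    "leisure": {"football", "garden", "holiday", "reading"},
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def category_rates(text, lexicon=LEXICON):
    """Return, for each category, the share of tokens that fall in it."""
    tokens = tokenize(text)
    if not tokens:
        return {name: 0.0 for name in lexicon}
    counts = Counter(tokens)
    return {
        name: sum(counts[word] for word in words) / len(tokens)
        for name, words in lexicon.items()
    }


# Made-up sample sentence, not taken from the cohort essays.
essay = "I am married and I work in an office. At home I like reading."
rates = category_rates(essay)
for name, rate in rates.items():
    print(f"{name}: {rate:.3f}")
```

Expressing each category as a share of all words, rather than a raw count, keeps essays of different lengths comparable, which matters when children's and adults' writing is analysed side by side.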

Finally, these measures will be related to cohort members' health and wellbeing, occupational choices and family life in adulthood. The newly transcribed essays will be made available in anonymised form through the UK Data Service in 2017.

CLS contact

Alissa Goodman,

Principal Investigator, National Child Development Study

Alissa is an economist whose main research interests relate to inequality, poverty, education policy, and the intergenerational transmission of health and well-being.

Project team

Alissa Goodman, CLS

Andrew Schwartz, Stony Brook University

Peggy Kern, University of Melbourne

JD Carpentieri, UCL Institute of Education


February 2016 - July 2017