
ESRA 2025 Preliminary Program

              



All time references are in CEST

Survey Data Integration: Nonprobability Surveys, Administrative and Digital Trace Data 2

Session Organisers Dr Camilla Salvatore (Utrecht University)
Dr Angelo Moretti (Utrecht University)
Time Tuesday 15 July, 11:00 - 12:30
Room Ruppert Wit - 0.52

Given the declining response rates and increasing costs associated with traditional probability-based sample surveys, researchers and survey organisations are increasingly investigating the use of alternative data sources, such as nonprobability sample surveys, administrative data and digital trace data.
While initially considered as potential replacements, it is now clear that the most promising role for these alternative data sources is supplementing probability-based sample surveys. Indeed, auxiliary data offer a considerable opportunity, as they often provide timeliness, fine-grained time intervals, and geographical granularity, among other benefits.
This session will discuss cutting-edge methodologies and innovative case studies that integrate diverse data sources in survey research. It will highlight strategies for improving inference, assessing data quality, and addressing biases (e.g. selection and measurement). Attendees will gain insights into the latest advancements in data integration techniques, practical applications, and future directions for survey research in an increasingly complex data environment.

Keywords: data integration, online surveys, digital trace, selection bias

Papers

Using linked cohort data to help address residual confounding in analyses of population administrative data

Dr Richard Silverwood (Centre for Longitudinal Studies, University College London) - Presenting Author
Dr Gergo Baranyi (Centre for Longitudinal Studies, University College London)
Professor Lisa Calderwood (Centre for Longitudinal Studies, University College London)
Professor Bianca De Stavola (Population, Policy & Practice Department, UCL Great Ormond Street Institute of Child Health, University College London)
Professor George Ploubidis (Centre for Longitudinal Studies, University College London)
Professor Ian White (MRC Clinical Trials Unit at UCL, University College London)
Professor Katie Harron (Population, Policy & Practice Department, UCL Great Ormond Street Institute of Child Health, University College London)

Analyses of population administrative data can often only be minimally adjusted due to the unavailability of a full set of control variables, leading to bias due to residual confounding. Cohort studies will often contain rich information on potential confounders but may not be sufficiently powered to meaningfully address the research question of interest. We aimed to use linked cohort data to help address residual confounding in analyses of population administrative data.

We propose a multiple imputation-based approach, introduced through application to simulated data in three different scenarios related to the structure of the datasets. We then apply this approach to a real-world example – examining the association between pupil mobility (changing schools at non-standard times) and Key Stage 2 (age 11) attainment using data from the UK National Pupil Database (NPD). The limited control variables available in the NPD are supplemented by multiple measures of socioeconomic deprivation captured in linked Millennium Cohort Study (MCS) data.

The proposed approach is observed to perform well when using simulated data across the different scenarios. The association between pupil mobility and Key Stage 2 attainment was attenuated after supplementing the NPD analysis with information from linked MCS data, though with a decrease in precision.

We have demonstrated the potential of the proposed approach, but more work is required to understand whether and how it can be applied more broadly. The principles underlying this innovative approach are widely applicable: any analysis of administrative data where confounder control is limited by the availability of information could potentially be strengthened by linking a subset of individuals into richer cohort data and leveraging the additional information to inform population-level analyses.


Integration During the Entire Survey Process: The Usage of Survey and Administrative Data in the Austrian Student Social Survey.

Mrs Vlasta Zucha (Institute for Advanced Studies) - Presenting Author

Survey research can be supplemented and improved by using alternative data sources such as administrative data. Besides linking data at the individual level, great benefits can be gained from close linkage with administrative data across the entire research process.

The Austrian Student Social Survey (ASSS) is used to demonstrate the versatile ways in which administrative data can be used in survey research, even without data linkage at the level of individuals. The innovative case study shows that administrative data can, for example, support the construction of the questionnaire, help with the preparation of fieldwork and support data processing. Aggregated information from administrative data is also used for weighting and for plausibility checks during data cleaning. Survey and register data interact beyond data collection and data preparation: survey data is supplemented at the aggregate level with information from administrative data, and the two data sources are also combined in the analysis. Both similar and different content is analysed and published in combined reports (Zucha et al. 2024; Haag et al. 2024).

The inclusion of administrative data in survey research can be challenging. Differences in data structures and conceptual differences between similar variables pose challenges for data harmonisation, integration, and interpretation. To ensure that the usage of different data sources is appropriate and successful, specialised teams work together on the ASSS.

The conference contribution will provide insights into the benefits, but also the challenges, of including administrative data in survey research, even when the data cannot be linked at the individual level.


Administrative Data Collection for the Social Sciences

Mrs Lisa Ziemba (Statistics Austria) - Presenting Author

Using administrative data as supplementary sources promises great analysis potential, as it provides researchers with a lot of highly reliable information on individuals that is rarely available in traditional survey data. However, acquiring and subsequently working with administrative data is a demanding task, because their documentation mainly serves purposes within the European Statistical System and is not aimed at researchers, as it assumes critical prior knowledge. International researchers in particular face many barriers, such as language or knowledge of certain laws. This creates the need for documentation that is tailored to the target audience and effectively communicates the particularities of the data at hand, including the data collection and processing.
The Administrative Data Collection for the Social Sciences (ADCOL) is meant to play a pivotal role in supporting social science research based on administrative data in Austria. It is made up of approximately 100 variables sourced from registries at the federal statistical office of Austria from six areas of life: family & demographics, housing, health, labor, income, and education. It features researcher-friendly documentation describing the data products at Statistics Austria from which the variables were selected.
In the presentation we share the research potential of the ADCOL using different use case examples. These potentials include the use of ADCOL as a register-based socio-economic household panel, because individuals with the same main residence can be traced longitudinally. Further, the data allows for longitudinal and multi-level analysis, as it provides information on individuals and households starting from 2015. Lastly, it is possible to link ADCOL data to other administrative data or surveys conducted in Austria.
Our presentation illustrates a best practice example of facilitating and documenting administrative data for their scientific use. The ADCOL provides a valuable data source for research to supplement traditional surveys.


What’s New in Data Integration? A Systematic Review and Data Typology.

Dr Thomas O'Toole (The University of Manchester) - Presenting Author

The integration of survey and non-survey data (e.g. administrative records, geospatial characteristics and digital trace data) can provide researchers with access to a breadth of rich and detailed information for use in applied and methodological fields. However, the extent to which various sources of non-survey data are integrated with survey data, and the methods and aims of that integration, remain unclear.
This systematic review identifies the types and characteristics of commonly integrated survey and non-survey data sets, in addition to their integration purpose and methodology. Literature searches were conducted for peer-reviewed and pre-print articles concerning the use of data integration/linkage using Ovid (including the Cochrane Library, APA PsycInfo, Embase, EconLit and MEDLINE), Web of Science (Core Collection), Scopus and Google Scholar (for grey literature). We also conducted snowball searches and accessed existing collections of publications from CLOSER and the UK Longitudinal Linkage Collaboration.
Results were used to construct a typology of integrated data sources available to researchers, covering survey and non-survey data types (and where to access them), and the purpose and methodology of the integration, including linkage level and consent. We further discuss the current data integration landscape and identify gaps for future exploration in data integration literature.