
ESRA 2025 Preliminary Program

All time references are in CEST

Quality Assurance in the Linkage of Survey Data: Frameworks, Tools, and Best Practices

Session Organisers: Dr Jessica Daikeler (GESIS – Leibniz Institute for the Social Sciences)
Anne Stroppe (GESIS – Leibniz Institute for the Social Sciences)
Laura Young (University of Mannheim)
Time: Wednesday 16 July, 11:00 - 12:00
Room: Ruppert 011

As survey research increasingly incorporates diverse data sources, ensuring the quality and reliability of linked survey data is essential. Linked survey data, which often combine traditional survey responses with external sources such as administrative records, sensor data or social media data, present unique challenges because of the complexity of integration and the potential for discrepancies between sources. These challenges in the linkage process call for robust frameworks and tools to manage, validate and enhance data quality.
This session will focus on the key aspects of quality assurance in the collection and utilization of linked survey data. We will explore comprehensive frameworks, cutting-edge tools, and best practices specifically designed to maintain the integrity and usability of data from multiple sources when linked to survey responses. Key topics will include:
1. Frameworks for Quality Assurance: An overview of frameworks developed to assess the quality of data linkage.
2. Tools and Platforms for Data Validation: A discussion on tools and technologies aimed at validating the quality of linked survey data and the linkage process itself. This will include both automated and manual validation techniques and open-source platforms tailored to linked data validation, such as the KODAQS toolbox.
3. Best Practices and Case Studies: Guidelines for the collection and processing of linked survey data, focusing on strategies to assess and improve data quality during the linkage. Real-world case studies will demonstrate successful methods for linking external data sources with survey responses, addressing the specific challenges encountered and the solutions applied.
4. Didactics of Data Quality Issues: Approaches to teaching and promoting data quality assurance for linked survey data. This section will explore educational strategies to equip researchers and practitioners with the necessary skills to effectively tackle data quality issues.

Keywords: data linkage, data quality, tools, frameworks, best practice, use case

Papers

Consent to Data Linkage in the Ageing European Population

Ms Imke Herold (SHARE BERLIN Institute (SBI)) - Presenting Author
Ms Jessica Irving (SHARE BERLIN Institute (SBI))
Dr Yuri Pettinicchi (SHARE BERLIN Institute (SBI))
Dr Arne Bethmann (SHARE BERLIN Institute (SBI))

Population surveys increasingly seek to link their respondents’ answers to external databases, such as administrative health or pension records. For ethical and legal reasons, this requires informed consent, and consent rates can vary considerably between surveys, countries or domains of administrative data. Such variations in consent rates might affect the usability and quality of the linked data.
Our study explores cross-European variation in consent to data linkage using data from the Survey of Health, Ageing and Retirement in Europe (SHARE), which follows individuals aged 50 and older in 27 European countries and Israel. We analysed real-life consent rates for data linkage across 10 European countries and hypothetical consent preferences from a pilot study in 28 countries. While the real-life consent questions reflect the institutional setting of each country, the hypothetical consent questions were standardised across all countries and covered four data domains: health, pensions, income and taxes, and employment.
Both real-life and hypothetical consent rates varied considerably across countries. For real-life consent, we found the highest consent rates in Denmark and the lowest in Italy. Hypothetical consent was highest in Belgium, Denmark and Estonia and lowest in Italy, Hungary and Poland. Apart from a tendency for respondents from Nordic countries to be more open to record linkage, spatial patterns in consent behaviour were limited. Some multilingual countries, such as Switzerland and Israel, displayed notable variability between language groups. Countries differed in which data domains respondents found acceptable to link to, although linkage to income and tax information was almost universally the least popular domain. Intriguingly, preliminary evidence suggests that actual consent rates are higher than the stated preferences from the hypothetical questions.
These findings highlight the need for further cross-national research to address country-specific challenges. They also provide valuable insights for surveys aiming to incorporate linkage.


Producing Linked Data Set with Correction Weights

Ms An-Chiao Liu (Utrecht University) - Presenting Author
Dr Peter Lugtig (Utrecht University)

Data linkage is increasingly important in a world where more and more data are available. Given the infrastructure of the data ecosystem, researchers may only be able to access the linked data set and not the full information contained in the original data sources. Yet the quality of the linked data is constrained by those original sources. Without understanding the possible linkage error and selection bias in each of the two datasets used to produce the linked data set, naïvely treating the linked data set as a simple random sample may result in biased estimates.

To safeguard the quality of analyses based on linked data sets, one option is to release unit-level correction weights alongside the linked data sets. The correction weights can combine information on the selection process in the individual dataset(s), in the form of design or analysis weights, with linkage error weights. Researchers may then apply design-based estimators using the correction weights in subsequent analyses. In this presentation, we discuss possible ways to produce the correction weights, illustrate the method with a practical example, and present results of a simulation study showing the circumstances in which the linkage correction weights work.
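One simple way to picture such correction weights is sketched below. It assumes the linkage weight is the inverse of an estimated probability that a sampled unit ends up in the linked data set, multiplies it with the survey design weight, and plugs the product into a Hájek-type weighted mean. This is a minimal illustration under that assumption, not necessarily the construction the authors propose; the names `LinkedRecord`, `link_prob`, `correction_weight` and the toy values are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class LinkedRecord:
    y: float              # outcome variable in the linked data set
    design_weight: float  # survey design weight, i.e. 1 / inclusion probability (assumed known)
    link_prob: float      # hypothetical estimated probability that this sampled unit gets linked


def correction_weight(rec: LinkedRecord) -> float:
    """Combine the design weight with an inverse-probability linkage weight."""
    if not 0.0 < rec.link_prob <= 1.0:
        raise ValueError("link_prob must lie in (0, 1]")
    return rec.design_weight / rec.link_prob


def naive_mean(records: Sequence[LinkedRecord]) -> float:
    """Treats the linked data set as a simple random sample (unweighted mean)."""
    return sum(r.y for r in records) / len(records)


def weighted_mean(records: Sequence[LinkedRecord]) -> float:
    """Hajek-type design-based estimator of the mean using the correction weights."""
    weights = [correction_weight(r) for r in records]
    return sum(w * r.y for w, r in zip(weights, records)) / sum(weights)


if __name__ == "__main__":
    # Toy, invented values: the first unit had a low chance of being linked,
    # so it stands in for more units that are missing from the linked data set.
    toy = [
        LinkedRecord(y=10.0, design_weight=1.0, link_prob=0.25),
        LinkedRecord(y=20.0, design_weight=1.0, link_prob=1.0),
    ]
    print(naive_mean(toy))     # 15.0
    print(weighted_mean(toy))  # 12.0
```

Multiplying the two weights is appropriate when `link_prob` estimates the probability of being linked given that the unit was sampled. This sketch only addresses missed links; false matches (wrongly linked records) would need a different adjustment.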