ESRA 2025 Preliminary Glance Program
All time references are in CEST
Quality Assurance in the Linkage of Survey Data: Frameworks, Tools, and Best Practices 2

Session Organisers
Dr Jessica Daikeler (GESIS – Leibniz Institute for the Social Sciences)
Anne Stroppe (GESIS – Leibniz Institute for the Social Sciences)
Laura Young (University of Mannheim)

Time: Wednesday 16 July, 14:00 - 15:00
Room: Ruppert 011
As survey research increasingly incorporates diverse data sources, ensuring the quality and reliability of linked survey data is essential. Linked survey data, which often combine traditional survey responses with external data sources such as administrative records, sensor data, or social media data, present unique challenges due to the complexity of integration and the potential for discrepancies. These challenges in the linkage process necessitate robust frameworks and tools to manage, validate, and enhance data quality.
This session will focus on the key aspects of quality assurance in the collection and utilization of linked survey data. We will explore comprehensive frameworks, cutting-edge tools, and best practices specifically designed to maintain the integrity and usability of data from multiple sources when linked to survey responses. Key topics will include:
1. Frameworks for Quality Assurance: An overview of frameworks developed to assess the quality of data linkage.
2. Tools and Platforms for Data Validation: A discussion on tools and technologies aimed at validating the quality of linked survey data and the linkage process itself. This will include both automated and manual validation techniques and open-source platforms tailored to linked data validation, such as the KODAQS toolbox.
3. Best Practices and Case Studies: Guidelines for the collection and processing of linked survey data, focusing on strategies to assess and improve data quality during the linkage. Real-world case studies will demonstrate successful methods for linking external data sources with survey responses, addressing the specific challenges encountered and the solutions applied.
4. Didactics of Data Quality Issues: Approaches to teaching and promoting data quality assurance for linked survey data. This section will explore educational strategies to equip researchers and practitioners with the necessary skills to effectively tackle data quality issues.
Keywords: data linkage, data quality, tools, frameworks, best practice, use case
Papers
Use of Random and Systematic Error for the Evaluation of Harmonized Data
Dr Rabia Karatoprak Ersen (GESIS – Leibniz Institute for the Social Sciences) - Presenting Author
In a harmonization context, data from different surveys are combined. The validity of inferences based on harmonized data depends on the comparability of scores obtained from the data. Therefore, the methods used for harmonization must be evaluated, and the evaluation results must be applied to improve comparability and thus the validity of scores. Test equating and linking methodologies, which are used operationally in the educational measurement field, can be used for harmonization (e.g., Singh, 2022). Implementing equating or linking requires making decisions about data collection, the operational definition of equating, statistical estimation methods, conducting the equating, and evaluating its results (Kolen & Brennan, 2014). Evaluating the results requires identifying criteria for the equating. Furthermore, equating methods are statistical methods that contain random error and systematic error. How to minimize these errors is a central concern for improving comparability. Comparing equating results against sound evaluation criteria is one way to find the equating method that minimizes these errors.
In this study, identity equating, mean equating, linear equating, and equipercentile equating, which become linking methods in the harmonization context, were used to harmonize a question administered in international survey programs. The evaluation design was constructed with the bootstrap method (Efron, 1982), and identity linking served as the criterion equating relationship. The standard error of equating (an index of random error), absolute bias (an index of systematic error), and the root mean squared error (an index of both error components) were used to evaluate the relative performance of each equating method. All three indices showed the same pattern: more parsimonious methods produced less error. That is, mean linking performed better than linear linking, and linear linking performed better than equipercentile linking.
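A minimal sketch (not the authors' code) of how such a bootstrap evaluation of linking methods could be set up: identity linking is taken as the criterion, toy normal samples stand in for the two surveys, and only mean and linear linking are shown; all variable names and data values are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(50, 10, 500)    # scores from survey A (toy data)
y = rng.normal(52, 12, 500)    # scores from survey B (toy data)
grid = np.linspace(30, 70, 9)  # score points at which the linking is evaluated
criterion = grid               # identity linking: transformed score equals original score

def mean_linking(x_s, y_s, pts):
    return pts + (y_s.mean() - x_s.mean())

def linear_linking(x_s, y_s, pts):
    return y_s.mean() + y_s.std(ddof=1) / x_s.std(ddof=1) * (pts - x_s.mean())

def evaluate(method, n_boot=1000):
    est = np.empty((n_boot, grid.size))
    for b in range(n_boot):
        xb = rng.choice(x, size=x.size, replace=True)   # bootstrap resample of survey A
        yb = rng.choice(y, size=y.size, replace=True)   # bootstrap resample of survey B
        est[b] = method(xb, yb, grid)
    se = est.std(axis=0)                          # random error: standard error of equating
    bias = np.abs(est.mean(axis=0) - criterion)   # systematic error: absolute bias
    rmse = np.sqrt(se**2 + bias**2)               # combines both error components
    return se.mean(), bias.mean(), rmse.mean()

for name, method in [("mean", mean_linking), ("linear", linear_linking)]:
    se, bias, rmse = evaluate(method)
    print(f"{name:6s}  SE={se:.3f}  |bias|={bias:.3f}  RMSE={rmse:.3f}")

The same pattern extends to equipercentile linking by replacing the linking function; the three summary indices per method are what the abstract's comparison rests on.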
Evaluating Data Quality in the Linked BCS70 - HES Dataset
Dr Thomas O'Toole (The University of Manchester) - Presenting Author
The integration of survey and administrative data can complement existing data and provide additional detailed measures that may be hard to capture in a survey. However, while the extra information captured by administrative data may be valuable, linking such data can create new challenges for data quality and statistical inference. Furthermore, administrative data (such as Hospital Episode Statistics (HES)) can be linked with survey data to evaluate potential sources of representation and measurement error in the survey (Groves et al., 2004; Rajah et al., 2023).
In this paper we will evaluate data quality in the linked 1970 British Cohort Study (BCS70) and Hospital Episode Statistics dataset (Gomes, 2020), using a replicable data quality framework established by Silverwood et al. (2024). This framework includes an exploration of linkage rates, predictors of consent to linkage, and an examination of linkage representativeness, comparing the BCS70-HES data to population benchmarks. We will also present an applied example of how such linked data can be used to answer social research questions.
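A minimal sketch of the kinds of indicators such a framework involves, under stated assumptions: the toy data, column names, consent model, and the 51% benchmark are all hypothetical and are not taken from the BCS70-HES data or from Silverwood et al. (2024).

import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "consented": rng.random(n) < 0.85,   # consent to linkage (toy)
    "female":    rng.random(n) < 0.5,    # a single illustrative characteristic
})
# linkage is only attempted for consenters; toy success probability
df["linked"] = df["consented"] & (rng.random(n) < 0.9)

consent_rate = df["consented"].mean()
linkage_rate = df.loc[df["consented"], "linked"].mean()
print(f"consent rate: {consent_rate:.1%}, linkage rate among consenters: {linkage_rate:.1%}")

# representativeness check: linked sample vs. a (hypothetical) population benchmark
benchmark_female = 0.51
linked_female = df.loc[df["linked"], "female"].mean()
print(f"share female, linked sample: {linked_female:.1%} vs benchmark {benchmark_female:.1%}")

# predictors of consent: a simple logistic regression
X = sm.add_constant(df[["female"]].astype(float))
model = sm.Logit(df["consented"].astype(float), X).fit(disp=False)
print(model.params)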
Best Practice of Linking Survey and Administrative Pension Data: Experiences from 30 Years of Research in Germany
Dr Christin Czaplicki (German Pension Insurance) - Presenting Author
Dr Thorsten Heien (German Pension Insurance)
The need for quickly and comprehensively available socio-economic information is constantly increasing. It is therefore not surprising that national and international stakeholders from politics and science are campaigning for the expansion of data, simplified data access, and a legally regulated possibility of linking data across sectors. In particular, the linking of different data sources is an important instrument through which, for example, the scope of available information can be expanded, time and costs can be reduced, and the burden on participants can be minimized.
The German Pension Insurance (Deutsche Rentenversicherung; DRV) has been using direct record linkage (RL) by means of a unique identification number, the social security number (SSN), in various research projects since the 1990s. The aim of this contribution is, first, to demonstrate the step-by-step selection process of RL in detail based on the DRV's experience and to highlight possible sources of error. Subsequently, we compare various projects (AVID, LeA, SHARE-RV, SOEP-RV) using key figures on the selection process: 1. survey response rate, 2. consent rate for the RL, 3. validation of the SSN information, 4. finding and extracting the process data from the administrative data pool, and 5. linkage rate. This comparison serves to make the RL process transparent and to derive best-practice solutions for minimizing errors and resulting biases during the RL. This is particularly relevant since in many countries, including Germany, centralized data linkage via data trustees is already carried out or at least planned. The results of this contribution can thus help to raise awareness of the pitfalls of this form of RL.
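A minimal sketch of how the selection-process key figures can be decomposed step by step; the counts below are hypothetical and only illustrate how an overall linkage rate results from the individual stages described in the abstract.

gross_sample      = 10000   # persons invited to the survey (hypothetical)
respondents       = 6500    # 1. survey response
consented         = 5800    # 2. consent to record linkage
valid_ssn         = 5600    # 3. SSN information validated
records_retrieved = 5450    # 4. process data found in the administrative data pool

steps = [
    ("response rate",         respondents,       gross_sample),
    ("consent rate",          consented,         respondents),
    ("SSN validation rate",   valid_ssn,         consented),
    ("record retrieval rate", records_retrieved, valid_ssn),
    ("overall linkage rate",  records_retrieved, respondents),   # 5. linkage rate
]
for name, num, denom in steps:
    print(f"{name:22s}: {num / denom:6.1%}")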