ESRA 2025 Preliminary Program

              



All time references are in CEST

Sharing routes for individual-level research data and code

Session Organisers Dr Aida Sanchez-Galvez (Centre for Longitudinal Studies, UCL Social Research Institute)
Ms Cristina Magder (UK Data Service, UK Data Archive, University of Essex)
Time Tuesday 15 July, 13:45 - 15:00
Room Ruppert 011

Sharing individual-level research data is a core activity of many research projects, which often collect and manage a variety of data types, such as survey data, biomarkers, complex sensor data, genomics, linked administrative data, and geographical data. Some researchers are also generating synthetic data for teaching purposes and preliminary development of analysis code. Each of these data types presents its own challenges when it comes to sharing and dissemination for future research purposes. Additionally, the sharing of programming code is fundamental in ensuring reproducibility and transparency.
Data releases are managed internally by the studies themselves and/or externally by national archives or Trusted Research Environments. Balancing the wide sharing of detailed research data with the need to maintain confidentiality and security, while also ensuring easy and swift access without significant barriers or delays, is a complex challenge. This balance becomes even more challenging when dealing with sensitive and/or potentially disclosive data. Data that fall under the GDPR definition of “special category data” require additional protection and a higher degree of security and governance measures, often involving Data Access Committee oversight and dedicated legal and sharing frameworks.
The aim of this session is to provide a platform for colleagues to discuss their experience and approaches to sharing individual-level research data, sensitive and non-sensitive, original or synthetic. Participants are encouraged to share their techniques to assess and manage disclosure risk, and best practices and challenges of code sharing. We invite colleagues to submit ideas relating to, but not restricted to:
- Sharing routes for individual-level research data
- Publication of programming code or syntax
- Management and sharing of synthetic data
- Methods of risk assessment of disclosivity and sensitivity
- Research data classification or data tiers
- Technical tools used to generate bespoke datasets
- Data access via Trusted Research Environments
- International data sharing

Keywords: data, sharing, disclosure, sensitive, code, synthetic

Papers

Sixty years of social science data sharing: legal and ethical frameworks in practice at the UK Data Archive

Mrs Susan Cadogan (UK Data Service, UK Data Archive, University of Essex) - Presenting Author

As the UK Data Archive, the lead partner of the UK Data Service, approaches its 60th anniversary, it stands as proof of sustained innovation in data sharing for the social sciences. Continuously funded by the Economic and Social Research Council (now part of UK Research and Innovation) since its inception, the Archive has pursued a clear mission: to build a collection of data valuable to researchers and to negotiate access to meet their needs. Central to this mission is balancing wider data access with long-term usability while addressing the practical, legal, and ethical challenges of data deposit.

The early years involved significant negotiation with data creators, who often feared early publication by others, critical scrutiny, or misinterpretation of their data. These challenges were counterbalanced by the benefits of archiving, preservation, managing access, ensuring citation, and, more recently, assigning DOIs, while offering support throughout the process.

At the centre of these efforts is a robust legal and ethical framework. This ensures depositors have the rights to deposit data, protects their rights, and upholds standards around privacy, consent, and responsible use. Negotiated licence agreements align depositor goals with user needs, ensuring data are as open as possible, with restrictions applied where necessary.

This presentation will review the evolution of our three-tier licence and access framework over the past 60 years and its adaptation to emerging challenges. We will examine the key role of legal and ethical standards in balancing openness with restrictions, and how these principles intersect with broader open access frameworks and repositories for the social sciences.


Longitudinal Data Sharing: Direct Release vs Controlled Access

Dr Aida Sanchez-Galvez (UCL Centre for Longitudinal Studies)
Ms Claudia Yogeswaran (UCL Centre for Longitudinal Studies) - Presenting Author

Striking the right balance between maximising research use of longitudinal survey data and minimising risks to participants' rights is a complex challenge. Research data sharing must ensure data are widely available to the international research community in a fair, open, and transparent manner, while guaranteeing: i) sensitive and/or disclosive data are shared securely; ii) compliance with legal, ethical, and moral responsibilities to participants; and iii) adherence to consent agreements.
The UCL Centre for Longitudinal Studies (CLS) manages several national longitudinal cohort studies, which follow the lives of tens of thousands of people in the UK. CLS facilitates two levels of data access, which represent fundamentally different approaches to data dissemination and control: 1) direct data release to users, for analysis on their institutional servers; and 2) remote access via Trusted Research Environments (TREs), which are secure servers operating under the Five Safes Framework. The sensitivity and disclosure risk of the longitudinal data determine the appropriate data sharing route.
Direct data release allows ease of access and is the primary CLS method for sharing individual-level survey data. This approach is only suitable for data that have undergone thorough assessment by the CLS data management team to ensure low sensitivity and minimal identification risk. This distribution is safeguarded, as it requires registration via the UK Data Service and an application process, with data usage governed by an End User Licence and/or the CLS Data Sharing Agreement. Re-identification of individuals is strictly forbidden.
Conversely, TREs are used to share highly sensitive data or data with significant disclosivity risk. Despite being highly restrictive and often seen as a barrier to agile research, this model has been gaining popularity over the last few years and has resulted in a proliferation of TREs in the UK and across European countries.


The NextGen Harmonised Data Gateway

Dr Rabia Karatoprak Ersen (GESIS - Leibniz Institute for the Social Sciences) - Presenting Author
Dr Insa Bechert (GESIS - Leibniz Institute for the Social Sciences)

For the EU project Infra4NextGen, GESIS - Leibniz Institute for the Social Sciences provides data and research infrastructure services focused on the five NextGenEU youth policy topic areas: Make it Green; Make it Digital; Make it Healthy; Make it Strong; and Make it Equal. GESIS provides users with a set of cross-national data files containing harmonised and merged data on the five themes from the European Social Survey, Generations and Gender Programme, European Values Study, International Social Survey Programme, European Quality of Life Surveys, and Eurobarometer. Beyond that, we design and set up virtual access to metadata overviews, the harmonised data files, and the R scripts used for the harmonisation. In this presentation, we will introduce the gateway at https://infra4nextgen.com/harmonisationgateway/, focusing on two sub-pages: Variable Database and Harmonisation.

The Variable Database is a compilation of measurement items, all of which are key variables in the five pillars of EU youth policy. It includes items with which at least two survey programmes have measured the same concept similarly across countries within the last 20 years. The Harmonisation section consists of pages dedicated to the harmonisation procedures used in producing the cross-national data files from the selected items in the Variable Database. It demonstrates the harmonisation procedures step by step, from beginning to end, using R scripts for each data file.

The gateway at https://infra4nextgen.com/harmonisationgateway/ is not only an entry point but also a rich resource for the research community. Its webpages both provide access to the harmonised data sets and open up avenues for individual-level research. Individuals who have an active research programme, or decision-makers at the institutional level, can curate their own data set using the variables displayed in the Variable Database and the R scripts provided on the Harmonisation page.
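
As an illustration only, the Variable Database inclusion rule described above (a concept measured by at least two survey programmes within the last 20 years) might be sketched as follows. The gateway's own scripts are written in R and are not reproduced here; the item records, concept names, years, and the function `eligible_concepts` are hypothetical, and the sketch does not attempt to check whether items measure a concept "similarly".

```python
from collections import defaultdict

# Hypothetical item records: (concept, survey programme, year fielded).
# These values are invented for illustration and are not taken from the gateway.
ITEMS = [
    ("trust_in_parliament", "ESS", 2018),
    ("trust_in_parliament", "EVS", 2017),
    ("internet_use", "ESS", 2020),
    ("internet_use", "ISSP", 2001),
    ("life_satisfaction", "EQLS", 2016),
]


def eligible_concepts(items, reference_year, window_years=20, min_programmes=2):
    """Return the concepts covered by at least `min_programmes` distinct
    survey programmes within `window_years` of `reference_year`."""
    cutoff = reference_year - window_years
    programmes_by_concept = defaultdict(set)
    for concept, programme, year in items:
        # Only items fielded inside the time window count towards coverage.
        if year >= cutoff:
            programmes_by_concept[concept].add(programme)
    return sorted(
        concept
        for concept, programmes in programmes_by_concept.items()
        if len(programmes) >= min_programmes
    )


if __name__ == "__main__":
    print(eligible_concepts(ITEMS, reference_year=2025))
```

Under these assumed records, only a concept covered by two programmes inside the window passes the filter; widening `window_years` would admit concepts whose second measurement is older.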