ESRA 2025 Preliminary Program

              



All time references are in CEST

New Developments in Using, Sharing, and Re-using Metadata

Session Organisers: Mr Knut Wenzig (DIW Berlin/SOEP)
Mr Daniel Bela (LIfBi)
Dr Arne Bethmann (SHARE Germany and SHARE Berlin Institute)
Time: Thursday 17 July, 09:00 - 10:30
Room: Ruppert 119

Metadata systems have evolved from passive documentation tools into active drivers of data management and utilization. This session explores recent advancements that enhance the use, sharing, and re-use of metadata across the data lifecycle, emphasizing innovative methods that improve data quality, interoperability, and efficiency.

With machine-readable metadata, processes like survey instrument generation, data validation, and preparation are increasingly automated, reducing errors and enhancing data-driven decision-making. Metadata systems are becoming essential components in not just documenting data, but actively shaping and streamlining the entire data lifecycle.

We invite papers that highlight:

- Innovative Uses: Examples of how metadata systems are leveraged for automation and optimization in data collection, processing, and analysis.
- Interoperability: Experiences with implementing metadata standards (e.g., DDI, SDMX) to facilitate sharing and re-use across different systems and institutions.
- Collaborative Platforms: Case studies on platforms that support community-driven creation, sharing, and re-use of metadata.
- FAIR Principles: Approaches that ensure metadata adheres to the FAIR (Findable, Accessible, Interoperable, Reusable) principles.
- Future Directions: Emerging technologies, such as AI and machine learning, that could revolutionize metadata use.

This session aims to provide a comprehensive overview of current trends and future directions in metadata management. We seek presentations that not only showcase technological advancements but also discuss the practical challenges and lessons learned in implementing these innovations. By bringing together researchers, data managers, and technologists, this session will foster a rich exchange of ideas on how new developments in metadata can lead to more effective and insightful data management practices.

Keywords: Metadata Management, Interoperability, FAIR Principles, Data Automation, Metadata Standards

Papers

Recent Developments in and Upcoming Endeavors to the NEPS Survey Life Cycle

Mr Daniel Bela (LIfBi / NEPS) - Presenting Author
Mr Simon Dickopf (LIfBi / NEPS)

Although machine-actionable survey metadata have long been seen primarily as a benefit to survey documentation, they also make it possible to implement more efficient and less error-prone survey management procedures, such as preparatory testing for, and the conduct of, survey fieldwork. We want to showcase the recent developments undertaken for the German National Educational Panel Study (NEPS) that make use of these opportunities. The NEPS started its newly recruited Starting Cohort 8, a panel sample of 5th graders, in 2022. For this new cohort study, NEPS implemented an entirely new survey infrastructure based on freely available software components. The project’s long-established centralized and structured metadata storage, housed at the Leibniz Institute for Educational Trajectories (LIfBi), served as the backbone for re-imagining the panel’s life cycle with modern workflows.

We developed automated procedures for preparing, generating, and processing survey instruments based on their reference metadata. This eliminated the need for (manually produced) programming templates as well as the manual programming of the instruments themselves. Instead, the survey environment is automatically deployed to the field hardware via containerized software images. By putting the surveys’ metadata at the center of the infrastructure, we were able to accelerate survey creation and extend its testing. We also ensure the coherence of the stored metadata with the instruments’ contents and, ultimately, with the disseminated data products and documentation.

Future developments aim to extend these workflows beyond the scope of our in-house software environments, so that NEPS’ metadata can be interchanged e.g. with contracted field institutes without manual interaction. Eventually, we aim to document the tools we created publicly, so that other survey infrastructures may build upon them.
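
As a rough illustration of the idea of generating an instrument from its reference metadata, the following minimal Python sketch renders a plain-text questionnaire from structured question records. It is not the NEPS tooling, which the abstract does not describe in detail; the `Question` class, the `render_instrument` function, and the example questions are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Question:
    """Hypothetical metadata record for one survey question."""
    name: str                                   # variable name in the data
    text: str                                   # question wording
    answers: dict[int, str] = field(default_factory=dict)  # code -> label


def render_instrument(questions):
    """Render a plain-text instrument directly from the metadata records."""
    lines = []
    for number, question in enumerate(questions, start=1):
        lines.append(f"{number}. {question.text} [{question.name}]")
        for code, label in sorted(question.answers.items()):
            lines.append(f"   ({code}) {label}")
    return "\n".join(lines)


# Invented example metadata; a real system would read this from a database.
QUESTIONS = [
    Question("grade", "Which grade are you in?", {5: "Grade 5", 6: "Grade 6"}),
    Question("likes_school", "Do you like going to school?", {1: "Yes", 2: "No"}),
]

if __name__ == "__main__":
    print(render_instrument(QUESTIONS))
```

Because the instrument is derived from the same records that document the data, a change to the question wording or answer codes only has to be made in one place.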


What should FAIR Question Banks look like and how do we get there?

Mr Jon Johnson (CLOSER, Social Research Institute, UCL) - Presenting Author
Dr Suparna De (Department of Computer Science, University of Surrey)
Dr Wing Yan Li (Department of Computer Science, University of Surrey)
Dr Chandresh Pravin (Department of Computer Science, University of Surrey)
Mr Paul Bradshaw (Scottish Centre for Social Research (ScotCen))

The development of the CESSDA European Question Bank (https://eqb.cessda.eu/) opens up the possibility of making the core tool of survey research, “the question”, a referenceable and reusable object for the survey community. For decades, whilst questions have been extensively developed and tested, they have mostly been available only within PDFs as adjuncts to the available data.

This has two main consequences: comparison between questions (especially across populations and studies) is onerous and time-consuming, and the unavailability of the questions for reuse militates against provenance and reproducibility, as well as against the development of questionnaire tooling that could make use of them.

Drawing on 12 years of the CLOSER Discovery project in the UK, the presentation will discuss the challenges of capturing questions and questionnaires and providing them as FAIR objects, and what such objects need to contain so that they can be reused.

The presentation will also cover recent advances in the extraction of questions from PDFs into DDI-Lifecycle for interoperability, as well as the limitations and possibilities that machine learning technologies offer for comparing questions from diverse sources.


Repeat Cross Sectional Survey Data Management with Colectica: A Transition to the DDI Lifecycle Standard in the European Social Survey (ESS)

Mr Ole-Petter Øvrebø (Sikt - Norwegian Agency for Shared Services in Education and Research) - Presenting Author
Mrs Gyrid Bergseth (Sikt - Norwegian Agency for Shared Services in Education and Research)

Effective metadata management is crucial for ensuring the quality, consistency, and usability of large survey datasets, particularly in longitudinal studies like the ESS where tracking changes over time is essential. In this presentation, we share our experience transitioning to a data management process based on the DDI lifecycle standard, relying on Jupyter/Python for data processing, Colectica software for metadata management and custom-built APIs for data and metadata handling.

We will demonstrate how Colectica’s structured metadata model improves on our previous tools by enabling more efficient management of longitudinal data, supporting dynamic metadata versioning, and facilitating automated documentation processes. Key features, such as the integration of metadata across waves of a survey and how we make use of its metadata outputs, will be showcased.

The presentation will also highlight the technical and operational benefits of this new data platform, including reduced manual effort in metadata management, improved data consistency, and enhanced metadata sharing capabilities. We will also touch upon challenges encountered during the transition and lessons learned.


Into the Metadataverse: Metadata-based Survey Data Management

Mrs Lisa Ziemba (Statistics Austria) - Presenting Author

Using metadata to automate data processing and aid data validation is an important new development in survey data management, because it allows for the documentation and automation of many repetitive tasks during the data lifecycle. However, metadata management is a technical and strenuous task, as it requires meticulous record keeping in a standardized way. Panel surveys in particular face the challenge of managing metadata that change over the years. So how do we avoid getting lost in this vast metadataverse and instead use the myriad of information as our guide, change management system, documentation tool, and overall encyclopedia to the survey data?

As a first step towards this goal, we developed a data processing workflow in R for use in the Austrian Socio-Economic Panel (ASEP). It is based on a single key-value metadata table, which is updated throughout the processing and validation workflow. This workflow will serve as a metadata-based, centralized survey data management system that can be used for many waves to come, while simultaneously keeping track of all changes and remaining adaptable to existing standards, such as DDI or SDMX.

In the presentation we will show how the survey lifecycle can be made more efficient and transparent by implementing a metadata concept that consists of a simple and maintainable basis, which is easily scalable, adaptable, and expandable. We will discuss the potential outputs, such as fully automated data documentation and a metadata-enriched survey data file for the users.

By sharing our implementation processes in a newly established panel survey, we contribute to advancing the use of metadata in the social sciences for data documentation and management.
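
To make the notion of a key-value metadata table driving validation more concrete, here is a minimal sketch. The abstract describes an R workflow; this illustration uses Python instead and does not reproduce the ASEP implementation. The table layout (variable, key, value), the keys `min`, `max`, and `allowed`, and the functions `build_rules` and `validate` are assumptions made for the example.

```python
# Hypothetical key-value metadata table: one row per (variable, key, value).
METADATA = [
    ("age", "label", "Age in years"),
    ("age", "min", "16"),
    ("age", "max", "110"),
    ("sex", "label", "Sex of respondent"),
    ("sex", "allowed", "1|2|3"),
]


def build_rules(rows):
    """Pivot the long key-value rows into one settings dict per variable."""
    rules = {}
    for variable, key, value in rows:
        rules.setdefault(variable, {})[key] = value
    return rules


def validate(record, rules):
    """Check one data record against the rules and return a list of problems."""
    problems = []
    for variable, spec in rules.items():
        if variable not in record:
            problems.append(f"{variable}: missing")
            continue
        value = record[variable]
        if "min" in spec and value < int(spec["min"]):
            problems.append(f"{variable}: {value} is below minimum {spec['min']}")
        if "max" in spec and value > int(spec["max"]):
            problems.append(f"{variable}: {value} is above maximum {spec['max']}")
        if "allowed" in spec:
            allowed = {int(code) for code in spec["allowed"].split("|")}
            if value not in allowed:
                problems.append(f"{variable}: {value} is not an allowed code")
    return problems


RULES = build_rules(METADATA)

if __name__ == "__main__":
    print(validate({"age": 15, "sex": 9}, RULES))
```

In such a design, adding a new check for a later wave means adding rows to the table rather than editing the processing code, and the same rows can feed the generated documentation.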