ESRA 2025 Preliminary Program

              



All time references are in CEST

New Developments in Using, Sharing, and Re-using Metadata 2

Session Organisers Mr Knut Wenzig (DIW Berlin/SOEP)
Mr Daniel Bela (LIfBi)
Dr Arne Bethmann (SHARE Germany and SHARE Berlin Institute)
Dr Yuri Pettinicchi (SHARE)
Time Thursday 17 July, 14:00 - 15:00
Room Ruppert 119

Metadata systems have evolved from passive documentation tools into active drivers of data management and utilization. This session explores recent advancements that enhance the use, sharing, and re-use of metadata across the data lifecycle, emphasizing innovative methods that improve data quality, interoperability, and efficiency.

With machine-readable metadata, processes like survey instrument generation, data validation, and preparation are increasingly automated, reducing errors and enhancing data-driven decision-making. Metadata systems are becoming essential components in not just documenting data, but actively shaping and streamlining the entire data lifecycle.

We invite papers that highlight:

- Innovative Uses: Examples of how metadata systems are leveraged for automation and optimization in data collection, processing, and analysis.
- Interoperability: Experiences with implementing metadata standards (e.g., DDI, SDMX) to facilitate sharing and re-use across different systems and institutions.
- Collaborative Platforms: Case studies on platforms that support community-driven creation, sharing, and re-use of metadata.
- FAIR Principles: Approaches that ensure metadata adheres to the FAIR (Findable, Accessible, Interoperable, Reusable) principles.
- Future Directions: Emerging technologies, such as AI and machine learning, that could revolutionize metadata use.

This session aims to provide a comprehensive overview of current trends and future directions in metadata management. We seek presentations that not only showcase technological advancements but also discuss the practical challenges and lessons learned in implementing these innovations. By bringing together researchers, data managers, and technologists, this session will foster a rich exchange of ideas on how new developments in metadata can lead to more effective and insightful data management practices.

Keywords: Metadata Management, Interoperability, FAIR Principles, Data Automation, Metadata Standards

Papers

Metadata-Driven Production of Longitudinal Datasets: Introduction and Potential of the Metadata Attribute

Mrs Claudia Saalbach (DIW Berlin/SOEP)
Mrs Jana Nebelin (DIW Berlin/SOEP) - Presenting Author

The production of longitudinal survey datasets poses significant technical and substantive challenges for data producers. In particular, the integration, analysis, and documentation of large volumes of data require sophisticated processes and methodological expertise. From questionnaire development to dataset definition and the final generation of longitudinal datasets, all steps are increasingly driven by metadata. At the same time, it is essential to ensure that the extensive data offerings are user-friendly and efficiently accessible.

At the SOEP (Socio-Economic Panel), approximately 23 survey instruments are deployed annually, yielding around 41 raw survey datasets and roughly 15 longitudinal datasets, covering nearly 15,000 longitudinal variables from 1984 to the present. The SOEP’s metadata system plays a pivotal role in this process, offering various levels of granularity – from study, questionnaire, dataset, and variables to topics and concepts. A new metadata attribute, the "module," extends this system by positioning itself conceptually between topics and concepts and structurally between datasets and variables.

The introduction of the "module" attribute allows for greater flexibility and efficiency in the generation of longitudinal data products. Instead of processing large datasets globally, targeted groups of variables – thematically cohesive and methodologically grounded – can be handled separately. This approach not only enhances the efficiency and speed of data production but also contributes to improving data quality.

In our presentation, we provide insights into the practical implementation of metadata-driven production at SOEP and demonstrate the potential of the "module" metadata attribute for both data producers and users. Against the backdrop of related concepts, such as "topic" or "concept" (e.g., at GESIS), we also propose an initial definition of the term "module" and discuss its added value in the context of longitudinal data production.
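
As a rough illustration of the idea described in this abstract, the sketch below groups variable-level metadata records by a "module" attribute so that each group can be processed on its own rather than as part of a whole dataset. The record layout, dataset names, variable names, module labels, and the `process_module` step are all hypothetical and are not taken from the SOEP metadata system.

```python
from collections import defaultdict

# Hypothetical variable-level metadata; every name here is illustrative only.
VARIABLE_METADATA = [
    {"dataset": "person_long", "variable": "income_gross", "module": "income"},
    {"dataset": "person_long", "variable": "income_net", "module": "income"},
    {"dataset": "person_long", "variable": "health_status", "module": "health"},
    {"dataset": "household_long", "variable": "rent_monthly", "module": "housing"},
]


def group_by_module(records):
    """Return {module: [(dataset, variable), ...]}, keeping input order."""
    groups = defaultdict(list)
    for record in records:
        groups[record["module"]].append((record["dataset"], record["variable"]))
    return dict(groups)


def process_module(module, members):
    """Placeholder for a per-module production step (assumed, not SOEP's)."""
    return f"{module}: {len(members)} variable(s) processed"


if __name__ == "__main__":
    # Each module is handled separately instead of processing a dataset globally.
    for module, members in group_by_module(VARIABLE_METADATA).items():
        print(process_module(module, members))
```

Keeping the module on the variable record, as assumed here, is one way to let a module cut across datasets while still sitting below the dataset level.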


The ODISSEI Portal: A Metadata-Only Repository To Enhance Data Reuse in the Dutch Social Sciences

Mr Lucas van der Meer (Erasmus University Rotterdam/ODISSEI)
Dr Kasia Karpinska (Erasmus University Rotterdam/ODISSEI) - Presenting Author
Dr Angelica Maria Maineri (Erasmus University Rotterdam/ODISSEI)
Dr Tom Emery (Erasmus University Rotterdam/ODISSEI)

Despite a growing amount of data that is available for reuse in social research, data discovery is hindered by the multiplicity of registries and data sharing platforms and, consequently, by the variety in standards and terminologies to describe the data and the access conditions. Moreover, this heterogeneity limits opportunities for data linkage, which are often invisible to the users. To solve this fragmentation, the Dutch data infrastructure for the social sciences (ODISSEI) has launched a metadata-only Portal which unlocks access to several data collections.

The ODISSEI Portal combines metadata from a wide variety of research data repositories in the Netherlands into a single interface, allows advanced semantic queries to support findability, and facilitates data access. The ODISSEI Portal is a Dataverse interface which collects metadata from different providers, including Statistics Netherlands (CBS), the Data Archiving and Networked Services (DANS), and the LISS data archive. Metadata from the various providers is harvested (via endpoints or file dumps), harmonised to a common metadata schema, and enriched with multilingual thesauri to support multilingual search. The enriched metadata is also exported to a knowledge graph, available via an external triple store, to enable complex queries. Moreover, a Data Access Broker (DAB) allows users not only to access open datasets, but also to request access to restricted-access data from different providers, all from the Portal interface. To power the DAB, extensive work is being done to harmonise data access conditions and licenses and the way they are expressed in the metadata.
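
The harmonise-and-enrich steps described above might look roughly like the following sketch. It is not the Portal's implementation: the provider names, field names, common schema, and the two-entry thesaurus are assumptions made purely for illustration.

```python
# Hypothetical per-provider field mappings onto an assumed common schema.
FIELD_MAPS = {
    "provider_a": {"titel": "title", "trefwoorden": "keywords", "toegang": "access"},
    "provider_b": {"name": "title", "subjects": "keywords", "rights": "access"},
}

# Tiny stand-in for a multilingual thesaurus: term -> translations.
THESAURUS = {
    "inkomen": {"nl": "inkomen", "en": "income"},
    "income": {"nl": "inkomen", "en": "income"},
}


def harmonise(record, provider):
    """Rename provider-specific fields to the common schema; drop unmapped ones."""
    mapping = FIELD_MAPS[provider]
    return {mapping[key]: value for key, value in record.items() if key in mapping}


def enrich(record):
    """Attach translations for every keyword found in the thesaurus."""
    enriched = dict(record)
    enriched["keywords_multilingual"] = [
        THESAURUS[keyword.lower()]
        for keyword in record.get("keywords", [])
        if keyword.lower() in THESAURUS
    ]
    return enriched


if __name__ == "__main__":
    # A made-up harvested record; "intern_id" has no mapping and is dropped.
    raw = {"titel": "Inkomenspanel", "trefwoorden": ["Inkomen"],
           "toegang": "restricted", "intern_id": 7}
    print(enrich(harmonise(raw, "provider_a")))
```

Once every record carries the same field names and translated keywords, a single search index can serve queries in either language regardless of the originating provider.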

Future plans for the ODISSEI Portal include improving the functionality of the DAB, connecting to Trusted Research Environments for access to the underlying data, and exchanging metadata with the CESSDA catalogue.

The ODISSEI Portal demonstrates how metadata can be leveraged to increase the FAIRness of research data.


CARING: Enhancing Open Data Quality through Community Engagement

Mr Christopher Klamm (University of Cologne)
Mr Ruben Bach (University of Mannheim) - Presenting Author
Mr Tornike Tsereteli (University of Mannheim)

Have you ever found an error in a dataset? Perhaps a misclassified sample or missing metadata about the annotation process? Have you ever wondered how you can help others benefit from your discovery of an error, or of new information that a dataset needs? We are proposing a transparent platform that allows anyone to update datasets, transforming them from static to dynamic resources. This prototype will enhance the sharing and quality assurance of open datasets, addressing challenges posed by evolving and incomplete data. While “sharing” open-source datasets is trending, ensuring their quality is challenging because validating millions of data points manually is laborious. We propose a collaborative data quality evaluation concept based on “sharing and caring”. Users can add new samples, metadata, comments, or annotations, fostering continuous improvement and community engagement. Our project lays the groundwork for a platform that encourages contribution and participation from all users, integrating the concept of perspectivism. Recognizing that annotations vary due to annotator perspectives, we will gather diverse labels and metadata to reduce bias and create nuanced datasets, leading to more robust and fair models. This will aid social science research, enabling accurate conclusions that benefit researchers and the community. We will integrate an open-source annotation tool, allowing everyone to enhance datasets by correcting errors and adding new information. This will benefit all researchers by improving dataset quality. We hope to promote a new mindset regarding open data and its quality. Updates made on the platform will be linked to changes in the original dataset. A demonstrator platform will be developed to showcase the “sharing and caring” concept. This web application will enable user interaction with data, encouraging contributions like descriptions, references, and analysis code. Additionally, it will incorporate data versioning with distinct reference IDs for major changes, promoting reproducibility and consistent citation for researchers.
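
One way to picture versioning with distinct reference IDs for major changes is the minimal sketch below. The abstract does not specify a scheme, so everything here is an assumption: the `DatasetVersion` class, the `major.minor` numbering, the ID format, and the choice of which contributions count as major are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetVersion:
    """An immutable snapshot of a community-maintained dataset (hypothetical)."""
    name: str
    major: int
    minor: int
    records: tuple

    @property
    def reference_id(self):
        # Assumption: only the major version is part of the citable ID,
        # so minor contributions do not change what researchers cite.
        return f"{self.name}-v{self.major}"


def contribute(version, new_records, is_major):
    """Return a new snapshot; a major change yields a new reference ID."""
    if is_major:
        return DatasetVersion(version.name, version.major + 1, 0, tuple(new_records))
    return DatasetVersion(version.name, version.major, version.minor + 1, tuple(new_records))


if __name__ == "__main__":
    v1 = DatasetVersion("demo", 1, 0, (("text a", "label_x"),))
    # A comment-level fix is treated as minor here; the ID stays the same.
    v1_1 = contribute(v1, [("text a", "label_x")], is_major=False)
    # A relabelled sample is treated as major here; the ID changes.
    v2 = contribute(v1_1, [("text a", "label_y")], is_major=True)
    print(v1.reference_id, v1_1.reference_id, v2.reference_id)
```

Because each snapshot is immutable, an analysis that cites a given reference ID can always be rerun against the exact records it used.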


From Pogues to Kraftwerk: an Agile DDI-based framework for the survey life-cycle in France

Mr Thomas Merly-Alpa (Insee) - Presenting Author
Ms Gwennaëlle Brilhault (Insee)
Ms Sandra Gallizzi (Insee)
Mr Barbet Laurent (Insee)
Mr Eric Sigaud (Insee)

Insee, the French National Institute of Statistics and Economic Studies, is developing a cutting-edge, open-source technical environment for its household and business surveys. This "survey framework" encompasses aspects of the Design, Build, Collect, and Process phases of the GSBPM 5.1 (UNECE's Generic Statistical Business Process Model).

Central to this framework is the implementation of active metadata, primarily leveraging the DDI standard to describe variables, filters, and other survey elements. These metadata are dynamically activated during data collection, enabling tailored instructions for interviewers and optimized web surveys. Furthermore, the metadata facilitate data sanitization (with optional support from data stewards), the calculation of synthetic indicators, and comprehensive process documentation — all essential steps for robust exploitation and dissemination of results.
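
To make the notion of metadata being "activated" during collection more concrete, the sketch below drives question routing from filter conditions stored alongside the questions. It is a deliberately simplified stand-in: the dictionary layout, question names, and filter format are hypothetical, and a real DDI description (or Insee's tooling) is considerably richer.

```python
# Hypothetical questionnaire metadata; not an actual DDI serialisation.
QUESTIONS = [
    {"name": "EMPLOYED", "text": "Are you currently employed?", "filter": None},
    {"name": "HOURS", "text": "How many hours do you work per week?",
     "filter": {"variable": "EMPLOYED", "equals": True}},
    {"name": "JOBSEARCH", "text": "Are you looking for a job?",
     "filter": {"variable": "EMPLOYED", "equals": False}},
]


def is_applicable(question, answers):
    """A question applies if it has no filter or its filter condition holds."""
    condition = question["filter"]
    if condition is None:
        return True
    return answers.get(condition["variable"]) == condition["equals"]


def next_question(answers):
    """Return the name of the next unanswered, applicable question, or None."""
    for question in QUESTIONS:
        if question["name"] not in answers and is_applicable(question, answers):
            return question["name"]
    return None


if __name__ == "__main__":
    print(next_question({}))
    print(next_question({"EMPLOYED": True}))
    print(next_question({"EMPLOYED": False}))
```

Since the routing logic lives in the metadata rather than in the collection tool, the same description could in principle also drive later validation or documentation steps.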

This innovative approach to survey development necessitates a reimagining of traditional roles. Survey project management and IT development are undergoing significant transformation, while survey designers are acquiring new skillsets, including questionnaire programming. Throughout the development process, survey designers have actively participated in user experience (UX) testing of their data collection tools. The project's agile development methodology introduces ongoing evolution and necessitates adaptability to changing requirements.

This presentation will provide insights into the framework's operationalization at Insee, showcase its application in diverse survey contexts, and detail strategies for addressing the challenges inherent in this dynamic environment.