ESRA 2025 Preliminary Program
All time references are in CEST
New developments in research data infrastructures
Session Organiser: Dr Daniel Fuß (Leibniz Institute for Educational Trajectories (LIfBi))
Time: Wednesday 16 July, 11:00 - 12:30
Room: Ruppert 114
This session offers insight into current developments in the provision of (a) research data and (b) innovative support services for data users. It points to the growing importance of research data management and the role of data infrastructures in supporting researchers in an increasingly complex landscape of diverse data sources and data formats.
The first starting point is the system of accredited Research Data Centers (RDCs) in Germany. These RDCs were created to enable access to restricted research data, in particular data relating to individuals, households or companies. Such sensitive data require specific safeguards for provision and use. More than 40 RDCs act as a decentralized network of research data infrastructure providers for national and international scientists. They serve as a kind of “data trustee” whose quality assurance in terms of data editing, management, documentation, and dissemination is ensured by a joint accreditation and monitoring process with the German Data Forum (RatSWD).
The second starting point is the wide range of services that have been implemented as part of the Consortium for the Social, Behavioral, Educational and Economic Sciences (KonsortSWD) within the National Research Data Infrastructure (NFDI) in order to exploit new data sources and to further improve access to and use of data for empirical analyses. Examples include QualidataNet as a central platform for the secondary use of qualitative data resources, RDCnet as a planned network of interconnected onsite guest researcher workstations for improved access to highly sensitive information and Forum4MICA as an online place for the public exchange of information on all aspects of research data and research data management.
Based on selected contributions, the session will highlight opportunities for accessing relevant research data sources and related services with a discussion of challenges in their provision and gaps that still exist.
Keywords: Research Data Center, Data Access, Sensitive Data, Data User Services, Quality Assurance, Secondary Data Supply
Papers
RDM Compas - Competencies for Research Data Curation
Ms Ekaterina Chicherina (GESIS Leibniz Institute for the Social Sciences) - Presenting Author
The establishment of RDCs (Research Data Centers) ensures secure access to and storage of high-quality data for secondary use. One limitation is the shortage of well-trained RDM (Research Data Management) specialists who are competent in the variety of skills needed for research data curation. Compared to the educational support available to researchers, we observe a very limited choice of training opportunities for data curators in the field of professional RDM. Furthermore, data curators’ competences are still not clearly defined and standardised. To address these challenges, we established the information and training platform RDM Compas. This platform is part of KonsortSWD. It supports RDC employees in their data curation work and gives them an opportunity to improve their professional skills related to RDM and data curation.
RDM Compas is divided into three main components: an RDM Knowledge Base, a Training Center and a Certificate Course Module. For all these components we use the Data Curation Lifecycle (Higgins, 2008) as a basis for describing the associated data curation tasks in detail. Thus, RDM Compas provides a comprehensive range of resources, including informational support in data curation processes, interactive training, and professional development in RDM. A variety of internally developed RDM courses, alongside relevant external offers, practical learning paths, and specialized modules, guided by the OER (open educational resources) principles, cover key topics such as sensitive data handling, legal requirements, data ingestion, and publication processes. They are all designed to enhance the skills of RDC staff and facilitate collaboration with researchers. We believe that, through these elements, the development and use of our central platform for data curation will contribute significantly to improving data quality in RDCs and strengthening research data infrastructure overall.
Who will feed data archives? Suggestion and expectations
Mr Filippo Accordino (Institute for Research on Population and Social Policies - National Research Council) - Presenting Author
Mrs Daniela Luzi (Institute for Research on Population and Social Policies - National Research Council)
Mr Fabrizio Pecoraro (Institute for Research on Population and Social Policies - National Research Council)
In the social sciences, data archives are an important way of enhancing research data. In particular, they ensure quality documentation and long-term preservation of data. They enable research to be reproduced. Through re-use, they offer the possibility of answering new research questions, avoiding the time-consuming process of new data collection.
In the European context, data archives in the social sciences date back to the first pioneering attempts in the 1960s. Over the decades, data archives have been established in almost all European countries.
The CESSDA-ERIC (Consortium of European Social Science Data Archives) infrastructure brings together the main social science data archives. The consortium, which provides opportunities for sharing skills and expertise, is inspired by the FAIR principles and offers training opportunities to promote the deposit of well-documented data.
The challenge for any data archive is to grow its deposits in order to provide users with a wide availability of attractive and reusable data for scientific research.
Some archives benefit from the data produced by the research institutes or universities that manage them. Another important source is large research programmes, such as longitudinal studies, which choose a specific archive as the place to publish their data.
Obtaining deposits of data collected by individual researchers is more challenging. It depends on researchers' propensity and habit of publishing their own data, an activity that also requires specific management and documentation skills.
Feeding a data repository, particularly if newly established, is therefore a major challenge.
This contribution reflects on the concrete possibilities of feeding a data archive for the social sciences, illustrating the experience of the archives belonging to the CESSDA infrastructure and suggesting some strategies to encourage data deposit by individual researchers as well.
Quetelet-Progedo: Navigating Challenges in a Social Sciences Data Infrastructure
Ms Anna SIDORETS (Progedo (CNRS)) - Presenting Author
Dr Frédérique GROS (Progedo (CNRS))
Professor Nicolas SAUGER (Progedo | SciencePo Paris)
Quetelet-Progedo (https://data.progedo.fr/) is Progedo’s (https://www.progedo.fr/) data repository dedicated to data for the social sciences community. Its current holdings consist of more than 1,650 datasets produced by over 100 data producers and covering a wide range of topics relevant to the social sciences, such as demographics, income, and employment.
Since its conception, Quetelet-Progedo has been designed with the goal of facilitating data discovery and access by leveraging the FAIR principles. The latest version, launched in spring 2024, demonstrates this pursuit: Quetelet-Progedo now offers a single entry point for discovering and requesting access to the available datasets.
The platform is utilised by two distinct data dissemination teams: Progedo and INED (https://www.ined.fr/en/). This collaboration, combined with the diversity of the data, creates a unique working environment that presents specific opportunities and challenges. To provide a high-quality service and enhance user experience, we navigate distinct levels of data access (i.e., public use files and scientific use files) and work with a variety of data formats.
This presentation will explore the challenges encountered in the day-to-day operations, the strategies developed to address them, and how these experiences are leveraged to drive the ongoing enrichment of Quetelet-Progedo and expand its functionalities.
Promoting Professionalisation: A Case Study on Achieving CoreTrustSeal Certification and Improved Access to International and Interdisciplinary Data
Mr Daniel Buck (DZHW) - Presenting Author
Ms Kerstin Beck (GESIS Leibniz-Institut für Sozialwissenschaften)
Ms Ute Hoffstätter (DZHW)
Dr Pascal Siegers (GESIS Leibniz-Institut für Sozialwissenschaften)
Certification as a trustworthy digital repository is vital to convincing researchers to deposit their data in a research data centre (RDC) or repository. It also helps data infrastructures to professionalise and document internal processes to fulfil the FAIR Principles.
First, our contribution summarises findings from a case study on implementing CoreTrustSeal for small and medium-sized infrastructures as part of a project to support social sciences RDCs (KonsortSWD, funded by the German National Research Data Infrastructure). The project has prepared supporting materials for the RDCs on meeting the CoreTrustSeal requirements and discussed challenges with RDCs in a working group. The results show a lack of conceptual and technical knowledge about and solutions for long-term archiving (e.g. persistence of data and metadata). Also, technical requirements need increased cooperation with the organisation’s IT staff. Overall, resources are lacking, especially for small and medium-sized RDCs.
Second, the national accreditation by the RatSWD (German Data Forum) is compared to CoreTrustSeal. This accreditation ensures high quality in handling and disseminating sensitive data in RDCs. However, it contains fewer quality criteria for internal data management and no detailed criteria for IT security and long-term archiving. CoreTrustSeal thus supplements the RatSWD accreditation with essential dimensions of long-term archiving and FAIR data management.
We conclude that national, discipline-specific certifications can play essential roles. However, the benefits of the CoreTrustSeal certification for archiving and providing data, especially in international and interdisciplinary contexts, need to be promoted more strongly within the research infrastructures. Certification fosters the trust of researchers and funding organisations in infrastructures, especially because concerns about data security are becoming increasingly important. Ultimately, increased trust in the infrastructures promotes data sharing and thus helps to close gaps in data needs for innovative research.
Research Data Centers in Germany – A Unique Infrastructure Network
Dr Daniel Fuß (Leibniz Institute for Educational Trajectories) - Presenting Author
The aim of this contribution is to present the network of Research Data Centers (RDCs) in Germany accredited by the German Data Forum (RatSWD) as a pioneer and a model for the quality-assured management of restricted research data. This network was established more than 20 years ago to improve the availability of and access to research data in the social, behavioral, educational, and economic sciences. Currently, there are 41 accredited RDCs with a broad range of expertise in the processing and provision of data from large-scale survey studies (e.g. SOEP), official statistics (e.g. microcensus), registers (e.g. pension insurance) and qualitative surveys (e.g. text collections). In 2023, these RDCs offered more than 6,800 datasets, which were used for analyses by almost 85,000 researchers.
A particular feature of most research data in these disciplines is the relation to real persons, households, firms or institutions, which requires specific safeguards for the handling of sensitive data. Therefore, the RDCs must take special measures to enable researchers to use these data in compliance with strict data protection regulations. In the research data infrastructure, the RDCs play an important role as trustworthy intermediaries (“data trustees”) between data providers or data producers on the one hand and the scientific community on the other. Key elements of quality assurance are the accreditation of RDCs by the RatSWD, the cooperation of all accredited RDCs in the Committee for Research Data Infrastructure (FDI Committee), a central complaints management system, and active participation in several projects within the German National Research Data Infrastructure (NFDI) initiative.
The presentation will offer an insight into the development and the work of the RDC infrastructure. Some current innovations will be introduced, which are also of importance for international networking.
Future SHARE Infrastructure for a High-Frequency and Multi-Mode Panel Survey
Ms Carolina Brändle (SHARE Berlin Institute)
Dr Fabio Franzese (SHARE Berlin Institute) - Presenting Author
Ms Marlen Paulitti (Centerdata)
Ms Stephanie Stuck (SHARE Berlin Institute)
Mr Iggy van der Wielen (Centerdata)
Ms Sabrina Zuber (SHARE Berlin Institute)
The Survey of Health, Ageing and Retirement in Europe (SHARE) is a comprehensive multi-national panel study that has been conducted for over 20 years, involving approximately 70,000 interviews per wave. Traditionally, data collection occurred every 2 to 3 years using face-to-face surveys. To enhance efficiency and reduce respondent burden, SHARE employs techniques such as preloading and dependent interviewing, particularly in managing household composition data. This approach requires respondents to report only changes in household composition, such as moves or deaths. Household composition data, central to SHARE, have been particularly sensitive to inaccuracies due to the involvement of multiple respondents within households and the inclusion of proxy interviews. Ensuring accurate household composition and maintaining high data quality have required extensive data cleaning and elaborate preload preparation before subsequent survey waves.
As SHARE transitions to more frequent surveys using telephone and web-based methods, new challenges arise, particularly regarding the reduced time available for data cleaning and preload preparation between waves. This shift necessitates the development of new infrastructure to ensure high-quality data collection in a high-frequency, multi-mode environment while remaining fully compliant with the GDPR. Additionally, an integrated sampling functionality is planned to enable dynamic sub-sampling during survey operations. In this presentation, we introduce SHARE’s future tools and infrastructure designed to meet these challenges, thereby ensuring high-quality data collection in future panel surveys.