All time references are in CEST
New developments in research data infrastructures
Session Organiser: Dr Daniel Fuß (Leibniz Institute for Educational Trajectories (LIfBi))
Time: Tuesday 18 July, 09:00 - 10:30
The session aims to provide insight into current developments regarding the provision of (a) research data and (b) innovative support services for data users. It highlights the growing importance of research data management and the key role of data infrastructures in supporting researchers in an increasingly complex landscape of diverse data sources and data formats.
The first starting point is the system of accredited Research Data Centers (RDCs) in Germany. These RDCs were created to enable access to restricted research data, in particular data relating to individuals, households or companies. Such sensitive data require specific safeguards for provision and use. More than 40 RDCs act as a decentralized network of research data infrastructure providers for national and international scientists. They serve as a kind of “data trustee” whose quality in data editing, management, documentation, and dissemination is assured by a joint accreditation and monitoring process with the German Data Forum (RatSWD).
The second starting point is the wide range of services implemented by the Consortium for the Social, Behavioral, Educational and Economic Sciences (KonsortSWD) within the National Research Data Infrastructure (NFDI) to exploit new data sources and to further improve access to and use of data for empirical analyses. Examples include QualidataNet, a central platform for the secondary use of qualitative data resources; RDCnet, a planned network of interconnected onsite guest researcher workstations for improved access to highly sensitive information; and Forum4MICA, an online venue for the public exchange of information on all aspects of research data and research data management.
Drawing on selected contributions, the session will highlight opportunities for accessing relevant research data sources and related services, and will discuss the challenges in providing them as well as the gaps that remain.
Keywords: Research Data Center, Data Access, Sensitive Data, Data User Services, Quality Assurance, Secondary Data Supply
Ms Ekaterina Chicherina (GESIS Leibniz Institute for the Social Sciences) - Presenting Author
The establishment of RDCs (Research Data Centers) ensures secure access to and storage of high-quality data for secondary use. One limitation is the shortage of well-trained RDM (Research Data Management) specialists with the variety of skills needed for research data curation. Compared with the educational support available to researchers, there is a very limited choice of training opportunities for data curators in professional RDM. Furthermore, data curators’ competences are still not clearly defined and standardised. To address these challenges, we established RDM Compas, an information and training platform. The platform is part of KonsortSWD; it supports RDC employees in their data curation work and gives them an opportunity to improve their professional skills in RDM and data curation.
RDM Compas is divided into three main components: an RDM Knowledge Base, a Training Center and a Certificate Course Module. For all three components, we use the Data Curation Lifecycle (Higgins, 2008) as a basis for describing the associated data curation tasks in detail. RDM Compas thus provides a comprehensive range of resources, including informational support in data curation processes, interactive training, and professional development in RDM. Internally developed RDM courses, together with relevant external offerings, practical learning paths, and specialized modules, follow OER (open educational resources) principles and cover key topics such as sensitive data handling, legal requirements, data ingestion, and publication processes. All of them are designed to enhance the skills of RDC staff and to facilitate collaboration with researchers. We believe that, taken together, the elements of RDM Compas will make this central data curation platform a significant contribution to improving data quality in RDCs and strengthening research data infrastructure overall.
Mr Filippo Accordino (Institute for Research on Population and Social Policies - National Research Council) - Presenting Author
Mrs Daniela Luzi (Institute for Research on Population and Social Policies - National Research Council)
Mr Fabrizio Pecoraro (Institute for Research on Population and Social Policies - National Research Council)
In the social sciences, data archives are an important way of enhancing the value of research data. In particular, they ensure quality documentation and long-term preservation of data, and they make it possible to reproduce research. Through reuse, they also make it possible to answer new research questions without the time-consuming process of collecting new data.
In the European context, data archives in the social sciences date back to the first pioneering attempts in the 1960s. Over the decades, data archives have been established in almost all European countries.
The CESSDA-ERIC (Consortium of European Social Science Data Archives) infrastructure brings together the main social science data archives. The consortium, which provides opportunities for sharing skills and expertise, is inspired by the FAIR principles and offers training opportunities to promote the deposit of well-documented data.
The challenge for any data archive is to grow its deposits in order to offer users a wide range of attractive, reusable data for scientific research.
Some archives benefit from the data produced by the research institutes or universities that manage them. Another important source is large research programmes, such as longitudinal studies, which choose a specific archive as the place to publish their data.
It is more challenging to obtain deposits of data collected by individual researchers. This depends on their propensity and habit of publishing their own data, an activity that also requires specific data management and documentation skills.
Feeding a data repository, particularly if newly established, is therefore a major challenge.
This contribution reflects on concrete ways of feeding a social science data archive, drawing on the experience of the archives belonging to the CESSDA infrastructure, and suggests strategies to encourage data deposit by individual researchers as well.
Ms Beate Lichtwardt (UKDS)
Mrs Dana Müller (IAB)
Dr Heike Wirth (GESIS) - Presenting Author
Researchers are increasingly interested in using sensitive data across borders for international comparative research. Currently, however, access to sensitive data is often restricted even at the national level, let alone internationally: researchers can frequently access such data only within the country and by booking a Safe Room at a Research Data Center. Yet, because of their high level of detail and population coverage (e.g. registry data), sensitive data are an essential resource for research of social and political importance.
The International Data Access Network (IDAN), launched in 2018, is a collaboration between several Research Data Centers (RDCs) from France, Germany and the UK that facilitates the use of secure access data for research by providing reciprocal Safe Room Remote Desktop Access. The network is currently (a) expanding and (b) discussing ways to enable cross-border research that combines similar data from different data providers within the same Secure Environment.
Our presentation will give an overview of how IDAN works and outline our plans to improve cross-border access to sensitive data. Further, we will discuss initial steps to overcome the legal and technical barriers that currently prevent researchers from combining sensitive data across countries for comparative research. Finally, we will highlight the benefits for researchers using these data.
Dr Martina Baumann (Leibniz Institute for Educational Trajectories) - Presenting Author
Dr Daniel Fuß (Leibniz Institute for Educational Trajectories)
The growing range of reusable data resources in the social, behavioural, educational and economic sciences opens up a broad spectrum of potential for empirical research. The expansion of research data infrastructures is accompanied by an increasing need for information among data users. A sound understanding of complex data forms the basis for high-quality research. Accordingly, support from data providers and a close dialogue with the scientific community are of key importance. In addition to the quality of this service, its transparency, flexibility and sustainability play a crucial role.
Forum4MICA – Making Information Commonly Available – is an online information and discussion platform for questions relating to reusable data resources and the handling of these data. A modern forum solution offers new opportunities for exchange and networking that go far beyond traditional bilateral question-and-answer interactions. Furthermore, the forum functions as a continuously growing knowledge archive in which contributions are openly searchable and permanently available. Forum4MICA is an attempt to establish a general, low-threshold, well-structured service for both users and providers of research data infrastructures as a supplement to conventional data documentation.
Since its launch in February 2023, 18 relevant research data centres and research data management projects have committed to actively participating in the forum as experts. Forum4MICA currently has 450 registered users and over 600 user-driven articles. The aim is to gradually expand this participation and thus increase the visibility of research data infrastructures within the community.
The presentation will share the experience we have gained so far in setting up the forum, discuss its challenges and its applicability to other scientific fields, and outline further steps. For more information see https://forum.lifbi.de/.
Ms Anna SIDORETS (Progedo (CNRS)) - Presenting Author
Dr Frédérique GROS (Progedo (CNRS))
Professor Nicolas SAUGER (Progedo | Sciences Po Paris)
Quetelet-Progedo (https://data.progedo.fr/) is Progedo’s (https://www.progedo.fr/) data repository dedicated to data for the social sciences community. Its current holdings consist of more than 1,650 datasets produced by over 100 data producers and covering a wide range of topics relevant to the social sciences, such as demographics, income, and employment.
Since its inception, Quetelet-Progedo has been designed to facilitate data discovery and access by leveraging the FAIR principles. The latest version, launched in spring 2024, reflects this aim: Quetelet-Progedo now offers a single entry point for discovering and requesting access to the available datasets.
The platform is used by two distinct data dissemination teams: Progedo and INED (https://www.ined.fr/en/). This collaboration, combined with the diversity of the data, creates a unique working environment with specific opportunities and challenges. To provide a high-quality service and enhance user experience, we navigate distinct levels of data access (i.e., public use files and scientific use files) and work with a variety of data formats.
This presentation will explore the challenges encountered in the day-to-day operations, the strategies developed to address them, and how these experiences are leveraged to drive the ongoing enrichment of Quetelet-Progedo and expand its functionalities.
Dr Ulrich Krieger (Mannheim University Library) - Presenting Author
The increasing complexity of research data sources and formats demands innovative solutions in research data management. This presentation explores recent advances from the BERD@NFDI consortium in working with unstructured data.
BERD@NFDI, part of the German National Research Data Infrastructure (NFDI), provides training, tools, and data for social scientists working with text, audio, and video data. Key services include the BERD Academy, which offers targeted training in handling unstructured data, and the BERD Data Portal, which provides access to curated datasets. The BERD Marketplace connects researchers and companies, fostering collaboration between academia and industry. In addition, BERD@NFDI offers legal assistance for researchers through virtual assistants and a helpdesk, as well as an advanced OCR solution for digitizing historical business records that transforms scanned documents into machine-readable formats.
In this presentation, we will demonstrate how these services help researchers navigate the challenges of unstructured data and obtain robust, reproducible results across diverse research domains.
Mr Daniel Buck (DZHW) - Presenting Author
Ms Kerstin Beck (GESIS Leibniz-Institut für Sozialwissenschaften)
Ms Ute Hoffstätter (DZHW)
Dr Pascal Siegers (GESIS Leibniz-Institut für Sozialwissenschaften)
Certification as a trustworthy digital repository is vital to convincing researchers to deposit their data in a research data centre (RDC) or repository. It also helps data infrastructures to professionalise and document internal processes to fulfil the FAIR Principles.
First, our contribution summarises findings from a case study on implementing CoreTrustSeal in small and medium-sized infrastructures, carried out as part of a project to support social science RDCs (KonsortSWD, funded within the German National Research Data Infrastructure). The project has prepared materials to support RDCs in meeting the CoreTrustSeal requirements and has discussed challenges with RDCs in a working group. The results show a lack of conceptual and technical knowledge about, and solutions for, long-term archiving (e.g. persistence of data and metadata). Meeting the technical requirements also calls for closer cooperation with the organisation’s IT staff. Overall, resources are lacking, especially for small and medium-sized RDCs.
Second, the national accreditation by the RatSWD (German Data Forum) is compared with CoreTrustSeal. This accreditation ensures high quality in handling and disseminating sensitive data in RDCs. However, it contains fewer quality criteria for internal data management and no detailed criteria for IT security and long-term archiving. CoreTrustSeal thus supplements the RatSWD accreditation with essential dimensions of long-term archiving and FAIR data management.
We conclude that national, discipline-specific certifications can play essential roles. However, the benefits of CoreTrustSeal certification for archiving and providing data, especially in international and interdisciplinary contexts, need to be promoted more strongly within research infrastructures. Certification fosters the trust of researchers and funding organisations in infrastructures, especially as concerns about data security become increasingly important. Ultimately, increased trust in the infrastructures promotes data sharing and thus helps to close gaps in the data needed for innovative research.
Mr Neill Murray (DIW Berlin) - Presenting Author
RDCnet aims to facilitate access to sensitive research data by fostering collaboration between Research Data Centers (RDCs) and enabling the mutual use of their secure workstations. Rather than requiring researchers to travel to a specific data-providing RDC to analyze data on-site, RDCnet offers a decentralized network of secure workstations at various partner locations, giving researchers greater flexibility. At its core, RDCnet is based on the idea that each participating institution provides a secure workstation and, optionally, remote access to its research data, while being interconnected with all other institutions in the network. This approach ensures that data providers keep full control over who can access their data and where, while enabling researchers to work with sensitive data from any secure workstation within RDCnet. To ensure the required level of data protection, all secure workstations must be maintained and configured according to standardized security criteria, developed collaboratively with eight RDCs in Germany. This guarantees that sensitive data are processed exclusively within strictly controlled environments. To realize RDCnet, we provide essential services and support for multilateral collaboration. These include a unified cooperation agreement with clear organizational and technical guidelines, a shared platform for booking secure workstations, and technical support for implementing secure work environments and remote access systems. By lowering access barriers, RDCnet reduces costs for researchers working with sensitive data while giving data providers more ways to offer access. This approach not only ensures efficient and secure data use but also enhances the visibility of research data, ultimately increasing the number of potential users.
Dr Daniel Fuß (Leibniz Institute for Educational Trajectories) - Presenting Author
The aim of this contribution is to present the network of Research Data Centers (RDCs) in Germany accredited by the German Data Forum (RatSWD) as a pioneer and a model for the quality-assured management of restricted research data. This network was established more than 20 years ago to improve the availability of and access to research data in the social, behavioral, educational, and economic sciences. Currently, there are 41 accredited RDCs with a broad range of expertise in the processing and provision of data from large-scale survey studies (e.g. SOEP), official statistics (e.g. microcensus), registers (e.g. pension insurance) and qualitative surveys (e.g. text collections). In 2023, these RDCs offered more than 6,800 datasets, which were used for analyses by almost 85,000 researchers.
A particular feature of most research data in these disciplines is their relation to real persons, households, firms or institutions, which requires specific safeguards for handling sensitive data. The RDCs must therefore take special measures to enable researchers to use these data in compliance with strict data protection regulations. Within the research data infrastructure, the RDCs play an important role as trustworthy intermediaries (“data trustees”) between data providers or data producers on the one hand and the scientific community on the other. Key elements of quality assurance are the accreditation of RDCs by the RatSWD, the cooperation of all accredited RDCs in the Committee for Research Data Infrastructure (FDI Committee), a central complaints management system, and active participation in several projects within the German National Research Data Infrastructure (NFDI) initiative.
The presentation will offer insight into the development and work of the RDC infrastructure. It will also introduce some current innovations that are relevant for international networking.