With or Without You - Standardised Metadata in Survey Data Management 3
Session Organisers: Mr Knut Wenzig (German Institute for Economic Research - DIW Berlin); Mr Daniel Bela (LIfBi – Leibniz Institute for Educational Trajectories); Mr Arne Bethmann (Max Planck Institute for Social Law and Social Policy)
Time: Thursday 18th July, 16:00 - 17:30
Room: D31
With evolving data sources, such as process-generated or user-generated content, meta- and paradata play an increasingly important role in many parts of the data management lifecycle. This is also true for surveys: as they become more complex, data management relies increasingly on properly defined processes to ensure both data quality and maintainability. In response, many studies, data providers and data archives have developed systems of structured metadata tailored to their specific data management needs. While some of these systems are (loosely) based on evolving metadata standards like DDI or SDMX, many are custom-made solutions. For the goal of making metadata comparable and shareable across studies and institutions, this is clearly less than ideal.
In this session we want to discuss the issue from a practitioner's point of view and hear from people who face the challenge of implementing structured metadata systems, or have done so in the past. In particular, we want to hear about the possible benefits, problems and drawbacks of implementing metadata systems that adhere closely to metadata standards like DDI or SDMX. Possible questions for discussion include:
- Which processes would benefit from standardized metadata?
- Are there examples of metadata systems that cover multiple steps of the data lifecycle?
- Are there sources of shared and reusable metadata?
- Are there tools to process standardized metadata?
- What incentives could encourage the sharing of metadata and tools?
Keywords: metadata, DDI, SDMX
Mr Daniel Bela (LIfBi – Leibniz Institute for Educational Trajectories) - Presenting Author
The metadata system designed and implemented for the German National Educational Panel Study (NEPS) has proven to be a valuable tool for automating data management and documentation for dozens of NEPS studies over the past years.
Recently, more large-scale studies besides NEPS have been conducted at LIfBi. This has been supported by enhancing the NEPS metadata structure in two ways: (1) by extending the scope of the metadata to central study parameters documenting the fieldwork procedures and other study elements for internal purposes, and linking these elements to the metadata infrastructure; and (2) by enabling the metadata system to hold information about study projects other than NEPS, while fostering metadata reuse between the projects.
After a brief introduction to the structure of the NEPS metadata system, the presentation will sum up its use for automated documentation and data processing. It will then give insights into the process of extending the system for use by other projects. As an outlook, there is room to discuss this practice of opening a “proprietary” (i.e. custom-tailored, not standards-based) metadata system, and its implications for (meta)data sharing in a broader community.
Ms Maja Dolinar (University of Ljubljana, Faculty of Social Sciences, Social Science Data Archives) - Presenting Author
Ms Irena Vipavc Brvar (University of Ljubljana, Faculty of Social Sciences, Social Science Data Archives)
The Slovenian Social Science Data Archives (ADP) identified a need for a self-archiving tool for researchers and PhD students and chose the DataVerse application as the best option, since it enables easy self-deposit of a survey (quick publication of research data) as well as easy browsing of the catalogue and access to data for end users (who can download the survey data files to their local computer). DataVerse is an open source web application to share, preserve, cite, explore, and analyse research data. The drawback of the default DataVerse installation is that its metadata structure for the social sciences does not fully follow the FAIR principles (findable, accessible, interoperable and reusable). For surveys to be FAIR compliant, (meta)data need to be richly described with a plurality of accurate and relevant attributes; (meta)data must meet domain-relevant community standards (such as the CESSDA Metadata Model for social sciences data); (meta)data should use vocabularies that follow FAIR principles (such as the CESSDA Controlled Vocabularies); and they should use a formal, accessible, shared and broadly applicable language for knowledge representation (e.g. widely accepted international standards such as DDI for the social sciences). To support standardized metadata and controlled vocabularies that follow the CESSDA Metadata Model, the DataVerse software had to be adjusted, which in turn required several workarounds. This work was done within the CESSDA DataverseEU project, whose partners are ADP (Slovenia), AUSSDA (Austria), DANS (the Netherlands), GESIS (Germany), SND (Sweden) and TARKI (Hungary).
The paper will present ADP's experience in adjusting the DataVerse software and describe the problems and workarounds that users should be aware of when adapting the application to follow standardized metadata models that allow easy (re)use and findability of surveys.
Mrs Ines Drefs (ZBW – Leibniz Information Centre for Economics) - Presenting Author
Mr Fidan Limani (ZBW – Leibniz Information Centre for Economics)
Mr Atif Latif (ZBW – Leibniz Information Centre for Economics)
The implementation of structured metadata systems that make research objects findable across studies, institutions and disciplines has become a key challenge in contemporary empirical research. The GeRDI (Generic Research Data Infrastructure) project has taken on this challenge by developing a generic research data infrastructure of interconnected research data repositories. An integral part of such an infrastructure is a metadata schema that contains generic as well as discipline-specific metadata elements. Core services for successful research data management can then be built on top of such a comprehensive schema.
Our presentation explains how the GeRDI project developed its metadata schema. First, both generic (e.g. Dublin Core, DataCite, CERIF, DCAT) and discipline-specific (e.g. DDI, SDMX) metadata components of selected research communities were evaluated. With reusability as a core focus, the DataCite schema proved to be the best fit for the generic metadata context of the GeRDI project. Moreover, operational RDI requirements were found to call for extending the schema with additional elements. Together with the DataCite schema, these extensions form the core GeRDI schema. Metadata that could not be represented via the core part (e.g. discipline-specific elements) were handled by a disciplinary schema part. The identification of metadata elements for this part is driven by the communities and prioritized according to their use cases. Balancing generic and disciplinary metadata in the schema remains one of the key challenges.
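The two-part structure described above (a core built on DataCite plus extensions, and a separate disciplinary part) can be illustrated with a minimal Python sketch. It assumes a flat record of element names and values and routes each element either to the core part or to the disciplinary part. The six mandatory properties of the DataCite Metadata Schema are real; the extension element names (`harvestSource`, `accessUrl`), the DDI-style element `samplingProcedure`, the function `split_record` and all example values are hypothetical placeholders, not the actual GeRDI schema.

```python
from dataclasses import dataclass, field

# The six mandatory properties of the DataCite Metadata Schema (4.x).
DATACITE_MANDATORY = {
    "identifier", "creator", "title",
    "publisher", "publicationYear", "resourceType",
}

# HYPOTHETICAL extension elements standing in for project-specific
# additions to DataCite; these are not the real GeRDI element names.
CORE_EXTENSIONS = {"harvestSource", "accessUrl"}

CORE_ELEMENTS = DATACITE_MANDATORY | CORE_EXTENSIONS


@dataclass
class SchemaRecord:
    core: dict = field(default_factory=dict)          # DataCite + extensions
    disciplinary: dict = field(default_factory=dict)  # e.g. DDI-derived elements


def split_record(flat: dict) -> SchemaRecord:
    """Put each element in the core part if the core can represent it,
    otherwise in the disciplinary part; require all mandatory DataCite properties."""
    record = SchemaRecord()
    for name, value in flat.items():
        part = record.core if name in CORE_ELEMENTS else record.disciplinary
        part[name] = value
    missing = DATACITE_MANDATORY - set(record.core)
    if missing:
        raise ValueError(f"missing mandatory DataCite properties: {sorted(missing)}")
    return record


# Usage with made-up example values.
example = {
    "identifier": "10.1234/example",  # hypothetical DOI
    "creator": "Example Survey Team",
    "title": "Example Panel Survey, Wave 1",
    "publisher": "Example Data Archive",
    "publicationYear": "2019",
    "resourceType": "Dataset",
    "harvestSource": "https://repository.example.org",
    "samplingProcedure": "Stratified random sample",  # DDI-style element
}
record = split_record(example)
print(sorted(record.disciplinary))  # ['samplingProcedure']
```

Routing by membership in a fixed core element set keeps the core stable while communities add disciplinary elements without touching it; a fuller implementation would also validate each part against its own schema.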
Furthermore, we point to an initiative that is both a source of shared metadata and an incentive for sharing: GO FAIR. GeRDI has founded an Implementation Network within the GO FAIR initiative. The initiative brings together individuals, institutions and organisations who, in one way or another, contribute to making unlinked research data FAIR. Within this open, bottom-up community, metadata standards are discussed and shared.
Ms Anne Cornilleau (Center for Socio-Political Data (Sciences Po))
Ms Alina Danciu (Center for Socio-Political Data (Sciences Po)) - Presenting Author
Considering the vast amount of data available today, finding relevant datasets can be challenging for a researcher. Therefore, FAIR bilingual metadata plays a crucial part in supporting the discovery and understanding of data.
In 2017, a third of the data users of the Center for Socio-Political Data (CDSP) were non-French speakers who used the English interface of our data download platform. Most of them wanted to access French pre- or post-electoral surveys and compare electoral trends in France with those in other countries. At that time, we decided to work with a professional translator to make our surveys’ DDI study-level metadata available in English.
As a result, we now offer a bilingual user interface for data download and are working on bilingual study-level metadata descriptions.
In this way, users can more easily identify surveys relevant to their research and can at least understand their descriptions, even if the data are not in their preferred or native language.
If we assume that multilingualism is a dimension of metadata discoverability and of metadata quality in general, what are the next steps in this process? In our case, we noticed that we sometimes needed to work with the translator to obtain a harmonized translation of elements such as agency names or keywords. In parallel with the translation, we also need to go back to the original metadata and work on their standardization.
This paper will focus on both the metadata translation and the bilingual metadata harmonization process at the CDSP. By using meaningful metadata to describe our research outputs, we hope to improve the findability of our repository’s contents.
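Harmonizing the translation of recurring agency names and keywords can be supported by a shared glossary that fixes one agreed English rendering per French term. The following is a minimal Python sketch, assuming such a glossary exists as a simple mapping; the function name `harmonize` and the glossary entries are hypothetical illustrations, not part of the CDSP workflow. Terms missing from the glossary are returned separately so they can be settled with the translator rather than translated ad hoc.

```python
import unicodedata

# HYPOTHETICAL glossary of agreed English renderings for recurring
# agency names and keywords; entries are illustrative only.
GLOSSARY = {
    "Centre de données socio-politiques": "Center for Socio-Political Data",
    "élection présidentielle": "presidential election",
    "sondage": "survey",
}


def _norm(text: str) -> str:
    # Normalize to NFC so accented characters compare equal regardless
    # of how they were encoded, and trim stray whitespace.
    return unicodedata.normalize("NFC", text.strip())


_LOOKUP = {_norm(fr): en for fr, en in GLOSSARY.items()}


def harmonize(terms):
    """Return (translated, unresolved): agreed renderings for terms found in
    the glossary, and terms that still need a decision with the translator."""
    translated, unresolved = [], []
    for term in terms:
        key = _norm(term)
        if key in _LOOKUP:
            translated.append(_LOOKUP[key])
        else:
            unresolved.append(key)
    return translated, unresolved


translated, unresolved = harmonize(["sondage", " élection présidentielle", "vote"])
print(translated)  # ['survey', 'presidential election']
print(unresolved)  # ['vote']
```

Keeping unresolved terms visible, instead of silently passing them through, is what makes the glossary grow into a harmonized, reusable vocabulary over time.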