With or Without You - Standardised Metadata in Survey Data Management 2
Session Organisers | Mr Knut Wenzig (German Institute for Economic Research - DIW Berlin), Mr Daniel Bela (LIfBi – Leibniz Institute for Educational Trajectories), Mr Arne Bethmann (Max Planck Institute for Social Law and Social Policy)
Time | Thursday 18th July, 14:00 - 15:30 |
Room | D31 |
With evolving data sources, such as process-generated or user-generated content, meta- and paradata play an increasingly important role in many parts of the data management lifecycle. This is also true for surveys: as they become more complex, data management relies more on properly defined processes to ensure both data quality and maintainability. In turn, many studies, data providers and data archives have developed systems of structured metadata tailored to their specific data management needs. While some of these systems are (loosely) based on evolving metadata standards like DDI or SDMX, many are custom-made solutions. For the goal of making metadata comparable and shareable across studies and institutions, this is obviously a less-than-ideal situation.
In this session we want to discuss the issue from a practitioner's view and hear from people who are faced with the challenge of implementing structured metadata systems, or have done so in the past. In particular, we want to hear about the possible benefits, problems and drawbacks of implementing metadata systems that adhere closely to metadata standards like DDI or SDMX. Possible questions to be discussed include:
- Which processes would benefit from standardized metadata?
- Are there examples of metadata systems that cover multiple steps within the whole lifecycle?
- Are there sources for shared and reusable metadata?
- Are there tools to process standardized metadata?
- What could be incentives for sharing metadata and tools?
Keywords: metadata, DDI, SDMX
Dr Steven McEachern (Australian Data Archive) - Presenting Author
The management of survey data within a specific data collection over time poses some unique challenges for the researcher in maintaining consistency and interpretability. These challenges are multiplied when the researcher must construct a dataset from multiple sources that used multiple methods: how can these sources be found, integrated and documented in an effective and defensible manner? The use of consistent, integrated survey metadata provides one means of enabling this process.
This paper seeks to demonstrate the application of structured metadata to integrated time series through a project currently in development at the Australian Data Archive (ADA) and the Centre for Social Research and Methods (CSRM) at the Australian National University. CSRM is currently working to develop a data portal for the analysis and visualisation of time series surveys across series and over time. The intent of the portal is to enable researchers and the public to study and understand movements in Australian public opinion over time, irrespective of the time series from which the specific point measure was sourced.
This process, however, creates significant challenges in managing and integrating the data. The measurement of an individual variable over time may be drawn from multiple series, with variations in sampling, measurement and framing that each need to be documented and connected to the specific point measure, so as to provide a defensible research methodology for the researcher and a clear interpretation for the secondary user. The paper describes the experiences of ADA staff in developing the integrated datasets, including the capacity of DDI and related standards to document the source data, the harmonisation process and the integrated output, in order to provide a data source that is both representative of its source material and consistent in the quality of the resultant integrated data.
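To make the documentation problem concrete, the sketch below shows one way a harmonised point estimate could carry its source and harmonisation provenance alongside the value itself. This is a minimal illustration in Python, not the ADA/CSRM system; the study name, variable name, question wording and recode mapping are hypothetical.

```python
# Illustrative sketch only (not ADA's implementation): attaching source and
# harmonisation provenance to each point estimate in an integrated time series.
# Study names, variable names, wording and the recode mapping are hypothetical.
from dataclasses import dataclass, field

@dataclass
class SourceMeasure:
    study: str                 # the specific survey wave the estimate came from
    variable: str              # variable name in the source dataset
    question_text: str         # wording as fielded
    sample_design: str         # short note on sampling, needed for interpretation
    recode: dict = field(default_factory=dict)  # source codes -> harmonised codes

@dataclass
class HarmonisedPoint:
    concept: str               # the harmonised concept the portal exposes
    date: str                  # reference period of the estimate
    estimate: float            # the published point measure
    source: SourceMeasure      # full provenance, kept with the estimate

point = HarmonisedPoint(
    concept="trust_in_government",
    date="2016-07",
    estimate=0.41,
    source=SourceMeasure(
        study="Example Series A, post-election wave 2016",
        variable="B1",
        question_text="How much do you trust the federal government?",
        sample_design="Mail-back sample of enrolled voters",
        recode={1: 1, 2: 1, 3: 0, 4: 0},   # collapse a 4-point scale to trust / no trust
    ),
)
print(point.concept, point.estimate, "from", point.source.study)
```

In DDI terms, the same provenance would be expressed through references between the harmonised variable and its source variables rather than nested records; the point here is only that every published value keeps an explicit link back to its sampling, wording and recoding context.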
Mr Marcus Maher (Ipsos)
Dr Alan Roshwalb (Ipsos) - Presenting Author
Dr Robert Petrin (Ipsos)
Survey tracking programs, such as performance measurement programs and public opinion polls, repeat their surveys either continuously or at least at regular intervals. These programs establish data capture methods, cleaning rules and reporting procedures that allow for quick turnaround of results. The quick turnaround requirements and the repetitive rhythm of data collection create prime conditions for small errors to creep into the data capture process. Automation helps in survey collection and data capture, but there is always the possibility for errors to occur. Many studies rely on identifying changes in data streams using rules such as: a change in a score of more than the margin of error, or a change of a specified amount, instigates a review procedure. These methods are inexact in helping to identify possible data collection or data capture errors, or possible changes in data trends. This paper examines the use of Bayesian testing in a quality control setting to identify unexpected changes in data trends and set them aside for deeper review. The approach incorporates past data, using empirical Bayes methods, into prior distributions to be used in Bayes factor analyses. These analyses should have greater sensitivity to changes in the data stream due to data collection and data capture errors, or to real changes in the trend. Any credible changes in data distributions are flagged for further review. The paper examines data from performance tracking studies and polling.
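A minimal sketch of the general idea follows; it is not Ipsos' implementation. It fits an empirical-Bayes Beta prior to proportions from earlier waves and flags a new wave when a Bayes factor favours "the trend has moved" over "consistent with history". The flat alternative prior, the method-of-moments fit, the threshold and the example numbers are all illustrative assumptions.

```python
# Sketch: Bayes-factor quality control for a tracked proportion.
# Not the authors' method; priors, threshold and data are illustrative.
import numpy as np
from scipy.stats import betabinom

def fit_beta_prior(past_props):
    """Method-of-moments Beta(a, b) fit to proportions observed in past waves."""
    m, v = np.mean(past_props), np.var(past_props, ddof=1)
    s = m * (1 - m) / v - 1          # implied prior "sample size" a + b
    return m * s, (1 - m) * s

def flag_wave(k, n, past_props, threshold=10.0):
    """Bayes factor comparing a flat Beta(1, 1) prior ('trend has moved')
    against the empirical-Bayes prior ('consistent with history').
    Flag the wave for review if the Bayes factor exceeds the threshold."""
    a, b = fit_beta_prior(past_props)
    m_hist = betabinom.pmf(k, n, a, b)   # marginal likelihood under the historical prior
    m_flat = betabinom.pmf(k, n, 1, 1)   # marginal likelihood under the flat prior
    bf = m_flat / m_hist
    return bf > threshold, bf

# Example: past waves hovered around 52%; the new wave records 640 of 1,000.
flagged, bf = flag_wave(640, 1000, [0.51, 0.53, 0.52, 0.50, 0.54])
print(flagged, round(float(bf), 1))
```

A flagged wave would then be set aside for the deeper review the abstract describes; a real system would also need to handle weighting, multiple tracked measures and multiplicity.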
Miss Kerrin Borschewski (GESIS - Leibniz Institute for the Social Sciences) - Presenting Author
Mr Stefan Müller (GESIS - Leibniz Institute for the Social Sciences)
Mr Wolfgang Zenk-Moeltgen (GESIS - Leibniz Institute for the Social Sciences)
The use of areal information about respondents' neighborhoods provides great benefits for social science research. By using georeferenced survey data, for example, researchers can answer questions about individual social behavior or attitudes while also taking into account the detailed spatial patterns of social processes. As with all research data, making such data understandable, shareable and re-usable requires the use of well-established metadata standards. Such standards exist for both the survey data and the geographic data: the Data Documentation Initiative (DDI) standard of the social sciences and the ISO 19115 standard of the geosciences. Challenges, however, generally arise when researchers aim to document data which originate at the interface of different scientific disciplines, as in the case of georeferenced survey data. These data require documenting data from different sources and of different types, and hence of different contents and different structures. The aforementioned metadata standards were not designed to document linked data collections in all use cases. As such, to guarantee thoroughly documented and interoperable metadata, data librarians with interdisciplinary expertise need to get involved in such research projects at an early stage.
In this presentation, we showcase a use case of social science survey data that are spatially linked to geospatial data attributes. We present the challenges and analyze to what extent the social sciences metadata standard DDI-Lifecycle, which contains elements compatible with ISO 19115, is capable of documenting said data. In response to the challenges of documentation, we outline different approaches to a solution. The insights from this case study can assist the producers of metadata standards by highlighting the needs of special use cases, and can support metadata initiatives, e.g. by delivering content-related input on the need for specialised metadata elements.
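As a rough illustration of the documentation gap, the sketch below shows a simplified, combined record for a linked variable: a survey-side description, a geodata-side description, and the linkage step that neither standard fully covers on its own. The field names paraphrase DDI-Lifecycle and ISO 19115 concepts rather than quoting literal schema elements, and all values are hypothetical.

```python
# Illustrative sketch only: a simplified documentation record for a survey
# variable spatially linked to a geospatial attribute. Field names paraphrase
# DDI-Lifecycle / ISO 19115 concepts; they are not literal schema elements.
linked_variable_doc = {
    "ddi": {                                  # survey-side description (DDI-style)
        "variable_name": "noise_exposure_db",
        "variable_label": "Modelled road-noise level at respondent address",
        "study": "Example Panel Study, wave 2018",
        "universe": "Respondents with geocoded addresses",
    },
    "iso19115": {                             # geodata-side description (ISO-style)
        "dataset_title": "Road traffic noise model, 10 m grid",
        "spatial_reference_system": "EPSG:25832",
        "geographic_bounding_box": {"west": 5.9, "east": 15.0,
                                    "south": 47.3, "north": 55.1},
        "lineage": "Modelled from 2017 traffic counts by the data provider",
    },
    "linkage": {                              # the part neither standard fully covers
        "method": "point-in-cell lookup on geocoded addresses",
        "linkage_date": "2019-02-15",
        "disclosure_control": "values coarsened to 5 dB classes before release",
    },
}
print(linked_variable_doc["linkage"]["method"])
```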
Dr Hayley Mills (UCL) - Presenting Author
Mr Jon Johnson (UCL and UKDS)
CLOSER brings together eight world-leading UK longitudinal studies in order to maximise their use, value and impact. A major output of CLOSER is the search engine CLOSER Discovery (discovery.closer.ac.uk), which allows researchers to search and browse questionnaire and dataset metadata and discover what data are available.
Efficient data management of complex longitudinal studies is both desirable and increasingly essential to ensure that data reach the hands of researchers in a timely manner. Metadata standards are critical for maintaining information in a straightforward way throughout the data lifecycle, from data collection to output for research.
When these metadata are fully documented, they can be utilised further to open up new data and metadata management possibilities. For example, captured Computer-Aided Interview (CAI) metadata can be used to assist in the processing and validation of data. Automated processing of data to extract metadata and link it to the originating questions, whilst reusing the concepts defined in survey design, allows the generation of high-quality documentation to accompany data sharing. The outputs can also be used to create reusable resources, such as question banks that preserve the provenance of questions for other studies to utilise.
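The sketch below illustrates the kind of reuse described here, not CLOSER's actual pipeline: question metadata captured from the CAI instrument (here a hand-written dictionary) is reused both to validate the deposited data against the documented response domain and to emit simple variable-level documentation. Question names, codes and labels are hypothetical.

```python
# Illustrative sketch: reuse captured CAI question metadata for validation
# and documentation. Not CLOSER's implementation; all names are hypothetical.
import pandas as pd

cai_metadata = {
    "smoker": {
        "question_text": "Do you currently smoke cigarettes?",
        "concept": "Smoking behaviour",
        "codes": {1: "Yes", 2: "No", -9: "Refused"},
    },
}

data = pd.DataFrame({"smoker": [1, 2, 2, 3, -9]})   # 3 is not a documented code

def validate(df, metadata):
    """Return, per variable, the rows whose values fall outside the documented codes."""
    problems = {}
    for var, meta in metadata.items():
        bad = ~df[var].isin(list(meta["codes"]))
        if bad.any():
            problems[var] = df.loc[bad, var].to_dict()   # row index -> offending value
    return problems

def document(df, metadata):
    """Generate simple variable-level documentation from the data plus CAI metadata."""
    rows = []
    for var, meta in metadata.items():
        rows.append({"variable": var,
                     "concept": meta["concept"],
                     "question": meta["question_text"],
                     "frequencies": df[var].value_counts().to_dict()})
    return pd.DataFrame(rows)

print(validate(data, cai_metadata))    # {'smoker': {3: 3}} -> row 3 holds undocumented code 3
print(document(data, cai_metadata))
```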
CLOSER has been developing a suite of tools and software, using both in-house and commercially available solutions, that begin to tackle some of the obstacles involved in documenting and utilising longitudinal metadata. The presentation will report on the successes and problems faced in using the DDI-Lifecycle metadata standard to achieve these ambitions.