Putting data in the driver’s seat: The role of active (meta-)data in survey data management 1
Chair: Mr Knut Wenzig (DIW Berlin)
Coordinator 1: Mr Daniel Bela (LIfBi Bamberg (Germany))
Coordinator 2: Dr Arne Bethmann (DJI München (Germany))
The DASISH Questionnaire Design Documentation Tool – functionalities and real life examples from the tool
Benjamin Beuster, Hilde Orten
The Questionnaire Design Documentation Tool (QDDT) is developed to assist large-scale survey projects with questionnaire development and with documenting the questionnaire design process, from the first conceptualization to the final questionnaire. In particular, it supports the production of research concepts, questions, response domains and instruments for questionnaire modules of the European Social Survey.
In addition, researchers and students can use the tool to explore metadata from existing projects, or to design new research. Interoperability with other systems and tools, most importantly the DASISH Question Variable Database and the Translation Management Tool, both currently under development, is another key aim.
Work on the QDDT started during the Data Service Infrastructure for the Social Sciences and Humanities (DASISH) project and continued under the Synergies for Europe’s Research Infrastructure in the Social Sciences (SERISS) project.
The conceptual model for the tool is based on a subset of the DDI 3.2 specification. The tool is designed to integrate and communicate with other tools through an API. It is designed to be compatible with DDI, and both DDI import and export options are available. A set of modern technologies is used in the development of the tool.
This presentation of the QDDT focuses on its conceptual model, its functionalities, initial experience from the questionnaire module production of the European Social Survey, and plans for further development.
CLOSER Discovery is a cutting-edge search engine for discovering metadata on eight of the UK’s cohort and longitudinal studies. The longest-running study documented in CLOSER Discovery has been running for over 70 years, which makes it a formidable challenge to document and manage. CLOSER Discovery demonstrates the importance of investing in rich metadata that describes many more aspects of data collection than traditional tools and methods do. Documenting detailed information on question routing, the scales and images used, and similar questions and variables across multiple studies leaves researchers, survey managers and data managers all better informed. This in turn opens up new possibilities for the studies going forward.
CLOSER Discovery sits atop a large metadata repository that has been designed not only to power a search engine but also to provide additional functionality and automation to the studies themselves. By drawing links between multiple studies, centres and data warehouses, CLOSER has begun to tear down the outdated data-silo model, which has caused many of the problems that inhibit harmonisation and linkage.
Data collection instrument design can be made faster and more consistent by reusing entire sections of questions designed and used in previous studies. By documenting this process from the point of design, harmonisation can be performed more efficiently and effectively. Because these techniques and tools were developed using eight of the UK’s longitudinal studies, they have been rigorously tested for scalability.
CLOSER’s metadata has a universal identifier for every single item, allowing datasets and published papers to reference precisely the variables they have used. Likewise, questionnaire designers can reference the questions they have used, such as standard scales, and clearly document how they have been altered.
With much richer and clearer metadata documented, data managers can save enormous amounts of time cleaning collected data before it is deposited for analysis. The laborious task of creating and then maintaining data dictionaries can also easily be automated and standardised.
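As a rough illustration of how a data dictionary might be generated from structured metadata rather than written by hand, the following minimal sketch renders one row per variable. The record layout, the field names and the `build_data_dictionary` function are assumptions made for this example; they are not CLOSER's actual schema or tooling.

```python
import csv
import io

# Hypothetical variable-level metadata records. The field names are
# illustrative assumptions, not CLOSER's actual schema.
VARIABLES = [
    {"name": "sex", "label": "Sex of respondent", "type": "categorical",
     "codes": {1: "Male", 2: "Female"}, "question": "q1"},
    {"name": "age", "label": "Age in years", "type": "numeric",
     "codes": {}, "question": "q2"},
]


def build_data_dictionary(variables):
    """Render a CSV data dictionary with one row per variable."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["variable", "label", "type", "codes", "source_question"])
    for var in variables:
        # Flatten the code list into a single "code=text" cell.
        codes = "; ".join(
            f"{code}={text}" for code, text in sorted(var["codes"].items())
        )
        writer.writerow(
            [var["name"], var["label"], var["type"], codes, var["question"]]
        )
    return buffer.getvalue()


if __name__ == "__main__":
    print(build_data_dictionary(VARIABLES))
```

Because the dictionary is derived from the metadata, regenerating it after a metadata change keeps the two in step, which is the kind of standardisation the abstract describes.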
CLOSER Discovery categorises all variables and questions with topics from CLOSER’s controlled vocabulary. This allows more effective searching and filtering of the huge quantity of content made available. The task of applying topics to questions and variables is hugely time-consuming, but CLOSER is working with machine learning to further automate the process, enhancing the metadata without increasing costs.
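To make the idea of automated topic assignment concrete, here is a deliberately simple keyword-overlap baseline. It is not the machine learning approach the abstract refers to, whose details are not given; the topics, keywords and the `suggest_topic` function are all hypothetical and stand in for a real controlled vocabulary and a trained model.

```python
import re

# Hypothetical slice of a controlled vocabulary: topic -> indicative keywords.
# These are assumptions for illustration only.
TOPIC_KEYWORDS = {
    "Health": {"health", "illness", "doctor", "hospital"},
    "Employment": {"job", "work", "employer", "salary"},
    "Education": {"school", "exam", "teacher", "qualification"},
}


def tokenize(text):
    """Lower-case the text and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def suggest_topic(question_text, topic_keywords=TOPIC_KEYWORDS):
    """Return the topic sharing the most keywords with the question text.

    Returns None when no keyword matches, so that a human can review
    the item instead of receiving a wrong suggestion.
    """
    tokens = tokenize(question_text)
    best_topic, best_score = None, 0
    for topic, keywords in topic_keywords.items():
        score = len(tokens & keywords)
        if score > best_score:
            best_topic, best_score = topic, score
    return best_topic


if __name__ == "__main__":
    print(suggest_topic("How often did you visit a doctor or hospital?"))
```

A suggestion step like this would typically feed a review queue rather than write topics directly, since unmatched or ambiguous questions still need a human decision.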