Post-survey data: curation and technical data management

Coordinator 1: Dr Aida Sanchez-Galvez (Centre for Longitudinal Studies, UCL Social Research Institute)
Coordinator 2: Dr Vilma Agalioti-Sgompou (Centre for Longitudinal Studies, UCL Social Research Institute)
Post-survey data management processes are a crucial part of the survey lifecycle. They ensure the curation, de-identification, documentation, maintenance, and production of high-quality data for research purposes and safe dissemination. Close collaboration with data scientists and other users is needed to understand the requirements of the research community. The data from some surveys may include not just responses to questions but also biomedical data and cognitive test results. Data may be collected via multiple modes and, in the case of longitudinal surveys, at multiple time points.
Data management setups should ideally be based on robust data management policies, secure data handling standards, and well-defined technical protocols. This enables a coherent approach to data quality, efficient workflows, the generation of metadata, and rapid adaptation to new projects (e.g. COVID-19 surveys).
The tools for survey data management are usually hand-written scripts tailored to the survey design. This creates opportunities for programming in statistical software (e.g. SPSS, Stata, SAS) or scripting languages (R, Python), as well as long-term storage in databases (e.g. MS SQL Server, PostgreSQL, MySQL, MongoDB).
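To illustrate the kind of scripted step involved (a minimal sketch only, assuming hypothetical file names, variable names and de-identification rules rather than the workflow of any particular study), a Python/pandas version might look like this:

# Minimal sketch of a scripted post-survey processing step (hypothetical
# file and variable names). It reads raw responses, drops direct identifiers,
# derives an age band to reduce disclosure risk, and writes a research-ready file.
import pandas as pd

RAW_FILE = "raw_survey_wave1.csv"          # assumed output of data collection
CLEAN_FILE = "research_survey_wave1.csv"   # de-identified file for deposit
DIRECT_IDENTIFIERS = ["name", "address", "postcode", "phone"]  # example list

df = pd.read_csv(RAW_FILE)

# De-identification: remove direct identifiers before dissemination
df = df.drop(columns=DIRECT_IDENTIFIERS, errors="ignore")

# Example derivation: band exact age into broad categories
df["age_band"] = pd.cut(df["age"], bins=[0, 17, 29, 44, 59, 120],
                        labels=["0-17", "18-29", "30-44", "45-59", "60+"])
df = df.drop(columns=["age"])

df.to_csv(CLEAN_FILE, index=False)

Equivalent steps can be, and often are, implemented in SPSS, Stata or SAS syntax; the point is that the whole pipeline is captured in code that can be rerun and reviewed rather than applied through manual edits.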
The aim of this session is to create a space to share information, ideas and techniques on post-survey data curation and management. We invite colleagues with data management responsibilities to submit ideas relating to:
• Technical protocols or bespoke scripts for in-house processing of survey data, paradata, and metadata
• Automated survey data processing and its reproducibility on other survey data
• Data quality assurance and validation techniques, and syntax for the identification of data inconsistencies or errors (see the sketch after this list)
• Implementation of ETL (extract, transform and load) data workflows
• Techniques to set up surveys from the outset (e.g. CAI) to facilitate smooth post-processing
• Use of databases to store and manage data and metadata
• Sharing processing syntax
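As an example of the validation syntax referred to above (again a sketch under assumed, hypothetical variable names and rules, not the checks of any particular study), a set of consistency checks in Python/pandas could take this form:

# Sketch of data validation checks with hypothetical variables and rules.
# Each rule flags records that break an expected relationship; the result is
# a table of respondent identifiers and the rule(s) they fail.
import pandas as pd

df = pd.read_csv("research_survey_wave1.csv")   # assumed cleaned file

rules = {
    # routing check: no reported paid work should mean no hours worked
    "hours_without_job": (df["has_paid_job"] == 0) & (df["hours_worked"] > 0),
    # range check: weekly hours should be plausible
    "implausible_hours": df["hours_worked"] > 100,
    # date check: interviews should not precede the start of fieldwork
    "interview_before_fieldwork":
        pd.to_datetime(df["interview_date"]) < pd.Timestamp("2021-01-01"),
}

flags = pd.DataFrame(rules)
flags["serial"] = df["serial"]                  # hypothetical respondent ID
errors = flags[flags.drop(columns="serial").any(axis=1)]
print(errors)

Checks of this kind can be rerun after every data delivery or edit, which keeps the validation reproducible and easy to extend as new inconsistencies are discovered.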