All time references are in CEST
Agility and the Survey Life-Cycle: If and what survey practitioners can learn from software development 2
Session Organisers: Dr Yuri Pettinicchi (SHARE Berlin Institute), Dr Arne Bethmann (TU Munich / SHARE Germany)
Time: Tuesday 18 July, 14:00 - 15:30
Room: U6-06
The increasing digitalisation of the survey process means that many tasks in the survey life-cycle now closely resemble software development tasks, e.g. programming questionnaires, processing data and automatically preparing documentation. Although survey practitioners and engineers are still often recruited from the social sciences rather than computer science, interest in learning from best practices in software development has grown in recent years. This includes adopting tools, e.g. for bug tracking/ticketing or for version control of documents and (meta)data, as well as management processes such as Agile development.
This session invites anyone who is implementing, or has already implemented, ideas from software development in survey practice to contribute and discuss their experiences. Possible questions include: Which ideas seem(ed) suitable? Did it work? What were the benefits or drawbacks? How did the ideas need to be adapted to the specific requirements of survey development? What tools did you use and why? Does an Agile workflow fit into a survey process traditionally organised as a waterfall model? Where are the software and survey development life-cycles comparable? Where do they differ? How does that relate to the specifics of the survey at hand?
Mr Domingo Scisci (DASSI - Data Archive for Social Science in Italy, University of Milano-Bicocca) - Presenting Author
Ms Giovanna De Santis (Università Politecnica delle Marche)
Recently, several tools and procedures from software development and programming have also been applied in other fields and disciplines, opening up new scenarios and possibilities. Drawing on the data management experience of the Italian Lives (ITA.LI) household panel survey, we present the tools, procedures and skills deployed by the research team throughout the survey life-cycle.
We adopted a data management strategy that can be summed up as a "first-code approach": attention and work are focused on the code that generates the data rather than on the data itself, which makes it easier to document the operations performed in a transparent and collaborative manner.
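To illustrate the idea behind a "first-code" approach, here is a minimal Python sketch; it is not the ITA.LI team's code, which the abstract does not show. The variable names, the missing-value code -9 and the set of direct identifiers are hypothetical assumptions. The public file is never edited by hand: it is rebuilt from the raw records by small, reviewable functions, so every transformation is documented in code that can be kept under version control.

```python
"""Minimal "first-code" sketch: public data are generated by code, never edited by hand.

All variable names and codes below are hypothetical.
"""

RAW = [
    {"pid": 101, "birth_year": 1980, "income": 2500, "email": "a@example.org"},
    {"pid": 102, "birth_year": 1995, "income": -9, "email": "b@example.org"},
]

MISSING_CODES = {-9}            # hypothetical code for "refused"
DIRECT_IDENTIFIERS = {"email"}  # must not appear in the public release


def recode_missing(record):
    """Replace missing-value codes with None."""
    return {k: (None if v in MISSING_CODES else v) for k, v in record.items()}


def drop_identifiers(record):
    """Remove direct identifiers before release."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}


def build_public(raw):
    """Apply each documented step, in order, to every raw record."""
    return [drop_identifiers(recode_missing(r)) for r in raw]


if __name__ == "__main__":
    for row in build_public(RAW):
        print(row)
```

Because the raw records are never modified, rerunning the script after a change to any step regenerates the public data consistently, and the change itself is visible in the code history.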
Consistent with this strategy, the entire data life-cycle, from raw to public data, was managed following the typical software development workflow. First, a repository managed by version control software (Git) was created, which ensured that all changes were tracked and made collaboration easier. The code was hosted on a repository manager (GitLab), which allowed the research team to track issues, bugs and requests collaboratively and to link issues to the code used to resolve them.
This procedure enabled rapid and frequent data releases to the research team while the survey was still in the field, following an Agile-like methodology.
This DevOps-oriented approach also made it possible to define standardized procedures for setting up the survey instruments (questionnaires) and for producing the technical documentation and metadata needed for long-term preservation.
In conclusion, we will discuss the strengths and weaknesses of this approach that emerged during fieldwork, highlighting in particular the need for more IT skills in the social sciences and for new professional roles that combine technical skills with domain-specific knowledge.
Mr Knut Wenzig (DIW Berlin/SOEP) - Presenting Author
The Socio-Economic Panel (SOEP) relies heavily on a GitLab server for its data management and documentation needs. GitLab is a web-based Git repository manager that provides version control for source code, project management tools, and continuous integration. At the SOEP, the GitLab server has been configured to provide several key features, including:
- Version control: GitLab provides a centralised repository for storing and managing source code, making it easier to track changes and collaborate with other team members. The SOEP uses this feature to manage scripts (Stata and R) and metadata.
- Issue tracking: GitLab's built-in issue tracker allows users to report and track bugs, defects, and other issues. At the SOEP, the issue tracker is used to report problems during data preparation, document issues, and track bug reports from data users. It is also used to manage teams, projects, and complex internal processes involving multiple stakeholders, and to organise weekly meetings.
- Service desks: GitLab's service desk feature enables users to create and manage support tickets. This feature can be used to centralise user requests and distribute them to the relevant specialists, replacing long email exchanges within the team.
- Pipeline: GitLab's pipeline feature allows the SOEP to automate the testing and deployment of metadata stored in CSV files. This feature has also been used to produce almost publication-ready PDF files.
- Wiki: GitLab's built-in wiki feature allows users to create and edit pages of content within the GitLab interface. The SOEP uses this feature to create internal documentation related to its data management activities.
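The abstract does not describe which tests the SOEP pipeline runs. As an illustration only, the following Python sketch shows the kind of script a pipeline job could run against metadata CSV files; the required columns `variable` and `label` and the specific checks are assumptions. GitLab CI marks a job as failed when its script exits with a non-zero status, which is how such a check would stop faulty metadata from being deployed.

```python
import csv
import io
import sys

REQUIRED_COLUMNS = ("variable", "label")  # hypothetical metadata schema


def validate_metadata(csv_text):
    """Return a list of error messages for one metadata CSV file."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in fieldnames]
    if missing:
        return [f"missing column(s): {', '.join(missing)}"]

    errors = []
    seen = set()
    # Line 1 is the header, so data rows start on line 2
    # (assuming no field contains an embedded line break).
    for line_no, row in enumerate(reader, start=2):
        name = (row["variable"] or "").strip()
        if not name:
            errors.append(f"line {line_no}: empty variable name")
        elif name in seen:
            errors.append(f"line {line_no}: duplicate variable '{name}'")
        seen.add(name)
        if not (row["label"] or "").strip():
            errors.append(f"line {line_no}: variable '{name}' has no label")
    return errors


def main(paths):
    failed = False
    for path in paths:
        with open(path, newline="", encoding="utf-8") as fh:
            for err in validate_metadata(fh.read()):
                print(f"{path}: {err}")
                failed = True
    # A non-zero exit status makes the CI job, and thus the pipeline, fail.
    return 1 if failed else 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```

A pipeline job would call the script with the metadata files as arguments, e.g. `python check_metadata.py metadata/*.csv` (file and directory names hypothetical).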
Overall, the GitLab server at the SOEP provides a range of features that support the organisation's research and data management needs. By using GitLab, the SOEP has been able to improve collaboration and streamline workflows.
Mr Oliver Hopt (GESIS - Leibniz Institute for the Social Sciences) - Presenting Author
Dr Lydia Repke (GESIS - Leibniz Institute for the Social Sciences)
Many methods and good practices developed in software development and information management can probably be transferred to other research areas, such as the social sciences. This talk will focus on two main methods: unit testing and continuous integration.
First, unit testing is a well-established mechanism for ensuring software reliability. Good test coverage of a code base can indicate whether implementations meet predefined expectations, reducing the need for manual testing. A social science equivalent would be estimating the quality of a given survey question. The Survey Quality Predictor (SQP) system, developed by Willem Saris et al. and now running at GESIS, is the first attempt at such estimation. This tool can predict different quality measures based on the formal and linguistic characteristics of survey items (e.g., the properties of the answer scale).
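SQP predicts quality from a statistical model, which the following sketch does not attempt to reproduce. It only illustrates the unit-testing pattern applied to a survey item: simple formal properties of a hypothetical `Item` structure (question text, answer codes and labels) are checked against predefined expectations. The tests are plain-assert functions of the kind pytest collects, and they can also be run directly.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """Hypothetical, simplified representation of a closed survey item."""
    name: str
    text: str
    scale_codes: list = field(default_factory=list)
    scale_labels: list = field(default_factory=list)


def check_item(item):
    """Return the list of expectations the item violates (empty if none)."""
    problems = []
    if not item.text.strip():
        problems.append("empty question text")
    codes = item.scale_codes
    if not codes:
        problems.append("no answer scale")
    elif codes != list(range(codes[0], codes[0] + len(codes))):
        problems.append("answer codes are not consecutive")
    if len(codes) != len(item.scale_labels):
        problems.append("number of codes and labels differ")
    # Endpoint labels anchor the scale; middle points may stay unlabelled.
    if item.scale_labels and not (item.scale_labels[0] and item.scale_labels[-1]):
        problems.append("scale endpoints are not labelled")
    return problems


def test_well_formed_item_passes():
    item = Item(
        name="lsat",
        text="How satisfied are you with your life?",
        scale_codes=list(range(0, 11)),
        scale_labels=["Completely dissatisfied"] + [""] * 9 + ["Completely satisfied"],
    )
    assert check_item(item) == []


def test_gap_in_codes_is_detected():
    item = Item("x", "Question?", [1, 2, 4], ["Low", "Middle", "High"])
    assert check_item(item) == ["answer codes are not consecutive"]


def test_unlabelled_endpoint_is_detected():
    item = Item("y", "Question?", [1, 2, 3], ["Low", "", ""])
    assert check_item(item) == ["scale endpoints are not labelled"]


if __name__ == "__main__":
    test_well_formed_item_passes()
    test_gap_in_codes_is_detected()
    test_unlabelled_endpoint_is_detected()
    print("all item tests passed")
```

Each test encodes one expectation, so a change to an item that breaks it is reported automatically instead of being found by manual review.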
Second, the principle of continuous integration, which builds on good test coverage, has become established in software development over the last decade, especially in agile work environments. The basic idea is that any enhancement to a software system can be deployed immediately, fully automatically, once it has passed all required tests. The principle also applies to the development of social science questionnaires: with development tools that can export to the various formats used for fielding a survey, changes can be made at short notice during the pretest or field phase.
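A minimal Python sketch of the continuous-integration idea applied to a questionnaire: every change is run through all checks, and the exports are produced only if none of them fails. The questionnaire format, the two checks and the two export targets (JSON and plain text) are hypothetical stand-ins for real survey-software formats.

```python
import json
from typing import Callable, Dict, List

# Hypothetical questionnaire format: a list of items with a name and a text.
Questionnaire = List[dict]


def no_empty_text(q: Questionnaire) -> List[str]:
    return [f"{i['name']}: empty text" for i in q if not i.get("text", "").strip()]


def unique_names(q: Questionnaire) -> List[str]:
    names = [i["name"] for i in q]
    return [f"duplicate item name: {n}" for n in sorted(set(names)) if names.count(n) > 1]


# Every change must pass all of these checks before anything is exported.
CHECKS: List[Callable[[Questionnaire], List[str]]] = [no_empty_text, unique_names]

# Hypothetical export targets standing in for survey-software formats.
EXPORTERS: Dict[str, Callable[[Questionnaire], str]] = {
    "json": lambda q: json.dumps(q, ensure_ascii=False, indent=2),
    "txt": lambda q: "\n".join(f"{i['name']}: {i['text']}" for i in q),
}


def integrate(q: Questionnaire) -> dict:
    """Run all checks; produce the exports only if every check passes."""
    errors = [msg for check in CHECKS for msg in check(q)]
    if errors:
        return {"deployed": False, "errors": errors}
    return {"deployed": True, "artifacts": {fmt: export(q) for fmt, export in EXPORTERS.items()}}


if __name__ == "__main__":
    draft = [{"name": "q1", "text": "How old are you?"}, {"name": "q2", "text": ""}]
    print(integrate(draft))  # blocked: q2 has no text
    draft[1]["text"] = "What is your highest level of education?"
    print(integrate(draft)["deployed"])  # True
```

Keeping the checks and exporters in two lists means a new test or a new fieldwork format can be added without touching the integration step itself.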
We will present the original ideas behind these two methods and their application in software development. We will also give an insight into the data needed to transfer them to the social sciences. Finally, we will give an outlook on improvements to the applications shown and on the necessary changes to questionnaire development workflows.
Mr Sebastiaan Pennings (Centerdata) - Presenting Author
Managing the survey lifecycle of large international surveys has been, and will continue to be, a complicated puzzle in which many countries and organizations work together towards the common goal of fielding a questionnaire and collecting data. Because questionnaires and accompanying tools must accommodate many different languages, managing the source definitions and evaluating their language-specific implementations become vitally important.
Centerdata is involved in several international surveys, including SHARE, ESS, EVS, GGP and Coordinate. From this experience we developed the DataCTRL suite, a collection of tools that support various parts of the survey lifecycle from a centralized perspective.
In this presentation we will look at TranslationCTRL, the tool in the DataCTRL suite that manages the translation process, discuss its role within the survey lifecycle, and examine a form of translation evaluation we call “sanity checking”. We will show how it is used within the Survey of Health, Ageing and Retirement in Europe (SHARE) to evaluate the completeness and quality of translations, especially with regard to considerations specific to the tools and the programmed questionnaire. We will show how this helps to find issues pre-emptively, improve overall translation quality, and detect errors that may affect programmed questionnaires and related accompanying software.
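The abstract does not specify which checks TranslationCTRL performs. The following Python sketch shows one plausible kind of sanity check: every source text has a non-empty translation, the fills (placeholders that the programmed questionnaire substitutes at run time) are preserved, and translations identical to the source are flagged. The `{FILL_NAME}` placeholder syntax and the dictionary format are assumptions.

```python
import re
from collections import Counter

# Hypothetical fill syntax, e.g. "Does {FL_PARTNER} live with you?"
PLACEHOLDER = re.compile(r"\{[A-Za-z_][A-Za-z0-9_]*\}")


def sanity_check(source, translations):
    """Compare translated texts with their source texts.

    Both arguments map item keys to texts (a hypothetical format).
    Returns a list of (key, message) pairs, one per problem found.
    """
    issues = []
    for key, src in source.items():
        tgt = translations.get(key, "")
        if not tgt.strip():
            issues.append((key, "missing translation"))
            continue
        # Fills must survive translation unchanged, or the programmed
        # questionnaire cannot substitute them at run time.
        expected = Counter(PLACEHOLDER.findall(src))
        found = Counter(PLACEHOLDER.findall(tgt))
        if expected != found:
            issues.append((key, f"fills differ: expected {sorted(expected.elements())}, "
                                f"found {sorted(found.elements())}"))
        # Text identical to the source (ignoring fill-only texts) may be untranslated.
        if tgt == src and PLACEHOLDER.sub("", src).strip():
            issues.append((key, "identical to source; possibly untranslated"))
    for key in sorted(translations.keys() - source.keys()):
        issues.append((key, "translation without source text"))
    return issues


if __name__ == "__main__":
    source = {"q1": "Does {FL_PARTNER} live with you?", "q2": "Thank you.", "q3": "Age?"}
    german = {"q1": "Lebt {FL_PARTNR} mit Ihnen zusammen?", "q2": "Thank you."}
    for key, message in sanity_check(source, german):
        print(key, message)
```

In the example, the German text mistypes the fill in `q1`, leaves `q2` in English and lacks `q3`, so all three items are reported before the translation reaches the programmed questionnaire.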