Split Questionnaire Design in Social Research: Evaluation and Application

Session Organiser: Mr Julian B. Axenfeld (University of Mannheim)
Time: Friday 16 July, 16:45 - 18:00
Long surveys are a problem for data quality, as they lead to decreasing response rates and increasing measurement error. Especially in view of the high break-off rates in increasingly widespread online surveys and generally declining response rates, survey researchers will more and more often face the need to reduce the number of questions. Split Questionnaire Design (SQD) is a tool that allows shortening surveys without dropping questions entirely. Under an SQD, varying components of the complete questionnaire are administered to the respondents, so that some questions are omitted for each respondent. The resulting missing values can subsequently be imputed to complete the data matrix.
This session presents findings from current research on SQD, building a bridge from statistics to survey practice. It covers new developments in how to implement SQD, its effects on respondents, and the imputation of the resulting missing data, thereby facilitating high data quality in future surveys that apply SQD.
Keywords: missing data, respondent burden, nonresponse
Dr Florian Meinfelder (University of Bamberg) - Presenting Author
Mrs Sara Bahrami (University of Eindhoven)
Response burden is a ubiquitous problem in surveys. Questionnaire designers have to weigh carefully the trade-off between including all relevant questions on the one hand and nonresponse or measurement error caused by lengthy interviews on the other. Split questionnaire survey designs (SQDs) aim to resolve this issue by assigning only a subset of the total number of questions to each respondent. The method creates a missing-data pattern, and sophisticated methods for dealing with missing data, such as multiple imputation, are used to allow for subsequent analysis of the incomplete data. One way to apply an SQD is to create different question blocks and to assign only a subset of these blocks to each respondent. We refer to this method as the block design. Ideally, the reduction in interview time is substantial while the reduction of information in the data is not, which in this case means that correlations between variables based on questions from different blocks should, on average, be higher than correlations between variables based on questions from the same block. If the total number of blocks is, say, four and two (randomly selected) out of the four are assigned, this yields six different questionnaires, which is feasible even for a pen-and-paper questionnaire. If the interview is computer-assisted, the block design is not necessary and we gain flexibility in finding a solution. We propose a new method based on a genetic algorithm that is trained to find an optimal design, i.e. minimum loss of information for a fixed amount of missing data. One benefit of the method over previous suggestions is that a purely mathematical optimum can easily be enhanced with contextual constraints, e.g. questions that cannot be separated or subtopics of the questionnaire that should retain a minimum number of items. We investigate the method in a Monte Carlo simulation study, where we compare the loss of information to a hypothetical situation without splitting and to a situation with splitting but using the complete cases only.
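For illustration only, the following sketch (our own toy example, not the authors' implementation; item names and block sizes are hypothetical) shows how such a block design generates questionnaire versions and a planned-missingness pattern: twelve items are grouped into four blocks of three, and each respondent is randomly assigned two of the four blocks, yielding the six questionnaire versions mentioned above.

```python
# Toy block-design SQD (hypothetical items, not the authors' code):
# 12 items in 4 blocks of 3; each respondent receives 2 of the 4 blocks,
# which yields C(4, 2) = 6 questionnaire versions and a planned-missingness
# pattern for the items of the two omitted blocks.
import itertools
import random

ITEMS = [f"q{i:02d}" for i in range(1, 13)]            # hypothetical item names
BLOCKS = [ITEMS[i:i + 3] for i in range(0, 12, 3)]     # 4 blocks of 3 items

VERSIONS = list(itertools.combinations(range(len(BLOCKS)), 2))
assert len(VERSIONS) == 6                              # the 6 questionnaires

def assign_questionnaire(rng: random.Random) -> dict:
    """Return one respondent's mask: True = item asked, False = planned missing."""
    chosen = rng.choice(VERSIONS)
    asked = {item for b in chosen for item in BLOCKS[b]}
    return {item: item in asked for item in ITEMS}

rng = random.Random(2021)
for r in range(1, 6):
    mask = assign_questionnaire(rng)
    print(f"respondent {r}:", "".join("X" if mask[q] else "." for q in ITEMS))
```

A genetic algorithm as proposed in the talk would instead search over assignments of items to questionnaire versions directly, scoring candidate designs by the expected loss of information and by any contextual constraints.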
Mr Julian B. Axenfeld (University of Mannheim) - Presenting Author
Professor Annelies G. Blom (University of Mannheim)
Dr Christian Bruch (GESIS)
Professor Christof Wolf (GESIS)
Staggering fieldwork costs make established face-to-face survey projects increasingly consider switching to an online mode of data collection. However, online questionnaires usually need to be considerably shorter than those of face-to-face surveys due to higher break-off and lower response rates. Split Questionnaire Designs (SQD), which allow researchers to reduce survey length without dropping items from the survey entirely, could be a solution to this issue. Under an SQD, the questionnaire is divided into a pre-determined number of modules and a random subset of these modules is assigned to each participant. This generates a pattern of observed and missing data that can subsequently be completed through imputation.
However, the reality of social survey data poses unique challenges for the imputation of data obtained with an SQD: correlations are typically relatively weak while much data has to be imputed. Hence, to support good data quality in the face of such adverse conditions, it may be especially important to exploit the correlation structure of the data by constructing modules that separate highly correlated variables. Yet exact correlations are usually not known before data collection starts, so we must rely on heuristics. Since questions on the same topic are often correlated, one may consider constructing modules that contain questions from diverse topics. In contrast, modules consisting of questions on a single topic, although intuitive from a questionnaire design perspective and often implemented in the past, might be far from optimal because highly correlated questions will then tend to end up in the same module.
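To make the contrast concrete, the following sketch (a hypothetical example, not the study's actual questionnaire) implements the diverse-topics heuristic: items from the same topic, which are expected to correlate, are spread across modules so that correlated items can serve as predictors when imputing one another.

```python
# Sketch of the 'diverse topics' heuristic (hypothetical topics and items):
# items of the same topic are distributed round-robin across modules, so
# correlated items end up in different modules and remain available as
# predictors when the other module is skipped.
from collections import defaultdict

TOPICS = {                     # hypothetical questionnaire: topic -> items
    "health":   ["hea1", "hea2", "hea3"],
    "politics": ["pol1", "pol2", "pol3"],
    "income":   ["inc1", "inc2", "inc3"],
}
N_MODULES = 3

def diverse_topic_modules(topics: dict, n_modules: int) -> list:
    """Assign items round-robin within each topic so every module mixes topics."""
    modules = defaultdict(list)
    for items in topics.values():
        for pos, item in enumerate(items):
            modules[pos % n_modules].append(item)
    return [modules[m] for m in range(n_modules)]

for m, items in enumerate(diverse_topic_modules(TOPICS, N_MODULES), 1):
    print(f"module {m}: {items}")
# A single-topic design would instead place all (correlated) items of one
# topic in the same module, so they would always be missing together.
```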
To promote data quality in future implementations of SQD, we need to close some knowledge gaps left by previous research: How well does imputation perform with SQDs under real-data conditions? What role does the module construction strategy play for the imputation?
In this talk, we present findings from Monte Carlo simulation studies based on real survey data from the German Internet Panel. We show how different module construction strategies (randomly created modules, single-topic modules and diverse-topics modules) affect the estimation of univariate frequencies and bivariate correlations after imputation. We also provide first findings on the impact of different decisions regarding the imputation of the missing data. Current results show that the imputation tends to introduce small biases in univariate frequencies but larger biases in correlations. Further, diverse-topics modules perform similarly to randomly created modules, whereas biases generally tend to be more pronounced with single-topic modules.
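As a simplified illustration of this kind of simulation (not the study's data, module design or imputation model), the sketch below imposes a three-module SQD pattern on multivariate normal data, imputes each respondent's skipped module with scikit-learn's IterativeImputer (a chained-equations-style imputer), and reports the average absolute bias of means and correlations relative to the fully observed data.

```python
# Toy Monte Carlo check of imputation quality under an SQD pattern
# (illustrative assumptions only): three standard-normal items, one per
# module, pairwise correlation 0.3; each simulated respondent skips one
# of the three modules at random.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

def one_replication(seed, n=1500, rho=0.3):
    rng = np.random.default_rng(seed)
    cov = np.full((3, 3), rho)
    np.fill_diagonal(cov, 1.0)
    full = rng.multivariate_normal(np.zeros(3), cov, size=n)

    data = full.copy()
    dropped_module = rng.integers(0, 3, size=n)    # module each respondent skips
    data[np.arange(n), dropped_module] = np.nan

    imputer = IterativeImputer(sample_posterior=True, max_iter=10, random_state=seed)
    completed = imputer.fit_transform(data)

    bias_means = completed.mean(axis=0) - full.mean(axis=0)
    bias_corr = np.corrcoef(completed.T) - np.corrcoef(full.T)
    return (np.abs(bias_means).mean(),
            np.abs(bias_corr[np.triu_indices(3, 1)]).mean())

results = np.array([one_replication(s) for s in range(50)])
print("mean abs. bias of means: %.3f, of correlations: %.3f"
      % tuple(results.mean(axis=0)))
```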
Mr Alexandre Pollien (FORS) - Presenting Author
Dr Michael Ochsner (FORS)
Dr Michèle Ernst-Stähli (FORS)
Moving from a one-hour face-to-face survey to a push-to-web survey may raise issues of comparability, as respondents are not contacted in the same way and do not expect to answer a one-hour questionnaire on the web. This presentation focuses on two resulting concerns:
- The move from the face-to-face to the web mode changes the structure of participation, owing to a different mode of contact and a modified questionnaire length.
- Web surveys should be considerably shorter, but splitting a questionnaire affects the context of the questions, as not all questions of the face-to-face survey can be asked. The meaning of the questions, as well as the ideas and standards of comparison respondents draw on when answering them, are influenced by the questions answered earlier in the questionnaire.
In this presentation, we investigate the implications of splitting a questionnaire, using the European Values Study 2017 as an example. We address the issues of representativeness and measurement related to the transition from a face-to-face survey to a push-to-web survey. The presentation examines the effects of splitting the questionnaire across the different designs of the EVS experiment, including two long versions of the same questionnaire with variations in question ordering and a matrix design in which the questionnaire is split into two parts. We first provide a global assessment of the effect of the questionnaire design on response, the respondents' mood, their interest in the survey at hand and their propensity to respond to a subsequent survey (follow-up module). We then analyse the consequences that changing a survey has for how questions are answered. From a measurement point of view, we focus on issues of question context and order, and of battery splitting.