
ESRA 2019 Conference Programme


Assessing the Quality of Survey Data 1

Session Organiser: Professor Jörg Blasius (University of Bonn)
Time: Tuesday 16th July, 11:00 - 12:30
Room: D02

This session will provide a series of original investigations on data quality in both national and international contexts. The starting premise is that all survey data contain a mixture of substantive and methodologically-induced variation. Most current work focuses primarily on random measurement error, which is usually treated as normally distributed. However, there are many kinds of systematic measurement error, or more precisely, many different sources of methodologically-induced variation, and all of them may have a strong influence on the “substantive” solutions. These sources include response sets and response styles, misunderstandings of questions, translation and coding errors, uneven standards among the research institutes involved in data collection (especially in cross-national research), item and unit nonresponse, and faked interviews. We consider data to be of high quality when the methodologically-induced variation is low, i.e. when differences in responses can be interpreted on the basis of theoretical assumptions in the given area of research. The aim of the session is to discuss different sources of methodologically-induced variation in survey research, how to detect them, and the effects they have on the substantive findings.

Keywords: Quality of data, task simplification, response styles, satisficing

Mixing Negative and Positive Item Wordings in Survey Scales: Impact on Data Quality in Older Populations

Dr Wander van der Vaart (University of Humanistic Studies, Utrecht) - Presenting Author
Dr Tina Glasner (University of Humanistic Studies, Utrecht)

It is a common rule in questionnaire design that a scale measuring a single construct should consist of a mix of positively and negatively worded items. Combining both types of wording provides a more balanced representation of the issues being measured. Furthermore, combining positive and negative items urges respondents to read more carefully, which may reduce acquiescence effects and satisficing behaviour. However, research also indicates that positive and negative items often form separate factors in factor analysis and that positive items yield higher mean scores than their negative counterparts. Moreover, studies indicate that negatively phrased items are harder to process than positively phrased ones and that alternating item directionality is burdensome. For respondents with reduced cognitive or motivational abilities, combining negative and positive items may therefore hamper data quality instead of enhancing it. Along these lines, the current study focuses on a population of older respondents: they are at greater risk of experiencing difficulty in processing negatively worded items and of finding alternation of item directionality burdensome.
A split-ballot experiment was performed among inhabitants of Dutch senior residences. For two scales - ‘meaning in life’ and ‘self-reliance’ - respondents were randomly assigned one of two versions: a scale combining positive and negative items, or a scale with positively phrased items only. Ten senior residences were selected, covering different regions of the Netherlands and encompassing a range of residence sizes (36 to 335 inhabitants). All 1,147 inhabitants received a questionnaire; the response rate was 35% (N=405, age range 46-99).
Analysis of scale reliability and validity demonstrated lower data quality for the balanced positive-negative scale versions. Related cognitive and motivational mechanisms that may underlie the response behaviour of older respondents are discussed.
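As a purely illustrative sketch of the kind of reliability comparison reported above (file name, item names, and the 1-5 response scale are assumptions, not details from the study), the two split-ballot versions could be compared like this:

```python
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha for a set of scale items (one column per item)."""
    items = items.dropna()
    k = items.shape[1]
    sum_item_var = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - sum_item_var / total_var)

# Hypothetical respondent-level data with the split-ballot condition.
df = pd.read_csv("meaning_in_life_experiment.csv")            # assumed file name
item_cols = [c for c in df.columns if c.startswith("mil_")]   # assumed item names
neg_items = ["mil_2", "mil_4"]                                # assumed negative items

# Reverse-code the negatively worded items (assuming a 1-5 response scale)
# so that all items point in the same direction before computing alpha.
mixed = df[df["condition"] == "mixed_wording"].copy()
mixed[neg_items] = 6 - mixed[neg_items]
positive_only = df[df["condition"] == "positive_only"]

print("alpha, positive-only version:", cronbach_alpha(positive_only[item_cols]))
print("alpha, mixed-wording version:", cronbach_alpha(mixed[item_cols]))
```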


How Much Text Is Too Much? Assessing Respondent Attention to Instruction Texts Depending on Text Length

Mr Tobias Rettig (University of Mannheim) - Presenting Author

Whether respondents pay adequate attention to a questionnaire and the stimuli within it has been a concern for survey researchers for decades. One way of assessing attention is to ask respondents for specific answers or actions, known as an instructional manipulation check (IMC). Previous research in this field has largely dealt with the question of whether respondents read texts at all, but not with how much text they can be expected to read. I fill this gap in the literature by including an IMC in an online panel survey and systematically varying the length of the surrounding text.
The data stem from the November 2018 wave of the German Internet Panel (GIP), an online panel representative of the German population. About halfway into the questionnaire, respondents were instructed not to answer a specific question, but to continue by clicking the GIP logo instead. This instruction was “hidden” in the question text, the length of which was experimentally varied between four conditions: (1) only the instruction was displayed, (2) the instruction was placed in one paragraph of text, (3) the instruction was placed in the second of two paragraphs of text, and (4) the instruction was placed in the fourth of four paragraphs of text.
Results indicate that whether respondents will carefully read a text strongly depends on its length. The passing rate for the IMC ranges from about 80% for the shortest to about 40% for the longest text condition. The more text respondents are asked to read, the fewer of them will actually do so. While lower attention from respondents using mobile devices is a commonly voiced concern, I find no evidence to support this.
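A minimal sketch of how passing rates by text-length condition could be tabulated and tested (the file and variable names are assumptions, not the GIP's actual data structure):

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical analysis file: one row per respondent, with the text-length
# condition (1 = instruction only ... 4 = four paragraphs) and a flag for
# whether the respondent followed the hidden instruction (passed the IMC).
df = pd.read_csv("gip_imc_experiment.csv")  # assumed file name

# IMC passing rate per text-length condition.
print(df.groupby("condition")["imc_passed"].mean())

# Chi-square test: is passing the IMC independent of text length?
table = pd.crosstab(df["condition"], df["imc_passed"])
chi2, p, dof, _ = chi2_contingency(table)
print(f"chi2(df={dof}) = {chi2:.2f}, p = {p:.4f}")
```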


Using Attention Checks in Mail Questionnaires and Probability-Based Samples: Initial Evidence from a Mixed-Mode Survey Experiment

Dr Joss Roßmann (GESIS – Leibniz Institute for the Social Sciences) - Presenting Author
Dr Tobias Gummer (GESIS – Leibniz Institute for the Social Sciences)
Mr David Bretschi (GESIS – Leibniz Institute for the Social Sciences)
Mrs Jessica Daikeler (GESIS – Leibniz Institute for the Social Sciences)

Respondents who answer survey questions inattentively usually provide answers of lower quality than respondents who effortfully comprehend, process, and answer the questions. Previous research has argued that inattentive responding may be a particular challenge for self-administered surveys, in which no interviewer is present to motivate and guide respondents through the interview. Accordingly, attention checks have been proposed to identify inattentive respondents. However, methodological studies on the properties of these checks are sparse, and the few existing studies have all focused on web-based data collection. So far, the mail mode has not received any attention in this line of research. Moreover, previous studies have mainly used data from convenience samples and non-probability access panels. Since respondents' attention is closely related to their motivation to participate in the survey, this research gap is particularly unfortunate. Thus, our study aimed to investigate whether attention checks can be used in mail questionnaires and probability-based mixed-mode surveys.

We addressed both research gaps by conducting an experiment in a probability-based general population mixed-mode panel survey in Germany (N=4,777). Respondents were randomly assigned to either answering a grid question including an instructed response item attention check (66.4% of the respondents) or receiving the same grid question without the attention check item (33.6%). The experimental design allowed us to test the effects of receiving an attention check on response behavior with regard to item nonresponse, survey evaluation, and response non-differentiation. Furthermore, our sample enabled us to directly compare web and mail respondents.
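As a hedged, illustrative sketch of the outcome measures mentioned above (all file and variable names, and the instructed answer code, are assumptions; the study's actual coding may differ), instructed response item failure and response non-differentiation could be computed as follows:

```python
import pandas as pd

# Hypothetical respondent-level data from the grid question.
df = pd.read_csv("mixed_mode_experiment.csv")                  # assumed file name
grid_items = [c for c in df.columns
              if c.startswith("grid_") and c != "grid_iri"]    # assumed item names

# Failing the instructed response item: any answer other than the
# instructed category counts as inattentive (code 1 is an assumption).
treated = df[df["attention_check_group"] == 1]
print("IRI failure rate:", treated["grid_iri"].ne(1).mean())

# Simple non-differentiation index: within-respondent standard deviation
# across the substantive grid items (0 = same answer on every item).
df["non_differentiation"] = df[grid_items].std(axis=1)

# Item nonresponse within the grid.
df["item_nonresponse"] = df[grid_items].isna().sum(axis=1)

# Compare experimental groups, separately for web and mail respondents.
print(df.groupby(["attention_check_group", "mode"])[
    ["non_differentiation", "item_nonresponse"]].mean())
```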

In our study, we present the results of our survey experiment and critically discuss the implications of using instructed response item attention checks in the mail mode and in probability-based mixed-mode surveys. Our contribution closes with recommendations for the implementation of attention checks.


New Developments in the Estimation of Split-Ballot MTMM Experiments

Dr Melanie Revilla (RECSM-Universitat Pompeu Fabra) - Presenting Author

A common way of estimating measurement quality is the split-ballot multitrait-multimethod (SB-MTMM) approach (Saris, Satorra, & Coenders, 2004). However, this approach often leads to non-convergence or improper solutions when a 2-group design is used, whereas the 3-group design performs better (Revilla & Saris, 2013). Nevertheless, the 3-group design is rarely implemented, because it makes it more complicated for applied researchers to use the data. In this presentation, I will briefly present three papers that have recently proposed solutions to this situation: 1) Saris and Satorra (2018) proposed a new estimation procedure for the 2-group design, called “estimation using pooled data (EUPD)”, that can be used for cross-national data; 2) Helm et al. (2018) suggested that the estimation problems of the 2-group SB-MTMM design could be overcome by using Bayesian SEM (BSEM) with minimally informative prior distributions instead of Maximum Likelihood; 3) Revilla, Bosch and Weber (2018) proposed using a 3-group design but with a smaller third group, so that estimation problems are limited while applied researchers lose fewer cases to respondents receiving different methods. All three papers used both Monte Carlo simulations and real-data analyses to test their approaches, and found improvements over the classic 2-group SB-MTMM estimation procedure.
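To make the design difference concrete, the following sketch (group labels, sample size, and proportions are illustrative assumptions, not taken from the cited papers) randomly allocates respondents to split-ballot groups, once with three equal groups and once with a deliberately smaller third group:

```python
import numpy as np

rng = np.random.default_rng(2019)

def assign_groups(n: int, proportions: dict) -> np.ndarray:
    """Randomly assign n respondents to split-ballot groups with given proportions."""
    groups = list(proportions)
    probs = np.array(list(proportions.values()), dtype=float)
    return rng.choice(groups, size=n, p=probs / probs.sum())

n = 3000  # assumed sample size

# Classic 3-group SB-MTMM design: each group answers a different pair of the
# three method forms, so every pair of methods is observed in some group.
classic = assign_groups(n, {"M1+M2": 1/3, "M1+M3": 1/3, "M2+M3": 1/3})

# Variant with a smaller third group (in the spirit of Revilla, Bosch & Weber
# 2018; the exact proportions are assumptions): most respondents keep the main
# method pair, so fewer cases are "lost" to the alternative methods.
reduced = assign_groups(n, {"M1+M2": 0.45, "M1+M3": 0.45, "M2+M3": 0.10})

for label, assignment in [("classic 3-group", classic), ("small third group", reduced)]:
    groups, counts = np.unique(assignment, return_counts=True)
    print(label, dict(zip(groups, counts)))
```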


Improving Data Quality in the GGS: Are Modifications in the Questionnaire and Survey Design in the Generations and Gender Survey Able to Resolve Measurement Errors?

Dr Detlev Lück (Federal Institute for Population Research (BiB)) - Presenting Author
Mrs Almut Schumann (Federal Institute for Population Research (BiB))
Mr Robert Naderi (Federal Institute for Population Research (BiB))
Dr Martin Bujard (Federal Institute for Population Research (BiB))
Dr Tom Emery (Netherlands Interdisciplinary Demographic Institute (NIDI))
Dr Susana Cabaço (Netherlands Interdisciplinary Demographic Institute (NIDI))
Dr Peter Lugtig (University of Utrecht)
Dr Vera Toepoel (University of Utrecht)

The Generations and Gender Survey is an international panel study focussing on family demography. In the German GGS, conducted in 2005, measurement errors have been identified regarding the number of children as well as the partnership history. There are cohort-specific biases in terms of over- and under-reporting of children, and there is cohort-specific over-reporting of being married and of never having been married (Ruckdeschel, Sauer & Naderi 2016). These biases can partly be explained by learning effects among both respondents and interviewers: for example, under-reporting children avoided the time-consuming loops that would have collected further information on each reported child (ibid.).
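A purely illustrative diagnostic for the interviewer learning effect described above (the file, variable names, and bin boundaries are assumptions, not part of the GGS data) could check whether the reported number of children declines as interviewers progress through their workload:

```python
import pandas as pd

# Hypothetical interview-level data: interviewer ID, interview date, and the
# number of children reported by the respondent.
df = pd.read_csv("ggs_interviews.csv")  # assumed file name

# Position of each interview within its interviewer's workload.
df["sequence"] = (df.sort_values("interview_date")
                    .groupby("interviewer_id")
                    .cumcount() + 1)

# Learning-effect diagnostic: does the mean number of reported children (and
# the share of reports with zero children) drop in later interviews?
bins = pd.cut(df["sequence"], bins=[0, 5, 10, 20, 50, 200])
diagnostic = df.groupby(bins).agg(
    mean_children=("n_children", "mean"),
    share_childless=("n_children", lambda s: (s == 0).mean()),
)
print(diagnostic)
```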

Given that the GGS has a rather complex and long questionnaire, with interviews of approximately 60 minutes, it is plausible that opportunities to shorten the interview have been tempting for interviewers as well as interviewees. Accordingly, it is likely that these measurement errors could be avoided by improving the questionnaire design. This has been done, and the revised questionnaire is being tested in a three-country pilot study in Germany, Croatia and Portugal. The pilot was launched in 2018 and was still in the field at the time of submission.

The pilot study tests not only a revised questionnaire but also a potentially revised survey design: in the past, the GGS has been conducted in CAPI mode only. The pilot investigates experimentally whether the GGS could move to a sequential mixed-mode design combining CAWI and CAPI. It also compares several design variations, particularly incentives and the timing of reminder letters. Since such survey design aspects are likely to affect measurement as well, this paper assesses the data quality of the GGS in general, in terms of measurement accuracy, and compares the CAPI mode used in the past with the sequential mixed-mode variations tested in the pilot.