Online probing: Cognitive interviewing techniques in online surveys and online pretesting
Chair | Dr Katharina Meitinger (GESIS Leibniz Institute for the Social Sciences)
Coordinator 1 | Dr Dorothée Behr (GESIS Leibniz Institute for the Social Sciences)
Coordinator 2 | Dr Lars Kaczmirek (GESIS Leibniz Institute for the Social Sciences)
Online probing is an innovative approach that applies the verbal probing techniques of cognitive interviewing in web surveys. Implementing these techniques in web surveys offers respondents a higher level of anonymity than the laboratory situation of cognitive interviewing. It can easily realize large sample sizes in several countries, which enables an evaluation of the prevalence of themes and can explain the response patterns of specific subpopulations (Braun et al. 2015). However, due to the absence of an interviewer, responses in online probing show lower response quality than in cognitive interviewing, which needs to be compensated by a larger sample size (Meitinger & Behr 2016). But which sample size is sufficient a) to arrive at a saturation of results, b) to judge the prevalence of a specific theme, and c) to draw cross-national comparisons? Previous research on sample size in cognitive interviewing indicates that serious problems that went undetected in small samples were consistently found in larger samples (Blair & Conrad 2011). The same probably holds true for online probing. However, the focus of online probing is in most cases on an assessment of validity; it therefore does not directly aim at error detection but at theme detection.
Following the approach of Blair & Conrad (2011), this presentation tries to close a research gap by assessing the relationship between sample size and theme detection in online probing. For this purpose, we draw repeated random samples of different sizes and compare the average number of themes mentioned by respondents across these sample sizes. We use data from a web survey conducted in May 2014 with 2,685 respondents from Germany, Great Britain, Mexico, Spain, and the U.S. The respondents came from a nonprobability online panel and were selected by quota for gender, age, and education.
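As a rough illustration of this resampling logic, the following sketch (not the study's actual analysis code) shows how the average number of detected themes could be computed for repeated random subsamples of different sizes; the data, theme labels, and function names are hypothetical, and it assumes probe responses have already been coded into sets of themes.

```python
import random
from statistics import mean

def simulate_coded_responses(n_respondents, themes, rng):
    """Generate hypothetical coded probe responses: one set of theme
    codes per respondent, with later themes occurring less frequently."""
    weights = [1 / (i + 1) for i in range(len(themes))]
    data = []
    for _ in range(n_respondents):
        codes = {t for t, w in zip(themes, weights) if rng.random() < w * 0.5}
        data.append(codes or {themes[0]})
    return data

def avg_themes_detected(coded, n, reps=1000, rng=None):
    """Average number of distinct themes found across repeated random
    subsamples of size n, in the spirit of Blair & Conrad (2011)."""
    rng = rng or random.Random(0)
    counts = []
    for _ in range(reps):
        sample = rng.sample(coded, n)
        counts.append(len(set().union(*sample)))
    return mean(counts)

rng = random.Random(42)
themes = [f"theme_{i}" for i in range(12)]
coded = simulate_coded_responses(500, themes, rng)
for n in (25, 50, 100, 200, 400):
    print(n, round(avg_themes_detected(coded, n, rng=rng), 2))
```

Plotting such averages against the subsample size would show where the curve of detected themes flattens, i.e. where additional respondents no longer yield new themes.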
In particular, we want to answer the following research questions in this presentation: 1) Which sample size is necessary to detect the total number of identifiable themes? 2) Which sample size is necessary to detect major themes, and which sample size also allows the detection of themes that occur less frequently? 3) Can problematic themes indicating that respondents misunderstood the item also be found with small sample sizes? 4) Can we observe cross-national differences with regard to the necessary sample sizes? 5) Which sample sizes are necessary to adequately capture cross-national differences in prevalent themes?
Cognitive psychology research has shown that words less commonly used in daily speech are recognised and processed more slowly than more commonly used words (Howes and Solomon 1951; Broadbent 1967). Thus, using terms that are less familiar to respondents can lead to comprehensibility problems and decrease response quality. Such problems can be detected with different pretesting methods, in particular cognitive interviewing.
As part of a larger project that compared different survey pretesting methods on how effective they are in detecting less familiar words that decrease question comprehensibility, we also conducted a small experiment using online probes. Two versions of the same set of 13 questions were compared in online split-ballot cognitive interviews that used paraphrasing and definition techniques to study how respondents understand wording alternatives with the same meaning but different comprehensibility levels. Using the crowdsourcing platform Prolific Academic, we recruited 120 respondents who were randomly allocated to either the version with less common terms or the version with more common terms, based on the terms' frequencies in text corpora. In both versions, we asked respondents either to paraphrase the question or to define a certain word in the question, depending on the item.
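A minimal sketch of this design logic is given below; the word pairs, frequency counts, and group labels are hypothetical (the study's actual corpora, thresholds, and assignment mechanism are not reproduced here), and in practice the random allocation would be handled by the survey platform itself.

```python
import random

# Hypothetical corpus frequency counts for wording alternatives.
corpus_freq = {"use": 120_000, "utilise": 1_800,
               "help": 95_000, "facilitate": 2_400}

def more_common(word_a, word_b, freq=corpus_freq):
    """Classify which of two wording alternatives is the more common one
    according to (hypothetical) corpus frequencies."""
    return word_a if freq.get(word_a, 0) >= freq.get(word_b, 0) else word_b

def assign_versions(respondent_ids, seed=7):
    """Randomly split respondents between the 'common' and 'less common'
    questionnaire versions, as in a split-ballot design."""
    rng = random.Random(seed)
    ids = list(respondent_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    return {"common": ids[:half], "less_common": ids[half:]}

print(more_common("use", "utilise"))          # -> 'use'
groups = assign_versions(range(120))
print(len(groups["common"]), len(groups["less_common"]))  # -> 60 60
```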
In most cases, when presented with a less common wording, respondents used its more common alternative to define it or to paraphrase the whole item. Another finding is that there was much more variation in participants' answers for certain more common wordings, which indicates that these wording alternatives might have less clear and more ambiguous meanings. Finally, we compare the results with two other question evaluation methods: expert reviews and an online panel of 2,966 participants in which the sample was split into four groups that responded to four different versions of the same items, differing in their comprehensibility level as defined by the number of less common words.
Judgements about the likelihood of future events are an important input for predictions and decisions by citizens, policy makers, and researchers alike. From the early 1990s on, surveys have increasingly measured the subjective expectations of individuals on a 0-100 chance scale. In the field of economics, for example, these measures have become widely employed in models, analyses, and predictions of individual and household decisions under uncertainty. They have proven to be effective tools for documenting and quantifying mechanisms that cannot be unambiguously inferred from choice data alone and for explaining empirical evidence at odds with the predictions of economic theory.
A widely measured expectation is subjective life expectancy (henceforth SLE), whereby respondents are asked for the percent chance that they will live to a certain target age or older. SLE has received particular attention among expectations, as it is a critical input for social insurance programs, private insurance and annuity contracts, consumption and saving decisions, and so on. Because the end date of one's life is unknown, individuals' life-cycle decisions, insurance contracts, and the like are based on some prediction of how long individuals will live.
While the importance of SLE is uncontroversial, survey questions eliciting probabilistic SLE on a numerical chance scale likely demand high levels of cognitive, and possibly emotional, effort from respondents. First, respondents are asked to make a prediction about an event that often lies in a somewhat distant future (depending on the age of the respondent at the time of the survey and the target age used in the question). Second, respondents are asked to think probabilistically. Third, the topic of the question can be perceived as emotional by some respondents. In terms of the survey response process model, major difficulties with these questions are likely to arise in the memory retrieval and judgment steps.
In this project we aim to better understand the cognitive processes of respondents when answering probabilistic SLE questions. To this end, we use a probing follow-up included in three surveys: 1) a non-probability sample web survey conducted across the U.S., the U.K., Germany, Spain, and Mexico; 2) a non-probability sample web survey in the U.S.; and 3) a probability sample telephone survey in the U.S. Specifically, right after answering the SLE question, respondents received an open-ended question asking them to indicate how they arrived at their answer. We analyze the qualitative data from this probing question to examine respondents' cognitive processes and their sensitivity to cultural background (proxied by respondents' country/language) as well as to methodological features such as mode.