Using paradata to assess and improve survey data quality 2

Chair | Dr Caroline Vandenplas (KULeuven)
Coordinator 1 | Professor Geert Loosveldt (KULeuven)
Coordinator 2 | Dr Koen Beullens (KULeuven)
Previous research shows that interviewers strongly affect the speed of face-to-face survey interviews, even though they are expected to adhere to the principles of standardized interviewing. Interview speed has been shown not only to differ between interviewers but also to vary, typically increasing, as the fieldwork progresses. After changes in the composition of respondents have been accounted for, increasing interview speed can be attributed to interviewers’ behaviour changing over the course of the fieldwork period as they become more familiar with the survey instrument. Differences between interviewers in interview speed, and changes in speed over the fieldwork, are particularly worrying given that interview speed can affect data quality.
In this study we attempt to replicate and extend previous findings on the effects of interviewers’ past survey experience (little experienced, highly experienced, highly experienced with ESS experience) and of experience accumulated over the course of the fieldwork, using data from the fifth, sixth and seventh rounds of the European Social Survey in Belgium. Our initial results confirm that interview speed increases, though progressively less rapidly, as interviewers conduct more interviews. We also find some evidence supporting the hypothesis that (ESS-)experienced interviewers conduct interviews at a higher speed. The observed differences in interview speed between interviewers with different experience levels nonetheless appear to weaken in later rounds of the ESS. Contrary to previous research, we do not find evidence for differential interview order effects for (ESS-)experienced interviewers compared to little experienced interviewers. We also analysed the impact on interview speed of the time that has passed since the interviewer’s previous administration of the instrument. We find a consistent, although weak, negative association between interview speed and time since the last interview. Such a negative association suggests that discontinuities in the fieldwork may prevent interviewers from effectively accumulating the familiarity with the survey instrument needed to conduct interviews smoothly.
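For readers who want a concrete picture of this kind of specification, the sketch below fits a mixed-effects model of interview speed with a random intercept per interviewer. It is a minimal illustration only, not the authors’ actual model: the data file and all column names (speed, interview_order, experience, days_since_last, interviewer_id) are hypothetical placeholders.

```python
# Minimal sketch, assuming a hypothetical interview-level paradata extract;
# this is not the specification actually estimated in the study.
import pandas as pd
import statsmodels.formula.api as smf

paradata = pd.read_csv("ess_interview_paradata.csv")  # hypothetical file

# Interview speed (e.g. questions per minute) regressed on the interviewer's
# running interview count (with a quadratic term to allow a decelerating
# increase), experience group, and days since the interviewer's last
# interview, with a random intercept per interviewer.
model = smf.mixedlm(
    "speed ~ interview_order + I(interview_order ** 2)"
    " + C(experience) + days_since_last",
    data=paradata,
    groups=paradata["interviewer_id"],
)
print(model.fit().summary())
```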
In this paper we analyse the time it takes respondents in the German wealth survey “Panel on Household Finance – PHF” to complete interviews. The survey is an interviewer-mediated CAPI survey consisting of a household interview, completed by a financially knowledgeable person internal (or external) to the household, and a personal interview with every household member aged 16 or older. In 2014, 4,461 households and 8,349 persons took part in the survey; about half of the sample are panel cases interviewed for the second time. We investigate the role of socio-demographics such as gender and age in determining interview length and interview time per question, controlling for other household characteristics and aspects of the survey process, i.e. proxy versus personal interview, interviewer characteristics, and the time of day and weekday of the interview. We also examine whether familiarity with a subject reduces interview time by analysing the response times for particular sections of the questionnaire (e.g. pensions, consumption, income or stock market participation). Our preliminary results show that interview time increases with age, as expected. The gender of the respondent does not appear to influence interview length once age and other characteristics are controlled for. We are not able to examine how different interview modes, question formats and scales influence response time; this remains an interesting topic for future research.
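As an illustration only (the variable names below are invented, not taken from the PHF codebook), a regression of this kind could be sketched as follows:

```python
# Minimal sketch, assuming a hypothetical interview-level timing file;
# not the PHF analysis code itself.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

phf = pd.read_csv("phf_interview_timings.csv")  # hypothetical extract
phf["log_duration"] = np.log(phf["duration_minutes"])

# Log interview duration on respondent socio-demographics and survey-process
# controls (proxy vs. personal interview, weekday and time of day).
result = smf.ols(
    "log_duration ~ age + C(gender) + C(proxy_interview)"
    " + C(weekday) + C(time_of_day)",
    data=phf,
).fit()
print(result.summary())
```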
Paradata, data about the survey process, are often regarded as merely technical information accompanying the survey (day and time of call in CATI, response timings, location of respondents in household surveys, etc.). Yet these variables are valuable not in themselves but in relation to the substantive content of the research. For example, when we see deviations in the timing of answers to particular questions, we can focus on those questions and uncover problems that run much deeper than question wording or the communication between respondent and interviewer. This is what M. Schober and F. Conrad call the problem of “complicated mappings”. Thus, the quantitative data (paradata) help us to detect complicated situations for subsequent qualitative analysis, repair, or even new research.
For example, in one of our telephone surveys there was a short closed question: “How do you generally evaluate the quality of your professional education – as low, middle or high?”. Respondents were expected to choose an answer quickly and easily; however, the timing variable showed large dispersion for this question and, consequently, uncertainty in the resulting estimates. After listening to the recorded interviews we found that respondents attached quite different meanings to “quality of education” and therefore, in fact, answered different questions; that they applied different evaluation scales depending on their understanding of “quality of education”; and that this evaluative uncertainty stemmed from the changes the Russian education system has undergone over several decades, which respondents had naturally experienced. Thus, paradata led us to the problem of a complicated correspondence between the research task and the circumstances of the respondents.
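A screening step of the kind described above can be sketched roughly as follows; the file layout and column names are assumptions, and questions flagged this way would then be candidates for listening to the recorded interviews.

```python
# Minimal sketch, assuming item-level timing paradata with one row per
# respondent-question pair; not the authors' actual processing code.
import pandas as pd

timings = pd.read_csv("question_timings.csv")  # columns: question_id, respondent_id, seconds

# Dispersion of response times per question, scaled by the typical (median)
# response time so that long and short questions are comparable.
per_question = timings.groupby("question_id")["seconds"].agg(["median", "std"])
per_question["relative_dispersion"] = per_question["std"] / per_question["median"]

# Flag the most dispersed questions for qualitative follow-up.
threshold = per_question["relative_dispersion"].quantile(0.90)
flagged = per_question[per_question["relative_dispersion"] > threshold]
print(flagged.sort_values("relative_dispersion", ascending=False))
```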
It may be that data about the survey process – paradata – include not only technical information but also traces of how respondents’ answers come into being. As the previous example shows, the construction of answers depends on respondents’ background and life experience and on the historical (social, political, etc.) conditions of the country they live in. Thus, rethinking respondents’ decision making as a part of the survey process would deepen the meaning of paradata and expand its use. This issue, however, requires further discussion.
In the meantime, we would argue that statistical deviations and outliers in paradata indicate problematic cases that should be examined more closely with both quantitative and qualitative methods, and that doing so will improve data quality. At the same time, consistent paradata in a survey may tell us even more about the coherence of public opinion than mere descriptive statistics do. Why? Because they allow us to answer not only the question “what do people think” but also “how do they think what they think”.
When survey respondents answer sensitive questions, they can sometimes present themselves in a more favorable light than is accurate. The current study examines how respondents produce "socially desirable" answers, testing three hypotheses about response latencies as paradata that may provide reliable indicators of socially desirable responding. These hypotheses are lent plausibility by evidence from prior studies of speech paradata in surveys about nonsensitive facts and behaviors, which have demonstrated, for example, that telephone respondents to an automated speech system delay longer when the circumstances they are answering about do not correspond with question concepts straightforwardly (Ehlen et al., 2007), and that telephone respondents are more likely to produce disfluencies in answers that turn out to be unreliable (Schober et al., 2012).
This study compares response latencies of answers to sensitive and non-sensitive questions in a corpus of 319 audiorecorded mobile telephone interviews from Schober et al. (2015), in which participants answered 32 questions (some sensitive and some not) on their iPhones. This allowed testing whether laboratory findings in controlled settings extend to settings where respondents might be mobile, multi-tasking, or distracted. Half the respondents (160) were interviewed by professional interviewers and the other half (159) by an automated spoken dialog interviewing system (speech-IVR). The main comparisons are (a) within-subject comparison of response latencies by the same person to sensitive and non-sensitive questions; (b) between-subject comparisons of response latencies for selecting more and less socially desirable answers to the same sensitive questions; and (c) within-question comparisons of response latencies when sensitive questions are asked by an automated system or human interviewer. Consistent with evidence in other survey modes, respondents produced more socially desirable responses with a human than an automated interviewer.
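Comparison (a), for instance, amounts to a paired within-respondent contrast of latencies; a rough sketch (with invented file and variable names, not the study’s actual analysis code) might look like this:

```python
# Minimal sketch of a within-respondent latency comparison; data layout and
# column names are assumptions, not the study's files.
import pandas as pd
from scipy.stats import ttest_rel

latencies = pd.read_csv("response_latencies.csv")
# columns: respondent_id, question_id, sensitive (True/False), latency_ms

# Mean latency per respondent for sensitive and non-sensitive questions,
# then a paired comparison across the two question types.
per_person = (
    latencies.groupby(["respondent_id", "sensitive"])["latency_ms"]
    .mean()
    .unstack("sensitive")
    .dropna()
)
t_stat, p_value = ttest_rel(per_person[True], per_person[False])
print(f"paired t = {t_stat:.2f}, p = {p_value:.4f}")
```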
The findings give clear evidence from matched (fair) comparisons across multiple questions that mobile survey respondents hesitate more when answering nonsensitive (vs. sensitive) questions; in answers to sensitive questions that are stigmatized (less socially desirable responses), for at least some survey questions; and when interviewed by an automated system (vs. a human interviewer). More generally, the findings demonstrate that speech paradata are indeed significantly associated with sensitivity of both questions and answers in mobile surveys.
These findings help distinguish the mechanisms underlying socially desirable responding—for example, distinguishing between the time pressure that results from talking at all vs. the pressure that results from having a potentially judgmental human interlocutor. They also raise important further questions about the response mechanisms at play: whether speed reflects less thoughtfulness or conscientiousness when respondents are offended or embarrassed by a topic of questioning, whether respondents want to minimize time spent or effort on embarrassing questions, or whether respondents find particular answers more salient or easily available for sensitive questions.