Tuesday 14th July, 11:00 - 12:30 Room: HT-103


Using Paradata to Improve Survey Data Quality 1

Convenor Professor Volker Stocké (University of Kassel, Germany)
Coordinator 1 Professor Jochen Mayerl (TU Kaiserslautern, Germany)
Coordinator 2 Dr Oliver Lipps (Swiss Centre of Expertise in the Social Sciences (FORS), Lausanne, Switzerland)

Session Details

“Paradata” are measures generated as a by-product of the survey data collection process. Prominent examples of paradata are data available from the sampling frame, call-record data in CATI surveys, keystroke information from CAI, timestamp files, observations of interviewer behavior, or respondents’ response latencies (see Kreuter 2013 for an overview). These data can potentially be used to enrich questionnaire responses or to provide additional information about the survey (non-)participation process. In many cases paradata are available at no (or little) additional cost, but the theoretical basis for using paradata as indicators of survey data quality is still underdeveloped. Some examples of the use of paradata are:

Paradata in fieldwork monitoring and nonresponse research: Paradata are often used in the survey management context. Using control charts, survey practitioners can monitor fieldwork progress and interviewer performance. Paradata are also indispensable in responsive designs, providing real-time information about fieldwork and survey outcomes that affect costs and errors. However, their role as indicators of interviewer or fieldwork effects, and as predictors of nonresponse, remains unclear.
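
A minimal sketch of the kind of monitoring mentioned above, assuming a simple call-level file with invented column names (interviewer_id, completed): a p-chart-style rule flags interviewers whose completion rate falls outside 3-sigma control limits around the pooled rate. This is illustrative only, not a method from any paper in this session.

    # Hypothetical sketch: p-chart-style monitoring of interviewer completion rates.
    # Column names and data are illustrative assumptions.
    import pandas as pd

    calls = pd.DataFrame({
        "interviewer_id": [1, 1, 1, 2, 2, 2, 3, 3, 3, 3],
        "completed":      [1, 0, 1, 0, 0, 0, 1, 1, 0, 1],
    })

    per_int = calls.groupby("interviewer_id")["completed"].agg(["mean", "count"])
    p_bar = calls["completed"].mean()  # pooled completion rate

    # 3-sigma control limits, adjusted for each interviewer's workload
    sigma = (p_bar * (1 - p_bar) / per_int["count"]) ** 0.5
    per_int["lcl"] = (p_bar - 3 * sigma).clip(lower=0)
    per_int["ucl"] = (p_bar + 3 * sigma).clip(upper=1)
    per_int["flag"] = (per_int["mean"] < per_int["lcl"]) | (per_int["mean"] > per_int["ucl"])

    print(per_int)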

Paradata to understand respondent behavior: Paradata can aid in assessing the quality of survey responses, e.g. by means of response latencies or back-tracking. Research has used paradata to identify uncertainty in the answers given by respondents, e.g. when respondents frequently alter their answers. In this new strand of research, however, indicators might still be confounded and tap into multiple dimensions of the response process (e.g., response latencies may indicate retrieval problems and/or satisficing).
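
As an illustration of response-latency paradata (assumptions: item-level timestamps with invented field names and an arbitrary 2-second threshold), the following sketch derives latencies and flags implausibly fast answers as possible satisficing.

    # Hypothetical sketch: response latencies from timestamp paradata.
    # Field names and the flagging rule are assumptions, not from the session papers.
    import pandas as pd

    timestamps = pd.DataFrame({
        "resp_id":  [101, 101, 101, 102, 102, 102],
        "item":     ["q1", "q2", "q3", "q1", "q2", "q3"],
        "shown_at":    pd.to_datetime(["2015-07-14 10:00:00", "2015-07-14 10:00:12",
                                       "2015-07-14 10:00:31", "2015-07-14 10:05:00",
                                       "2015-07-14 10:05:03", "2015-07-14 10:05:05"]),
        "answered_at": pd.to_datetime(["2015-07-14 10:00:11", "2015-07-14 10:00:30",
                                       "2015-07-14 10:00:45", "2015-07-14 10:05:02",
                                       "2015-07-14 10:05:04", "2015-07-14 10:05:06"]),
    })

    timestamps["latency_s"] = (timestamps["answered_at"] - timestamps["shown_at"]).dt.total_seconds()

    # Flag items answered faster than an (assumed) minimum plausible reading time.
    timestamps["too_fast"] = timestamps["latency_s"] < 2.0
    print(timestamps.groupby("resp_id")[["latency_s", "too_fast"]].mean())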

Paper Details

1. An Experimental Comparison Using Paradata and Modeled Paradata to Improve Interviewer Performance
Miss Tamara Terry (RTI International)



Monitoring interviewer performance is critical to successful data collection efforts. This paper will present results from the first experimental evaluation of incorporating model-based paradata alongside common management reports to better coach interviewers and improve interviewer performance. Using a large telephone survey, we randomize interviewers into two groups: (1) assessment by commonly used reports only, and (2) assessment by a combination of commonly used reports and the model-based paradata. The model-based paradata use case history and other paradata to account for sample differences across interviewers, allowing faster and more accurate detection of under-performing interviewers.
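
The abstract does not specify the model, so the following is only a plausible sketch with invented variables: predict each case's completion propensity from case-history paradata, then compare an interviewer's observed completions with the total predicted for their case mix.

    # Hypothetical sketch (not the authors' model): case-mix adjustment of
    # interviewer performance using a logistic propensity model.
    import pandas as pd
    from sklearn.linear_model import LogisticRegression

    cases = pd.DataFrame({
        "interviewer":    ["A", "A", "A", "B", "B", "B", "C", "C"],
        "prior_attempts": [1, 4, 2, 6, 5, 7, 1, 2],   # case-history paradata (assumed)
        "ever_refused":   [0, 1, 0, 1, 1, 1, 0, 0],
        "completed":      [1, 0, 1, 0, 1, 0, 1, 1],
    })

    X = cases[["prior_attempts", "ever_refused"]]
    model = LogisticRegression().fit(X, cases["completed"])
    cases["expected"] = model.predict_proba(X)[:, 1]

    # Interviewers completing far fewer cases than their case mix predicts
    # would be candidates for coaching.
    summary = cases.groupby("interviewer")[["completed", "expected"]].sum()
    summary["shortfall"] = summary["expected"] - summary["completed"]
    print(summary.sort_values("shortfall", ascending=False))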


2. Interviewer Effects on Paradata Predictors of Nonresponse
Dr Rachael Walsh (U.S. Census Bureau)
Dr James Dahlhamer (National Center for Health Statistics)

In 2013, the National Health Interview Survey (NHIS) added interviewer observations to the Contact History Instrument (CHI), asking interviewers to record neighborhood and sample unit characteristics hypothesized to predict survey response and key survey estimates. Using data collected from January through June 2014, we apply multilevel multinomial logistic regression to assess interviewer effects on the relationship between these observations and whether a sample unit completes the interview, refuses participation, or is never contacted. This research evaluates these observations as potential indicators of interviewer and fieldwork effects to determine impacts on variance and bias when the observations are used to predict nonresponse.
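
As a rough, single-level illustration only (the NHIS/CHI analysis is multilevel and uses the actual interviewer observations; the variable names and data below are simulated assumptions), a multinomial logit of contact outcome on two hypothetical observations could look like this:

    # Hypothetical sketch: single-level multinomial logit of contact outcome on
    # interviewer observations, using statsmodels. All data simulated.
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 300
    df = pd.DataFrame({
        "access_barrier": rng.integers(0, 2, n),   # e.g. locked building observed
        "children_seen":  rng.integers(0, 2, n),
        # outcome: 0 = interview, 1 = refusal, 2 = noncontact
        "outcome":        rng.integers(0, 3, n),
    })

    X = sm.add_constant(df[["access_barrier", "children_seen"]])
    fit = sm.MNLogit(df["outcome"], X).fit(disp=False)
    print(fit.summary())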



3. Impact of Nonresponse on Survey Estimates of Physical Fitness and Sleep Quality
Dr Linchiat Chang (www.linchiat.com)

Survey estimates of sleep and exercise are used in a wide array of domains, from public health to commercial product development. The 2013 U.S. National Health Interview Survey released both health survey data and paradata on the relative contactability of respondents. Analyses revealed significant positive correlations between extent of noncontact and both frequency of physical exercise and sleep quality. Predictive models yielded discrepant results when the survey sample was filtered by extent of noncontact, even after controlling for demographics and chronic conditions. The final analysis examines the number of contact attempts needed to stabilize the relative effect sizes of key predictors.
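
One way to picture the comparison described above, under invented variable names and simulated data: fit the same regression within increasingly inclusive subsamples defined by number of contact attempts and track how a key coefficient moves. This is a sketch of the general idea, not the paper's analysis.

    # Hypothetical sketch: re-fitting the same linear model within subsamples
    # defined by contact attempts and comparing a coefficient. All data simulated.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(1)
    n = 500
    df = pd.DataFrame({
        "sleep_quality":    rng.normal(5, 1.5, n),
        "exercise_freq":    rng.poisson(3, n),
        "age":              rng.integers(18, 80, n),
        "chronic_cond":     rng.integers(0, 2, n),
        "contact_attempts": rng.integers(1, 10, n),
    })

    for max_attempts in (2, 5, 9):  # increasingly inclusive samples
        sub = df[df["contact_attempts"] <= max_attempts]
        fit = smf.ols("sleep_quality ~ exercise_freq + age + chronic_cond", data=sub).fit()
        print(f"<= {max_attempts} attempts, n={len(sub)}:",
              round(fit.params["exercise_freq"], 3))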



4. Using Paradata to Monitor Interviewer Data Quality
Ms Nicole Kirgis (University of Michigan)

Paradata from audit trails, the record of actions and entries within a computerized questionnaire, can be used for data quality monitoring at the interviewer level. Audit trail data include a record of every keystroke and the time spent between keystrokes. Indicators include the average time spent on survey questions, the time spent resolving error checks, and the frequency of “don’t know” and “refuse” responses. This presentation will discuss the implementation of an interviewer-level data quality dashboard. Examples will show how this data monitoring technique has been used to identify and address interviewer data quality concerns.
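
A minimal sketch of such interviewer-level indicators, assuming a simplified audit-trail layout with invented column names (real audit trails are keystroke-level and considerably richer):

    # Hypothetical sketch: interviewer-level indicators from audit-trail paradata.
    # The log layout ("interviewer", "item", "seconds", "answer") is an assumption.
    import pandas as pd

    audit = pd.DataFrame({
        "interviewer": ["A", "A", "A", "B", "B", "B"],
        "item":        ["q1", "q2", "q3", "q1", "q2", "q3"],
        "seconds":     [8.2, 14.5, 6.1, 3.0, 2.8, 3.5],
        "answer":      ["2", "DK", "1", "REF", "DK", "DK"],
    })

    indicators = audit.groupby("interviewer").agg(
        mean_seconds_per_item=("seconds", "mean"),
        dk_ref_rate=("answer", lambda a: a.isin(["DK", "REF"]).mean()),
    )
    print(indicators)  # candidate inputs for an interviewer-level quality dashboard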