Machine Learning, Passive Data, and Interviewer Performance

Session Organisers: Dr Hanyu Sun (Westat), Dr Ting Yan (Westat)
Time: Friday 19th July, 09:00 - 10:30
Room: D17

Abundant research indicates that interviewers have substantial effects on different aspects of the data collection process (see West and Blom, 2017 for a review). For instance, interviewers vary in their abilities to develop sampling frames (e.g. Eckman and O’Muircheartaigh, 2011), to conduct within-household respondent selection (e.g. Tourangeau, Kreuter, and Eckman, 2012), to make contact with and gain cooperation from sample members (e.g. Purdon, Campanelli, and Sturgis, 1999), to deliver the questionnaire (e.g. van der Zouwen, Dijkstra, and Smit, 1991), and to collect paradata (e.g. West and Kreuter, 2015). Because of the large effects interviewers may have on data collection, different methods have been used to monitor interviewer performance. For example, a sample of respondents is recontacted to detect falsification (e.g. Groves, 2004), and behavior coding is done on audio-recorded interviewer-respondent interactions (e.g. Hicks et al., 2010). Such monitoring is often costly and labor intensive. As a result, only a small sample of respondents is recontacted and only a small proportion of audio recordings is coded by humans.

To reduce the cost and improve the efficiency of interviewer monitoring, innovative techniques and additional data are needed. Some passive data elements (such as GPS data collected on interviewers’ smartphones) are already used to track interviewers in the field. Although there is little research on using machine learning to monitor interviewer performance and to detect falsification, there is growing interest in the survey field in applying machine learning and passive data to different aspects of the data collection process. Machine learning has been used to develop sampling frames for non-traditional sampling units (e.g. Eck et al., 2018), analyze open-ended survey responses (e.g. Thompson, Book, and Tamby, 2018), and evaluate data quality (e.g. Wang and Harrison, 2018).

This session will explore the use of machine learning methods and passive data to monitor interviewer performance. Researchers are invited to submit papers on experiments, pilots, and other approaches addressing any of the following topics:
• Detecting interviewer falsification
• Monitoring interviewer performance
• Improving interviewer efficiency
• Optimal routing
• Interventions to improve field efficiency

Keywords: machine learning, passive data, interviewer performance

Using Machine Learning to ‘Triage’ Open Text Data in Order to Increase Processing Efficiency

Ms Catherine Billington (Westat) - Presenting Author
Mr Andrew Jannett (AndrewJannett@westat.com)

Reviewing text data is an integral but costly component of data quality on many studies. Text fields may be collected to assign industry-standard codes (e.g., insurance providers, occupations), to allow field interviewers to request updates to case-level data, or for operational purposes (case notes). Processing text strings typically improves data quality, but it comes at a cost. This is particularly time-consuming and costly for field comments, where review remains a very manual process. While it is neither possible nor desirable to fully automate reviews, finding efficiencies without loss of quality is a priority. On large projects, even small efficiencies multiplied over time produce real savings.

We conducted a pilot assessment of the usability, efficiency gains, and effects of suggestibility on data quality when using machine-learning techniques to assist in the processing of comment text strings. It is possible that a bias exists toward accepting a programmatically suggested solution. The sample included over 4,000 comments from a national longitudinal study. Each comment had previously been assigned a code by a data technician to match it with an approved standard procedure used to edit the data. These pairs (comment + procedure code) were used to train programs to scan comment data and recommend a standard editing approach, as indicated by a code. New comments were then scanned by the program and assigned codes.

The codes that were recommended programmatically were compared to manually selected codes for the same comment data. In this way, we were able to assess the extent to which using machine learning to scan and ‘triage’ text data can increase processing efficiency while maintaining exacting standards for data quality.
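As a rough illustration of this kind of triage, the sketch below trains a text classifier on historical (comment, procedure code) pairs and compares its recommended codes with the manually assigned ones on held-out comments. The model choice (TF-IDF features with logistic regression), the file and column names, and the train/test split are illustrative assumptions; the abstract does not specify the programs actually used.

```python
# Minimal sketch of comment "triage": train on historical (comment, procedure code)
# pairs, then recommend codes for new comments. Model choice and column names are
# illustrative assumptions, not the pipeline described in the abstract.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Hypothetical training file: one row per reviewed comment with its assigned code.
history = pd.read_csv("coded_comments.csv")          # columns: comment_text, procedure_code

train, test = train_test_split(history, test_size=0.2, random_state=0)

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=2),   # unigram/bigram text features
    LogisticRegression(max_iter=1000),               # multiclass code classifier
)
model.fit(train["comment_text"], train["procedure_code"])

# Recommend a code for each held-out comment and compare with the manual code.
recommended = model.predict(test["comment_text"])
print("Agreement with manual coding:", accuracy_score(test["procedure_code"], recommended))
```

The agreement rate on held-out comments is one simple way to approximate the comparison of programmatic and manual codes described above; in practice, a review workflow might only surface low-confidence recommendations for human coding.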


Using GPS to Detect Falsifiers: Some Nuts and Bolts

Ms Victoria Vignare (Westat) - Presenting Author
Ms Marsha Hasson (Westat)
Ms Tammy Cook (Westat)

Falsification has been a longstanding concern in surveys, and the face-to-face mode is especially at risk because the interviewer’s work is performed with only light remote supervision. GPS has emerged as a new tool to detect falsifiers in the field, by matching the interviewer’s location with the location of the respondent’s home at the time the interview took place. This new tool has one major advantage over other forms of detection: it can be applied nearly universally across all interviews, rather than to just a sample. It scales up.

This paper will describe how GPS data are used at Westat to support the identification of potential falsification. A mobile or laptop device logs the location of the interviewer throughout the day, including travel and stationary activities. Depending on the approach, this information is sent to the home office in near real time or transmitted by the interviewer following interviews. Upon receipt, the GPS information is combined with case data, and various algorithms are run to determine whether the interviewer was at the home of the respondent at the time the interview was conducted. The supervisor is immediately alerted to any issues and provided with the case data, including any EROCs that might explain a change of location at the respondent’s request. The supervisor then begins an investigation that may include a review of the interviewer’s route that day, caseload, records of contact attempts, CARI data, and work history. We provide several examples of these investigations that separate false positives from confirmed falsifiers.
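For readers unfamiliar with the mechanics, the sketch below shows one simple way such a location check could work: find the interviewer’s GPS pings near the recorded interview time and flag the case if none falls within a small radius of the respondent’s geocoded address. The distance threshold, time window, and data structures are illustrative assumptions and are not the production algorithms described above.

```python
# Minimal sketch of a GPS check, assuming timestamped interviewer location pings and a
# geocoded respondent address are available. Threshold, time window, and flagging rule
# are illustrative only.
from dataclasses import dataclass
from datetime import datetime
from math import asin, cos, radians, sin, sqrt

@dataclass
class Ping:
    timestamp: datetime
    lat: float
    lon: float

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

def flag_interview(pings, interview_time, home_lat, home_lon, max_km=0.5):
    """Flag the case if no ping near the interview time is within max_km of the home."""
    nearby = [p for p in pings
              if abs((p.timestamp - interview_time).total_seconds()) <= 1800]  # 30-minute window
    if not nearby:
        return "no GPS data near interview time"
    closest = min(haversine_km(p.lat, p.lon, home_lat, home_lon) for p in nearby)
    return "review" if closest > max_km else "ok"
```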

We will also provide level-of-effort information and compare the costs and benefits of using GPS versus CARI and re-interview methods as tools for detecting falsifiers.


Identifying Interviewer Falsification using Speech Recognition: A Proof of Concept Study

Dr Hanyu Sun (Westat) - Presenting Author
Dr Gonzalo Rivero (Westat)
Dr Ting Yan (Westat)

Survey management staff have long used Computer Audio-Recorded Interviewing (CARI) to assess interviewer performance, validate interviews, or evaluate the performance of survey questions. The success of such an evaluation, however, often depends on timely, labor-intensive coding. Coders first need to listen to the interactions between the interviewer and the respondent and then assess how the interviewer performed against criteria such as whether the interviewer falsified the case and whether the interviewer followed standardized interviewing techniques. Due to resource constraints, typically only a small number of items in the questionnaire or a small portion of the interview is listened to and coded against pre-specified criteria. In recent years, there has been growing interest in the survey field in exploring the use of machine learning for different aspects of the data collection process (e.g. Eck et al., 2018; Thompson, Book, and Tamby, 2018). However, little research has examined how machine learning could improve the process of monitoring interviewer performance and detecting falsification. Here we report a proof-of-concept study that explores the use of speech recognition to detect interviewer falsification. At Westat, we developed an assessment tool that automates the coding and evaluation process. The tool first transcribes CARI audio recordings into text and then measures the distance between the transcript and the questionnaire. The distance is used to create a score for how likely it is that the interviewer falsified the interview. The tool also detects the number of conversational partners in the interview and uses that to detect falsification. In this presentation, we will show how the tool works using recorded lab interviews with varied features. We will also evaluate how well the tool performs at detecting falsification.
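A minimal sketch of the transcript-to-questionnaire comparison is given below, assuming the audio has already been transcribed by a speech recognizer. The token-level similarity measure and the way item scores are aggregated into a falsification score are illustrative stand-ins; the abstract does not disclose the tool’s actual distance metric or how conversational partners are detected.

```python
# Minimal sketch: score how closely transcribed speech tracks the scripted questionnaire.
# SequenceMatcher over word tokens is an illustrative similarity measure, not the tool's.
from difflib import SequenceMatcher

def question_similarity(transcript: str, question_text: str) -> float:
    """Return a 0-1 score for how closely the transcript tracks the scripted question."""
    t_tokens = transcript.lower().split()
    q_tokens = question_text.lower().split()
    return SequenceMatcher(None, t_tokens, q_tokens).ratio()

def falsification_score(transcripts, questionnaire):
    """Average dissimilarity across items: higher values suggest the script was not read."""
    sims = [question_similarity(t, q) for t, q in zip(transcripts, questionnaire)]
    return 1.0 - sum(sims) / len(sims)

# Example: a transcript that matches the scripted wording yields a low (unsuspicious) score.
script = ["In the past twelve months, have you seen a doctor?"]
spoken = ["in the past twelve months have you seen a doctor"]
print(falsification_score(spoken, script))
```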


Can Paradata Predict Interviewer Effects?

Mr Sharan Sharma (University of Michigan)
Professor Michael Elliott (University of Michigan) - Presenting Author

Consideration of interviewer effects (interviewer measurement error variance) in active quality control does not seem widespread, despite their known effect of reducing the precision of survey estimates. One major obstacle is that interviewer effect estimates computed while a survey is in progress can be very unstable due to limited data. We address this issue by exploring the use of paradata (keystrokes and time stamps generated during the computer-assisted interviewing process) as proxies for interviewer effects, with a focus on large-scale repeated cross-section or panel surveys.

We first estimate interviewer effects for each item in our analysis by using multilevel models that include a vector of respondent covariates to approximate interpenetration. We then compute the proportion of variance explained when we add interviewer-level paradata inputs to this model. These inputs are selected using adaptive lasso from a pool of thirteen measures. Realistic predictions of the explained variance are then computed using a bootstrap-based method.
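The sketch below illustrates the core variance-explained comparison under simplified assumptions: a random-intercept model for a single item with respondent covariates, refit with interviewer-level paradata summaries added as fixed effects. The data file and variable names are hypothetical, and the adaptive lasso selection and bootstrap steps described above are omitted.

```python
# Minimal sketch of the variance-explained comparison, assuming an item-level analysis
# file with a response, respondent covariates, an interviewer ID, and interviewer-level
# paradata summaries. Variable names are illustrative assumptions.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("item_level_data.csv")   # hypothetical analysis file

def interviewer_variance(formula):
    """Interviewer-level variance from a random-intercept model."""
    fit = smf.mixedlm(formula, data=df, groups=df["interviewer_id"]).fit()
    return float(fit.cov_re.iloc[0, 0])

# Base model: respondent covariates only (to approximate interpenetration).
v_base = interviewer_variance("item_response ~ age + education + urbanicity")

# Augmented model: add interviewer-level paradata summaries (e.g., timings, revisits).
v_para = interviewer_variance(
    "item_response ~ age + education + urbanicity + mean_item_time + revisit_rate"
)

print("Proportion of interviewer variance explained by paradata:", 1 - v_para / v_base)
```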

We use data from the 2015 wave of the Panel Study of Income Dynamics (PSID) for our analysis and find promising results: paradata explain more than half the magnitude of interviewer effects on average across items. Paradata also outperform interviewer-level demographic and work-related variables in explaining interviewer effects. While most of the focus in the literature and in practice has been on time-based paradata (e.g., item times), we find that non-time-based paradata (e.g., frequency of item revisits) outperform time-based paradata for a large majority of items. We conclude by discussing how survey organizations can use these findings in active quality control to contain interviewer effects.