Machine Learning, Passive Data, and Interviewer Performance

Coordinator 1: Dr Hanyu Sun (Westat)
Coordinator 2: Dr Ting Yan (Westat)
Abundant research indicates that interviewers have substantial effects on many aspects of the data collection process (see West and Blom, 2017 for a review). For instance, interviewers vary in their ability to develop sampling frames (e.g. Eckman and O’Muircheartaigh, 2011), to conduct within-household respondent selection (e.g. Tourangeau, Kreuter, and Eckman, 2012), to make contact with and gain cooperation from sample members (e.g. Purdon, Campanelli, and Sturgis, 1999), to deliver the questionnaire (e.g. van der Zouwen, Dijkstra, and Smit, 1991), and to collect paradata (e.g. West and Kreuter, 2015). Because interviewers can have such large effects on data collection, various methods have been used to monitor their performance. For example, a sample of respondents is recontacted to detect falsification (e.g. Groves, 2004), and behavior coding is applied to audio-recorded interviewer-respondent interactions (e.g. Hicks et al., 2010). Such monitoring is often costly and labor-intensive. As a result, only a small sample of respondents is typically recontacted, and only a small proportion of audio recordings is coded by humans.
To reduce the cost and improve the efficiency of interviewer monitoring, innovative techniques and additional data sources are needed. Some passive data elements (such as GPS data collected on interviewers’ smartphones) are already used to track interviewers during fieldwork. Although there is little research on how to use machine learning approaches to monitor interviewer performance and detect falsification, there is growing interest in the survey field in applying machine learning and passive data to other aspects of the data collection process. Machine learning has been used to develop sampling frames for non-traditional sampling units (e.g. Eck et al., 2018), to analyze open-ended survey responses (e.g. Thompson, Book, and Tamby, 2018), and to evaluate data quality (e.g. Wang and Harrison, 2018).
This session will explore the use of machine learning methods and passive data to monitor interviewer performance. Researchers are invited to submit papers, experiments, pilots, and other approaches on any of the following topics:
• Detecting interviewer falsification
• Monitoring interviewer performance
• Improving interviewer efficiency
• Optimal routing
• Interventions to improve field efficiency