Is Clean Data Better?

Chair: Dr Frances Barlas (GfK Custom Research)
Coordinator 1: Mr Randall Thomas (GfK Custom Research)
Many researchers have argued that, to improve accuracy, we should clean our data by excluding from analyses participants who exhibit sub-optimal behaviors, such as speeding or non-differentiation. Some researchers have gone so far as to incorporate ‘trap’ questions in their surveys in an attempt to catch such participants. Increasingly, researchers are suggesting aggressive cleaning criteria that identify large portions of respondents for removal and replacement. This not only raises questions about the validity of the survey results, but also has cost implications, as replacement sample is often required.

For this project, we used data from four different surveys that contained items allowing us to estimate bias, including items for which external benchmarks existed from reputable sample surveys along with actual election outcomes. Survey 1 had 1,847 participants from GfK’s probability-based KnowledgePanel® and 3,342 participants from non-probability online samples (NPS) in a study of the 2016 Florida presidential primary. Surveys 2 and 3 had over 1,671 participants from KnowledgePanel and 3,311 from non-probability online samples, fielded for the 2014 general elections in Georgia and Illinois. Survey 4 was a 2016 national election study with 2,367 respondents from KnowledgePanel. We examined how varying the proportion of respondents removed under increasingly aggressive data cleaning criteria (e.g., speeding) affected bias and the external validity of survey estimates.
Across studies, NPS showed higher bias than the probability-based KnowledgePanel sample, consistent with our prior studies; however, more rigorous case deletion generally did not reduce bias for either sample source, and in some cases higher levels of cleaning slightly increased bias. Modest cleaning might not affect data estimates and correlational measures, but excessive cleaning protocols may actually increase bias, achieving the opposite of the intended effect while increasing survey costs at the same time.
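As a rough illustration of this type of analysis, the sketch below flags one common sub-optimal behavior, applies increasingly aggressive removal thresholds for speeders, and tracks the mean absolute deviation of survey estimates from external benchmarks. The data, column names, and benchmark targets are hypothetical placeholders and are not taken from the studies described above.

```python
import numpy as np
import pandas as pd

# Simulated respondent-level data; column names and benchmark targets are
# illustrative placeholders, not taken from the actual surveys.
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "duration_sec": rng.lognormal(mean=6.0, sigma=0.5, size=n),
    **{f"grid_q{i}": rng.integers(1, 6, size=n) for i in range(1, 6)},
    "owns_home": rng.integers(0, 2, size=n),
    "voted_last_election": rng.integers(0, 2, size=n),
})
benchmarks = {"owns_home": 0.64, "voted_last_election": 0.60}  # hypothetical targets

# Flag non-differentiation: identical answers across every grid item.
# Other flags (e.g., high item non-response) would be built the same way.
grid_cols = [c for c in df.columns if c.startswith("grid_q")]
df["straightliner"] = df[grid_cols].nunique(axis=1).eq(1)

def mean_abs_bias(data: pd.DataFrame) -> float:
    """Mean absolute difference between survey estimates and external benchmarks."""
    return float(np.mean([abs(data[item].mean() - target)
                          for item, target in benchmarks.items()]))

# Increasingly aggressive cleaning: drop straightliners plus the fastest X% of
# respondents, then check whether benchmark bias actually improves.
for pct in (0, 5, 10, 20):
    cutoff = np.percentile(df["duration_sec"], pct)
    kept = df[(df["duration_sec"] >= cutoff) & ~(df["straightliner"] & (pct > 0))]
    print(f"~{pct:>2}% removed as speeders: n={len(kept):4d}, "
          f"mean abs bias = {mean_abs_bias(kept):.3f}")
```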
Many researchers have argued that, to improve accuracy, we should clean our data by excluding cases from analyses when participants have demonstrated sub-optimal behaviors, such as speeding, egregious non-differentiation on grid questions, or leaving many questions unanswered. Some researchers have gone so far as to incorporate ‘red herring’ or ‘trap’ questions in their surveys in an attempt to catch such participants. Increasingly, some researchers are suggesting aggressive cleaning criteria that identify a large portion of respondents for removal and replacement, sometimes as much as 20 percent of participants. This not only raises questions about the validity of the survey results, but also has cost implications, as replacement sample is often required. While Thomas (2014) showed that eliminating small proportions of such respondents (less than 5%) did not significantly affect survey results, a question arises: How much data cleaning is too much?
For this research, we analyzed data from two different projects that contained items that allowed us to estimate bias, including items for which external benchmarks existed from reputable sample surveys. Project 1 included a probability-based sample of 1,297 participants from GfK’s KnowledgePanel and 2,563 participants from non-probability online samples; this survey included 36 questions with available external benchmarks for investigating bias. Project 2 was the Foundations of Quality (FoQ2) study sponsored by the Advertising Research Foundation, which included 57,104 participants from 17 different non-probability sample providers; this survey included 24 national benchmarks used to investigate bias. We examined how broadening the data cleaning criteria to exclude an increasing number of participants affected bias and the external validity of survey estimates.
Across studies, we found that more rigorous case deletion generally did not reduce bias and in some cases increased it. Demographic weighting appeared to yield greater improvement for the probability samples than for the non-probability samples when larger proportions of cases were deleted; however, in all cases, increased levels of deletion increased weight variance, which decreased weighting efficiency and jeopardized analyses of key subgroups. Findings thus far indicate that some cleaning might not affect data estimates and correlational measures, but excessive cleaning protocols may actually increase bias, achieving the opposite of the intended effect while increasing survey costs at the same time.
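To make the weighting-efficiency point concrete, the sketch below uses the Kish approximation, efficiency = (Σw)² / (n · Σw²), which falls as weight variance grows. The weights here are simulated stand-ins for post-deletion demographic weights; the actual weighting procedure used in the studies is not reproduced.

```python
import numpy as np

def kish_efficiency(weights) -> float:
    """Kish weighting efficiency: (sum w)^2 / (n * sum w^2).
    Equals 1.0 for equal weights; decreases as weight variance increases."""
    w = np.asarray(weights, dtype=float)
    return w.sum() ** 2 / (len(w) * (w ** 2).sum())

def effective_n(weights) -> float:
    """Kish effective sample size: (sum w)^2 / sum w^2."""
    w = np.asarray(weights, dtype=float)
    return w.sum() ** 2 / (w ** 2).sum()

# Illustrative comparison (weights are simulated, not from the studies):
# modest weights after light cleaning vs. more variable weights after
# aggressive deletion forces larger adjustments on the remaining cases.
light = np.random.default_rng(1).lognormal(mean=0.0, sigma=0.3, size=2000)
aggressive = np.random.default_rng(1).lognormal(mean=0.0, sigma=0.7, size=1600)

for label, w in [("light cleaning", light), ("aggressive cleaning", aggressive)]:
    print(f"{label:20s} n={len(w):4d}  efficiency={kish_efficiency(w):.2f}  "
          f"effective n={effective_n(w):.0f}")
```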