It’s the Interviewers! New developments in interviewer effects research 3

Chair: Dr Salima Douhou (City University of London, CCSS)
Coordinator 1: Professor Gabriele Durrant (University of Southampton)
Coordinator 2: Dr Olga Maslovskaya (University of Southampton)
Coordinator 3: Dr Kathrin Thomas (City University of London, CCSS)
Coordinator 4: Mr Joel Williams (TNS BMRB)
In order to guarantee highly standardized settings in large-scale educational assessments, test administrators are intensively trained. In the field, however, some test administrators do not adhere precisely to these testing protocols and deviate from standardized practice. Such differences in interviewer behaviour can introduce systematic bias into large-scale assessments of competences. We therefore tested whether systematic test administrator effects could be identified for the measurement of mathematical abilities in the adult cohort of the German National Educational Panel Study. Furthermore, the variance in ability estimates introduced by different test administrators was disentangled from the variance attributable to different geographical areas. Because sampling points are used in the absence of a nationwide population register, ability estimates can be more homogeneous within regional clusters than between them.
The sample consists of 5,220 respondents aged 24 to 67 who lived in 466 area clusters and were interviewed by 207 different interviewers. On average, each interviewer administered the competence test to 25.2 test takers, and 11.2 respondents lived in each cluster. The 21 items measuring mathematical competence were administered in 2010 and 2011 in paper-and-pencil mode. The competence tests took place before the computer-assisted personal interviews, and both were normally held at the respondent’s home.
Because interviewer effects were expected to be statistically confounded with effects at the area level, cross-classified multilevel analyses using Markov Chain Monte Carlo (MCMC) procedures were conducted to disentangle the two sources of variance. For the estimation of adult mathematical achievement, 9.5 percent of the observed variance in mathematical competence was found to be attributable to interviewers and 0.5 percent to geographical areas. Even though the share of variance in competence measures traceable to the interviewer was rather high, none of the investigated interviewer characteristics (gender, age, education and work experience) was significantly related to respondents’ ability measures. Socio-demographic characteristics alone were thus unable to identify interviewers with aberrant test administration behaviour.
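As a rough illustration (not the authors’ actual specification), a cross-classified model of this kind can be sketched in PyMC; the data, priors, and variable names below are assumptions, with simulated values standing in for the NEPS ability estimates. The key point is that interviewers are not nested within areas, so the two random effects enter additively and must be estimated jointly:

```python
import numpy as np
import pymc as pm

# Simulated stand-in data: respondents cross-classified by
# interviewer and area (hypothetical, not the NEPS data).
rng = np.random.default_rng(42)
n, n_iv, n_area = 1000, 207, 466
iv_idx = rng.integers(0, n_iv, n)
area_idx = rng.integers(0, n_area, n)
y = rng.normal(size=n)  # placeholder ability estimates

with pm.Model() as model:
    mu = pm.Normal("mu", 0.0, 5.0)                  # grand mean
    sigma_iv = pm.HalfNormal("sigma_iv", 1.0)       # interviewer SD
    sigma_area = pm.HalfNormal("sigma_area", 1.0)   # area SD
    sigma_e = pm.HalfNormal("sigma_e", 1.0)         # residual SD
    u_iv = pm.Normal("u_iv", 0.0, sigma_iv, shape=n_iv)
    u_area = pm.Normal("u_area", 0.0, sigma_area, shape=n_area)
    pm.Normal("y_obs", mu + u_iv[iv_idx] + u_area[area_idx],
              sigma_e, observed=y)
    idata = pm.sample(1000, tune=1000)  # MCMC estimation

# Variance partition coefficients: the shares of total variance
# attributable to interviewers and to areas (9.5% and 0.5% in the paper).
post = idata.posterior
v_iv = post["sigma_iv"] ** 2
v_area = post["sigma_area"] ** 2
v_e = post["sigma_e"] ** 2
total = v_iv + v_area + v_e
print(float((v_iv / total).mean()), float((v_area / total).mean()))
```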
Therefore, Bayes predictions of second-level errors were used to identify outlying interviewers. Because the variables in the model adjust for prior differences among test takers and contexts, the interviewer-level residual variances measure the bias introduced into competence measurement by the test administrator. In this way, unobserved factors at the interviewer level that affect respondent achievement were investigated. Our Bayesian prediction of the second-level random effects for mathematical achievement identified interviewers with significant effects on respondents’ competence measurement. Out of the 207 interviewer clusters, 6 have a credible interval lying entirely below zero and 9 entirely above zero; their estimated competence intercepts thus deviate from the population mean. Since the model adjusts for differences in respondent characteristics and contextual factors, these significant deviations in predicted intercept values can be interpreted as interviewer bias in competence measurement.
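Continuing the hypothetical sketch above, the flagging step amounts to extracting credible intervals for the interviewer random effects and checking which intervals exclude zero (the ArviZ call and the 95% level are assumptions, not the authors’ procedure):

```python
import arviz as az

# 95% highest-density intervals for the interviewer random effects u_iv
summary = az.summary(idata, var_names=["u_iv"], hdi_prob=0.95)

# Interviewers whose interval lies entirely below or above zero:
# their predicted intercept deviates credibly from the population mean.
below = summary[summary["hdi_97.5%"] < 0]
above = summary[summary["hdi_2.5%"] > 0]
print(f"{len(below)} intervals below zero, {len(above)} above zero")
```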
Interviewer effects in face-to-face studies can affect not only the precision of survey estimates but also the accuracy of analyses conducted at a local level. The Crime Survey for England and Wales (CSEW) is a continuous face-to-face cross-sectional study conducted by Kantar Public on behalf of the Office for National Statistics. The study is designed to allow analysis to be conducted separately for each Police Force Area (PFA), of which there are 43 in England and Wales. The geographical clustering of sample points, and the fact that interviewers tend to work near where they live, can mean that an individual interviewer covers a reasonably high proportion of a PFA over the course of a year. There is therefore a risk that interviewer effects lead to bias in PFA-level estimates.
A quantitative approach to identifying interviewers who administer the survey instrument in an atypical fashion has been developed for this study. Demographic variables (deemed to be largely unaffected by interviewer effects) are used to model respondent-level responses to a set of key questions. This model is then used to predict the distribution of responses we expect each interviewer to record for these variables, given the set of respondents they have interviewed. The expected and observed survey estimates are used to calculate T-scores for each interviewer at each variable. Where large discrepancies are found, interviewers are contacted to identify the root cause of the issue and to retrain them.
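A minimal sketch of this expected-versus-observed comparison, under assumed variable names and a crude binomial standard error (not the study’s production code), might look like this for one binary key question:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical respondent-level data: demographics, one binary key
# question `answer`, and the id of the interviewer who recorded it.
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "age": rng.integers(18, 90, 1000),
    "female": rng.integers(0, 2, 1000),
    "answer": rng.integers(0, 2, 1000),
    "interviewer": rng.integers(0, 50, 1000),
})

# Model the response from demographics deemed robust to interviewer effects.
X = df[["age", "female"]]
fit = LogisticRegression().fit(X, df["answer"])
df["expected"] = fit.predict_proba(X)[:, 1]

# Per interviewer: standardise the gap between the observed proportion
# and the model's expectation for that interviewer's own respondents.
def t_score(g: pd.DataFrame) -> float:
    n = len(g)
    se = np.sqrt((g["expected"] * (1 - g["expected"])).sum()) / n
    return (g["answer"].mean() - g["expected"].mean()) / se

scores = df.groupby("interviewer").apply(t_score)
print(scores[scores.abs() > 3])  # large discrepancies trigger follow-up
```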
In this paper we will explain the approach we have used to identify interviewers who administer questions incorrectly, and report on the impact of our interventions by tracking the progress of the interviewers since they were retrained.