Novel methods: Using machine-learning to aid survey administration
Session Organiser | Mr Peter Lugtig (Utrecht University)
Time | Friday 9 July, 15:00 - 16:30 |
Dr Jonathan Burton (ISER, University of Essex) - Presenting Author
Professor Michaela Benzeval (ISER, University of Essex)
Professor Meena Kumari (ISER, University of Essex)
In the 12th wave of the Understanding Society Innovation Panel (IP12) we experimented with the collection of bio-markers from sample members. The sample was randomly allocated to three groups: (1) nurses carried out the interview and collected bio-markers; (2) social interviewers carried out the interview, collected a sub-set of bio-markers, and asked the participants to collect and return hair and dried blood spot samples themselves; and (3) participants completed an online interview and were then sent a kit through the post which enabled them to take and return their own hair and dried blood spot samples. Embedded within the study were two other experiments: (1) different ways to encourage people to take their own blood pressure before the interview; and (2) whether promising feedback of blood results increased take-up.
This presentation describes the design of IP12 and reports the results of the experiments, looking at response rates and take-up of the biological measures, and the potential for response bias. We also assess the quality of the samples collected across the different groups, and the effect of feedback on response.
Mr Qixiang Fang (Utrecht University) - Presenting Author
Dr Dong Nguyen (Utrecht University)
Dr Daniel Oberski (Utrecht University)
It is well-established in survey research that textual features of survey questions can influence responses. For instance, question length, question comprehensibility and the type of rating scales often play a role in how respondents choose their answers. Prior research, typically via controlled experiments with human participants, has resulted in many useful findings and guidelines for survey question design. Nevertheless, there is room for methodological innovation. In particular, it remains a challenge to build prediction models of survey responses that can properly incorporate survey questions as predictors. This is an important task because such models would allow survey researchers to learn in advance how responses may vary due to nuanced and specific textual changes in survey questions. In this way, the models can guide researchers towards better survey question design. Furthermore, because of the use of survey questions as additional predictors, such models will likely improve their prediction of survey responses. This can benefit aspects of survey planning like sample size estimation.
To meet this challenge, we propose to leverage sentence embedding techniques from the field of natural language processing. Sentence embedding techniques map sequences of words to vectors of real numbers, namely sentence embeddings, which previous research has shown to contain both syntactic and semantic information about the original texts and even certain common-sense knowledge. This suggests that with such techniques, survey questions can be transformed into meaningful numerical representations, which offers two promising solutions. First, given that survey questions as sentence embeddings can readily serve as input for any statistical or machine learning model, we can incorporate sentence embeddings as additional predictors and hopefully achieve more powerful prediction models. Second, we can now manipulate any textual feature of survey questions, obtain the corresponding new sentence embeddings as input for prediction models, observe how the responses estimated by the prediction models change, and thus make informed adjustments to the questions.
Our study investigates the feasibility of these two solutions. We borrow the survey questions and the individual responses from the European Social Survey (wave 9). First, by employing BERT (Bidirectional Encoder Representations from Transformers), a technique successfully applied in many other research contexts, we transform the survey questions into sentence embeddings and train models to predict responses to (unseen) survey questions. Our preliminary results show that the use of sentence embeddings substantially improves prediction of survey responses (compared to baselines), suggesting that sentence embeddings do encode some relevant information about survey questions. Second, we manipulate various aspects of survey questions, such as the topic words, choice of vocabulary and the type of rating scales, and thus artificially generate many variants of the original questions. Then, we feed the sentence embeddings of these generated questions into a high-performance prediction model and examine whether the sizes and directions of the changes in the predicted responses are consistent with hypotheses, established experimental findings and labelled data. This also allows us to determine what kind of information about survey questions sentence embeddings actually encode.
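A rough, hypothetical sketch of this two-step idea is given below. It is not the authors' pipeline: a generic pretrained sentence-embedding model stands in for their BERT setup, and the example questions, response values and variants are invented purely for illustration.

import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import Ridge

# Generic pretrained sentence-embedding model (an assumption; the study uses BERT).
encoder = SentenceTransformer("all-MiniLM-L6-v2")

# Step 1: embed question texts and use the embeddings as predictors of
# (made-up) mean responses on a 0-10 scale.
questions = [
    "How satisfied are you with your life as a whole?",
    "How interested are you in politics?",
    "How safe do you feel walking alone after dark?",
]
mean_responses = np.array([7.2, 4.8, 6.1])    # illustrative values only
X = encoder.encode(questions)                 # shape: (n_questions, embedding_dim)
model = Ridge(alpha=1.0).fit(X, mean_responses)

# Step 2: manipulate the wording of a question, re-embed the variants, and
# inspect how the predicted response shifts relative to the original.
original = "How satisfied are you with your life as a whole?"
variants = [
    "How happy are you with your life as a whole?",           # vocabulary change
    "How satisfied are you with your financial situation?",   # topic change
]
base = model.predict(encoder.encode([original]))[0]
for q in variants:
    shift = model.predict(encoder.encode([q]))[0] - base
    print(f"{q!r}: predicted shift {shift:+.2f}")

In the study itself the models are trained on individual responses from the European Social Survey rather than on a handful of made-up question means, but the mechanics of embedding questions, predicting responses and scoring reworded variants are the same.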
Mr Goran Ilic (Utrecht University)
Mr Peter Lugtig (Utrecht University) - Presenting Author
Professor Barry Schouten (Statistics Netherlands)
Mr Seyyit Hocuk (CentERdata)
Mr Joris Mulder (CentERdata)
Mr Maarten Streefkerk (CentERdata)
In general population housing surveys, respondents may be asked to describe their indoor and outdoor housing conditions. Such conditions may concern the general state of the dwelling, insulation measures the household has implemented to reduce energy use, the setup of their garden, the use of solar panels, and the floor area. Part of the desired information may be burdensome to provide or may not be central to the average respondent. Consequently, data quality may be low, or sampled households/persons may decide not to participate at all. In some housing surveys, households are asked to give permission for a housing expert to make a brief inspection and evaluation. Response rates to such face-to-face inspections are typically low.
An alternative to answering questions may be to ask respondents to take pictures of parts of their dwelling and/or outdoor area. This option may reduce some burden and may improve data richness, but, obviously, may also be considered intrusive.
In this paper, we present the results of an experiment in which a sample of households from the Dutch LISS panel was allocated to one of three conditions: only text answers, only photos or a choice between text answers and photos.
Respondents were asked to provide information on three parts of their house: their heating system, their garden, and their favorite spot in the house. In this presentation we focus on two key aspects of survey error that vary across our experimental conditions:
1) Selection error. We study which respondents are likely to participate in the survey, which respondents answer the picture questions, and what happens to coverage and nonresponse error when we give respondents a choice between taking a picture and answering questions.
2) Measurement error. The picture data provide much more contextual information about someone's housing conditions than a text answer. However, meaningful information still has to be extracted from the pictures using image recognition methods.
Finally, we evaluate the combined effect of selection and measurement errors, and the tradeoff between both. In which condition do we learn most about people's housing conditions?
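A purely illustrative sketch of the image-recognition step mentioned under (2), not the project's actual pipeline, is given below: a general-purpose pretrained classifier attaches coarse labels to a respondent's photo, which could then be coded alongside the text answers. The model choice, file name and label set are assumptions.

import torch
from PIL import Image
from torchvision import models

# General-purpose pretrained classifier (an assumption for illustration).
weights = models.ResNet50_Weights.DEFAULT
classifier = models.resnet50(weights=weights).eval()
preprocess = weights.transforms()    # resizing/normalisation expected by the model

# Hypothetical photo submitted by a respondent.
image = Image.open("respondent_photo.jpg").convert("RGB")
with torch.no_grad():
    logits = classifier(preprocess(image).unsqueeze(0))
probs = torch.softmax(logits, dim=1)[0]

# Top-5 generic labels as a crude first pass at "what is in the picture".
top = torch.topk(probs, k=5)
labels = weights.meta["categories"]
for p, idx in zip(top.values, top.indices):
    print(f"{labels[idx.item()]}: {p.item():.2f}")

A production pipeline would presumably rely on models trained for housing-specific categories (heating systems, garden types, insulation features) rather than generic labels, but the extraction step has the same shape.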