Web data collection for probability-based general population surveys 2

Convenor: Professor Peter Lynn (University of Essex)
Coordinator 1: Ms Lisa Calderwood (Institute of Education, University of London)
Coordinator 2: Ms Gerry Nicolaas (NatCen Social Research)
Web survey methodology is well-established for non-probability online panels and for specialist populations where web access can be assumed to be universal and where an available sampling frame includes email addresses. However, for probability-based general population surveys experience of web data collection remains limited. Many such surveys are now considering the inclusion of web within a mixed-mode design, though few have yet incorporated a web element and there is no consensus on the best way to do this. Meanwhile there is a very small but growing number of single-mode probability-based online panels, using different methodologies.
Survey researchers anticipate several potential benefits from the use of web data collection for general population surveys, such as reduced data collection costs and faster data collection. However, there are significant challenges to be overcome. Major themes include:
Sampling and coverage: How can we design surveys, incorporating web, so that they meet the representativity requirements of general population surveys?
Participation and engagement: What must we learn and do to engage participants in web surveys so that we achieve high, unbiased response and good-quality data? In particular, how do we engage with sub-groups crucial to the success of social surveys, such as those with poorer access to technology and lower skills, and those from disadvantaged and minority groups?
Measurement challenges: How can we best capture complex data using the web and what new opportunities (and related research implications) are there for us to capture new kinds of data?
We welcome submissions to this session that address any of the issues faced by probability-based general population surveys with respect to the inclusion of a web-based data collection element. We particularly welcome reports of findings from experimental or developmental work. We also welcome case studies of general population surveys that have added a web element.
Coordinator 3: Dr Caroline Roberts (University of Lausanne), caroline.roberts@unil.ch
Introduction
As internet use has spread among the population, there has been dramatic growth in online survey research. Volunteer web panels are now widely used for market research/opinion polling, but less for academic/government research due to concerns about representativeness. Various methods attempt to make web panels more "representative" of the population. We compared results from four UK web panels with a national probability survey.
Methods
A shortened Natsal3 questionnaire was included on four web panels: two used standard demographic quotas, and two were 'modified' using variables correlated with key outcomes as additional quotas. After weighting for age and sex, comparisons were made with Natsal3 (CAPI and CASI) for demographic characteristics, key behaviours and attitudes, to examine whether modified quotas 'improved' the results.
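The age-and-sex weighting step can be sketched as a simple post-stratification adjustment. The following is a minimal illustration; the cell labels, proportions and sample are invented, and the actual Natsal3 weighting scheme is more elaborate:

```python
# Minimal post-stratification sketch: weight each respondent so the
# weighted sample matches known population shares by age group and sex.
# All cell labels and figures are hypothetical, not Natsal3 values.

population_share = {  # known population distribution (hypothetical)
    ("16-34", "M"): 0.24, ("16-34", "F"): 0.23,
    ("35-49", "M"): 0.26, ("35-49", "F"): 0.27,
}

sample = [  # (age_group, sex) of each web-panel respondent (hypothetical)
    ("16-34", "M"), ("16-34", "M"), ("16-34", "F"),
    ("35-49", "M"), ("35-49", "F"), ("35-49", "F"), ("35-49", "F"),
]

n = len(sample)
sample_share = {cell: sample.count(cell) / n for cell in population_share}

# Weight = population share / sample share for the respondent's cell.
weights = [population_share[cell] / sample_share[cell] for cell in sample]

# Weighted estimates then use these weights; because the population
# shares sum to 1, the weights average to 1 by construction.
print(round(sum(weights) / n, 3))  # → 1.0
```

Any weighted estimate (e.g. the weighted proportion reporting a given behaviour) is then a weight-weighted average over respondents, which is what makes the comparison with Natsal3 on the same scale possible.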
Results
Each web panel gave significantly different results from Natsal3 on a majority of the variables. There were more differences among men than women for all the web panels. There were more differences between the web panels and Natsal3 questions asked in CAPI than in CASI. The web panels also differed significantly from each other. One modified quota panel produced estimates closer to Natsal3 than the standard quota panels, but still differed on three-fifths of the variables. Moreover, modified quotas were difficult to meet and had to be relaxed.
Conclusions
When measuring sensitive behaviours in the UK population, volunteer web panels provided significantly different estimates from those of a probability CAPI/CASI survey. Modifying web panel quotas did not lead to much improvement.
(5th coauthor: Fred Conrad)
With response rates declining in surveys that use traditional collection methods, the Internet appears to be a potential alternative for general population surveys. The potential strengths of collecting data via the Internet are well known.
However, obtaining a representative sample of the general population through an Internet survey is a challenge. The first reason is a lack of coverage: in 2011, more than 25% of households in France did not have Internet access. Another is the lack of an online sampling frame, which prevents random selection of participants directly on the Internet, a precondition for the representativeness of the sample.
Nevertheless, commercial and opinion surveys using volunteers from online panels recruited by pollsters are numerous. What about their quality? Can we rely on studies of such populations when it comes to public health?
We replicated, on a pollster's online volunteer panel, a national survey on sexual and reproductive health (SRH) carried out a year earlier by telephone in the general population using random digit dialling. Both samples included 8,000 people aged 16 to 49. The mean duration of the questionnaire was around 41 minutes in both media.
This presentation describes the online survey methodology (sampling and follow-up), compares the structures of the telephone and Internet samples, and finally assesses the discrepancies in prevalences, and in the determinants of some key indicators, depending on the type of survey. The possible use of the Internet, and of a pollster's panel, for an SRH survey is finally discussed.
In order to obtain unbiased estimates from survey interviews, it is important that the data are representative. Using administrative records and survey data, the main questions we address concern nonresponse bias in web surveys. In addition, we compare the nonresponse bias of the web survey with the nonresponse bias in a comparable CATI survey. Thus we are able to quantify survey error from nonresponse using single modes as opposed to using a combination of modes. This allows us to derive guidelines on which mode yields the lowest nonresponse bias in which subpopulation, and which design is most efficient.
In an experimental setting we randomly assigned respondents to either the phone or the web mode (n = 3,482). We use a probability sample of employed and registered unemployed German residents. Because the sampled persons were selected from German administrative records, record data are available for all sample units to study the bias: we can assess the overall nonresponse bias of the estimates by comparing statistics based on the administrative data for respondents of the web survey only with the same statistics for all sampled persons.
First, based on administrative data for respondents and nonrespondents, our paper assesses nonresponse bias in mean statistics for socio-demographic variables. Second, we compare the nonresponse bias of the web mode to that of the phone mode and find differential nonresponse bias between the modes. Finally, we reweight the web mode to account for nonresponse. If reweighting is successful for the web mode, web-only data collection could be a cost-efficient way of sampling.
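The bias assessment described above can be illustrated with a toy computation: when an administrative variable is known for every sampled person, the nonresponse bias of the respondent mean is simply the respondent mean minus the full-sample mean. All figures below are invented for illustration:

```python
# Toy nonresponse-bias check: with an administrative variable known for
# every sampled person, bias of the respondent mean is
#   bias = mean(respondents) - mean(full sample).
# All numbers are made up for illustration.

full_sample_age = [23, 31, 38, 44, 52, 59, 27, 35, 48, 61]  # admin records
responded       = [True, True, False, True, False, False,
                   True, True, False, False]                # web response

resp_ages = [a for a, r in zip(full_sample_age, responded) if r]

bias = sum(resp_ages) / len(resp_ages) - sum(full_sample_age) / len(full_sample_age)
print(round(bias, 2))  # → -9.8 (younger people respond more often here)
```

Computing the same quantity separately under the phone and the web assignment is what allows the differential nonresponse bias between modes to be quantified.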
In 2011, the Italian National Institute of Statistics conducted the 15th Population Census. The census process was first devoted to data quality control, carried out mainly by the Territorial Offices, which are responsible for the census at the local level. This paper illustrates the actions taken by the Tuscany Office to manage the quality of the process. Quality controls were performed through careful fieldwork with the local institutions in charge of data collection and through the elaboration of a set of paradata to monitor the whole process.
The fieldwork mainly consisted of training and supporting the local census actors. More specifically, training was delivered through 69 modules involving more than 4,000 people. Support was provided by responding to requests from the network (by phone and e-mail) and by carrying out inspection checks.
Furthermore, the Territorial Office for Tuscany introduced an original paradata tool (control tables) to better control the main phases of the process. In our case, paradata represent a subset of the broader class of process data, focusing on response rates, comparisons between the census and population registers, the processing of potential new census units, etc. The control tables highlighted the most critical features of the process, namely lack of timeliness and accuracy, and made it possible to carry out effective counteractions.
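A control table of this kind can be sketched as a small paradata check that flags collection units whose return rate lags behind a threshold. The municipality names and figures below are invented for illustration, not actual Tuscany Office data:

```python
# Sketch of a "control table" check on paradata: flag collection units
# whose census return rate falls below a threshold. All names and
# figures are hypothetical.

returns = {          # questionnaires returned per municipality (hypothetical)
    "Firenze": 9200, "Pisa": 4100, "Siena": 1500,
}
expected = {         # register-based expected households (hypothetical)
    "Firenze": 10000, "Pisa": 5000, "Siena": 2500,
}

threshold = 0.85
flagged = sorted(
    name for name in returns
    if returns[name] / expected[name] < threshold
)
print(flagged)  # → ['Pisa', 'Siena']  (units needing follow-up)
```

Recomputing such a table at each monitoring round is what enables the early identification of lagging units while the fieldwork is still under way.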
The early identification of these critical issues made it possible to improve census quality without increasing the statistical burden, while offering valid and user-friendly support for all the actors involved and for further development in the light of the rolling population census.