Data quality in probability-based and nonprobability online panels

Session Organisers: Dr Carina Cornesse (University of Mannheim), Dr Olga Maslovskaya (University of Southampton)
Time: Friday 2 July, 16:45 - 18:00
In our current digital age, numerous online panels offer researchers inexpensive, fast, and flexible data collection. However, it has frequently been questioned whether these online panels can provide high enough data quality to allow valid inferences to the general population. This may in part depend on the choice of sampling and recruitment design. Whereas most online panels rely on nonprobability sampling as well as recruitment based on volunteer self-selection on the internet, other online panels rely on traditional probability-based offline sampling and recruitment procedures. Many of the latter are recruited on the back of established interviewer-administered survey programmes or via other offline surveys that may be designed specifically for the purpose of recruiting the panel.
In this session, we will explore empirically the opportunities and challenges of existing probability-based and nonprobability online panels for answering different types of research questions. This includes research questions aimed at inferring from a survey sample to the general population, exploring associations between variables, and conducting longitudinal research. The session is designed to provide evidence on various aspects of data quality of probability-based and nonprobability online panels from different perspectives through investigation of different sources of survey error in online panels.
Keywords: online panel, data quality, probability sampling, representativeness, inference
Dr Olga Maslovskaya (University of Southampton) - Presenting Author
Mr Curtis Jessop (NatCen Social Research)
Professor Gabriele Durrant (University of Southampton)
We live in a digital age characterised by widespread use of technology, and surveys have increasingly adopted technology for data collection. There is a move towards online data collection across the world due to falling response rates and pressure to reduce survey costs. Evidence is needed to demonstrate that online data collection strategies will work and produce reliable data which can be confidently used for policy decisions. No research has so far been conducted to assess data quality in the UK's NatCen probability-based online panel. This paper is timely and fills this gap in knowledge. It aims to compare data quality in the NatCen probability-based online panel and non-probability panels (YouGov, Populus and Panelbase). It also compares the NatCen online panel to the British Social Attitudes (BSA) survey, the probability-based survey on the back of which the NatCen Panel was created and which collects data using face-to-face interviews.
Various absolute and relative measures of difference will be used for the analysis, such as the mean average difference and the Duncan dissimilarity index, among others. This analysis will help us to investigate how sample quality might affect differences in point estimates between probability and non-probability samples. Differences and similarities in bivariate and multivariate contexts will be presented and discussed.
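As a brief illustration (not taken from the paper itself), the Duncan dissimilarity index for a categorical variable with categories $k = 1, \dots, K$ can be written in its standard form as

\[
D = \frac{1}{2} \sum_{k=1}^{K} \left| p_k^{\text{panel}} - p_k^{\text{benchmark}} \right|
\]

where $p_k^{\text{panel}}$ and $p_k^{\text{benchmark}}$ denote the proportion of respondents in category $k$ in the online panel and in the benchmark survey, respectively; the exact specification used in the analysis may differ.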
This paper compares data quality between a “gold standard” probability-based survey which collects data using face-to-face interviewing, a probability-based online panel, and non-probability-based online panels. Recommendations will be provided for future waves of data collection and for new probability-based as well as non-probability-based online panels.
Mr Tobias Rettig (University of Mannheim) - Presenting Author
Dr Carina Cornesse (University of Mannheim)
Professor Annelies Blom (University of Mannheim)
A number of studies have shown that probability-based surveys lead to more accurate univariate estimates than nonprobability surveys. However, few studies have explored the accuracy of bivariate and multivariate estimates. Some researchers claim that while nonprobability surveys do not produce accurate univariate estimates, they are “fit for purpose” when conducting bivariate and multivariate analyses. We investigate this claim using data from a large-scale comparison study that included data collection in two academic probability-based online panels and eight commercial nonprobability online panels in Germany with identical questionnaires and field periods. For each of the online panels, we calculate bivariate associations and multivariate models and compare the results to gold-standard benchmarks, examining whether the direction and statistical significance of the coefficients accurately reflect the expected outcomes. Preliminary results on key political variables (voter turnout, voting for a major conservative or right-wing populist party) indicate that the probability-based panels tend to produce similar and generally more accurate results, while there is considerable variability in the results of the nonprobability panels. Unlike the probability-based online panels, the nonprobability online panels produce some significant associations that are contrary to expected outcomes (e.g., that older people are significantly less likely to vote for the major conservative party). Further analyses will extend these comparisons to health-related items (subjective health, BMI) and psychological indicators (Big 5, need for cognition).
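To make the comparison logic concrete, the sketch below (our illustration, not the authors' code; all variable names and data are hypothetical) fits a logistic regression for one panel's data and checks whether each coefficient's sign and statistical significance match a pre-specified expectation:

```python
# Minimal sketch, assuming hypothetical data and expectations: fit a logistic
# regression of a binary outcome (e.g., voting for a conservative party) on
# respondent characteristics and flag whether each coefficient's sign and
# significance match the expected direction.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical expectations: +1 = positive association expected, -1 = negative
expected_signs = {"age": 1, "female": -1}

def check_panel(df: pd.DataFrame) -> pd.DataFrame:
    """Fit the model on one panel's data and compare signs/p-values to expectations."""
    model = smf.logit("vote_conservative ~ age + female", data=df).fit(disp=False)
    rows = []
    for term, expected in expected_signs.items():
        coef = model.params[term]
        pval = model.pvalues[term]
        rows.append({
            "term": term,
            "coef": coef,
            "p_value": pval,
            "significant": pval < 0.05,
            "sign_matches_expectation": np.sign(coef) == expected,
        })
    return pd.DataFrame(rows)

# Example with simulated data standing in for one panel's respondents
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({"age": rng.integers(18, 90, n), "female": rng.integers(0, 2, n)})
logits = -2 + 0.02 * df["age"] - 0.3 * df["female"]
df["vote_conservative"] = rng.binomial(1, 1 / (1 + np.exp(-logits)))
print(check_panel(df))
```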
Dr Carina Cornesse (University of Mannheim) - Presenting Author
Dr Daniela Ackermann-Piek (GESIS - Leibniz Institute for the Social Sciences)
Professor Annelies Blom (University of Mannheim)
Online panels have increased in popularity in recent years, because they allow for fast and inexpensive survey data collection. However, because online panels are often only used for cross-sectional research, they might be unaccustomed to re-surveying the same people several times, which is a key prerequisite for longitudinal research. The extent to which online panels lend themselves to longitudinal research might be attributable to whether they are academic probability-based online panels or commercial nonprobability online panels. While academic probability-based online panels are usually set up in a way that specifically aims at re-surveying the same people over time, commercial nonprobability online panels typically focus on providing large respondent pools for cross-sectional surveys. This difference in scope between probability-based and nonprobability online panels has consequences for their panel maintenance. For example, many probability-based online panels strive for consistency and regularity in their data collection rhythm, incentive scheme, and questionnaire layout. In contrast, nonprobability online panels are usually more concerned with increasing the size of their participant pool than with strengthening the commitment of individual participants. In this presentation, we provide evidence on the extent to which these differences in scope, recruitment, and maintenance between academic probability-based online panels and commercial nonprobability online panels impact their potential for conducting longitudinal research. We draw this evidence from a large-scale comparison study in which three waves of data collection were commissioned in parallel to two academic probability-based online panels and eight commercial nonprobability online panels in Germany. Our findings show that only the probability-based online panels achieve retention rates across data collection waves that are adequate for longitudinal research. In our presentation, we discuss the implications of these findings, in particular with regard to biases in survey estimates.
Mr Curtis Jessop (NatCen Social Research) - Presenting Author
Mr Peter Cornick (NatCen Social Research)
In 2017, the Department for Transport (DfT) was considering transitioning some of its attitudinal survey research from a face-to-face methodology with a fresh sample to a predominantly online fieldwork design using a sample from a probability-based panel, in order to reduce costs and fieldwork times. To explore the potential impact of any change, 40 questions measuring people’s views on different transport issues were run in parallel on the 2017 wave of the British Social Attitudes survey, a face-to-face survey with a fresh probability-based sample, and the Aug/Sep 2017 wave of the NatCen Panel, a web/telephone survey with a probability-based panel sample.
We find that while many of the estimates produced by the two approaches were comparable, some differed both statistically significantly and substantially. This paper will present these findings, then use background information about panel members to explore the extent to which these differences may have been driven by differences in sample composition, and review how discrepancies in the presentation of questions between modes may have affected how people answered, and which types of questions may have been most affected.