All time references are in CEST
Opportunities and Challenges in Dealing with Selection Bias in Cross-sectional and Longitudinal Surveys 2
Session Organisers | Professor Sabine Zinn (Socio-Economic Panel at DIW); Dr Jason M. Fields (U.S. Census Bureau); Dr Hans Walter Steinhauer (Socio-Economic Panel at DIW)
Time | Thursday 20 July, 14:00 - 15:30
Room | U6-07
Analysing survey data usually also means coping with selection bias. There are proven and well-established strategies for doing so, such as survey weighting or selection modelling. However, many data users still struggle to understand how to apply these strategies, especially when confronted with the diversity of the information given by the survey providers. Beyond that, researchers increasingly use machine learning and Bayesian statistics in survey data analysis. This is also true for conducting and controlling surveys. Specifically, adaptive contact or motivational strategies are designed for upcoming survey studies or waves based on response processes observed in previous surveys or survey waves. The estimation of population statistics is improved by including information about the entire selection process in the statistical model. Both developing these methods and communicating their use are critical.
In this session, we welcome research on novel approaches and strategies to ease data users' understanding of how to handle selection bias in their statistical analysis. This research might cover:
-Methods for easing, and communicating, the appropriate use of weights or other methods for addressing selection biases in published microdata files. These may include, but are not limited to, longitudinal weights, calendar year weights, replicate weights, multiple implicates, and other tools to improve the population representativeness and communication of uncertainty in public data products.
-Novel methods to assess and adjust for sources of bias in cross-sectional and longitudinal surveys, including, but not limited to, machine learning interventions, adaptive design, post-hoc weighting calibrations, informed sampling, etc. How are these communicated to data users? How are they adapted as response and biases change?
-Papers that investigate selection processes, papers that leverage novel modelling strategies for coping with selection bias in statistical analysis, and papers that include examples of modelling non-ignorable selection bias in substantive analysis.
Keywords: Selection bias, weighting, adaptive designs, non-ignorable selection
Dr Brady West (University of Michigan-Ann Arbor) - Presenting Author
Among the numerous explanations that have been offered for recent errors in pre-election polls, selection bias due to non-ignorable partisan nonresponse, where the probability of responding to a poll is a function of the candidate preference that the poll is attempting to measure (even after conditioning on other relevant covariates used for weighting adjustments), has received relatively little attention in the academic literature. Under this type of selection mechanism, estimates of candidate preferences based on individual or aggregated polls may be subject to significant bias, even after standard weighting adjustments. Until recently, methods for measuring and adjusting for this type of non-ignorable selection bias have been unavailable. Fortunately, recent developments in the methodological literature have provided political researchers with easy-to-use measures of non-ignorable selection bias. In this study, we apply a new measure that has been developed specifically for estimated proportions to this challenging problem. We analyze data from 18 different pre-election polls: nine different telephone polls conducted in eight different states prior to the U.S. Presidential election in 2020, and nine different pre-election polls conducted either online or via telephone in Great Britain prior to the 2015 General Election. We rigorously evaluate the ability of this new measure to detect and adjust for selection bias in estimates of the proportion of likely voters that will vote for a specific candidate, using official outcomes from each election as benchmarks and alternative data sources for estimating key characteristics of the likely voter populations in each context.
Mrs Angelina Hammon (SOEP, DIW Berlin) - Presenting Author
The increasing use of alternative non-probabilistic data collection strategies in survey research demands methods for assessing the sensitivity of the resulting population estimates. For this purpose, Andridge et al. (2019) propose an index to quantify potential (non-ignorable) selection bias in proportions. We validate this index with an artificial non-probability sample generated from a large empirical data set and additionally apply it to proportions estimated from data on current political attitudes arising from a real non-probability sample selected via river sampling. When the requirements of the index are fulfilled, it shows good overall performance in detecting and correcting selection bias in estimated proportions, and thus provides a powerful measure for evaluating the robustness of results obtained from non-probability samples.
Dr Martina Narayanan (Centre for Longitudinal Studies, UCL Social Research Institute, University College London)
Mr Brian Dodgeon (Centre for Longitudinal Studies, UCL Social Research Institute, University College London)
Dr Michail Katsoulis (MRC Unit for Lifelong Health & Ageing, University College London)
Dr Richard Silverwood (Centre for Longitudinal Studies, UCL Social Research Institute, University College London)
Professor George Ploubidis (Centre for Longitudinal Studies, UCL Social Research Institute, University College London) - Presenting Author
Many COVID-19-specific surveys have been initiated in order to explore different aspects and implications of the pandemic. As with all surveys, these are subject (to a greater or lesser extent) to unit non-response. Since respondents are often systematically different to non-respondents, this can introduce bias to analyses using the observed sample. Bias due to selection into response is difficult to address when there is no information on the individuals who do not respond, as will often be the case in such surveys. In contrast, COVID-19 surveys conducted within existing population-based surveys allow us to capitalise on the rich data available in earlier waves to address bias due to selective response to the COVID-19 surveys. We describe and contrast alternative approaches to handling selective response in this setting, including inverse probability weighting (IPW) using either generic (i.e. non-analysis-specific) or analysis-specific weights, and multiple imputation. Analyses were conducted to investigate the extent to which these approaches can restore sample representativeness in terms of early life variables which were essentially fully observed and relative to external population-representative data. We used data from three waves of COVID-19 surveys conducted within five UK longitudinal population-based surveys run by the University College London Centre for Longitudinal Studies and the UK Medical Research Council Unit for Lifelong Health and Ageing – National Survey of Health and Development, 1958 National Child Development Study, 1970 British Cohort Study, Next Steps, and Millennium Cohort Study – with participants aged between 19 and 74 at the start of the COVID-19 pandemic.
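To make the inverse probability weighting idea mentioned above concrete, the following is a minimal sketch on simulated data, not the analysis used in the paper. It assumes a single hypothetical binary covariate `x` that is observed for everyone from an earlier wave, and a response mechanism that depends on `x` only; all variable names and numbers are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 20_000
# Hypothetical binary covariate observed for everyone in an earlier wave.
x = rng.binomial(1, 0.5, size=n)
# Outcome of interest, related to the covariate (simulated).
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, size=n)
# Response to the later survey depends on x only (assumed for illustration).
p_respond = np.where(x == 1, 0.8, 0.3)
responded = rng.random(n) < p_respond

# Estimate response propensities as response rates within covariate strata.
propensity = np.empty(n)
for level in (0, 1):
    in_stratum = x == level
    propensity[in_stratum] = responded[in_stratum].mean()

# Each respondent is weighted by the inverse of their estimated propensity.
weights = 1.0 / propensity[responded]

full_mean = y.mean()                    # target: mean in the full sample
naive_mean = y[responded].mean()        # unweighted respondent mean
ipw_mean = np.average(y[responded], weights=weights)

print(f"full sample mean: {full_mean:.3f}")
print(f"naive respondent mean: {naive_mean:.3f}")
print(f"IPW respondent mean: {ipw_mean:.3f}")
```

In this simulated setting the weighted respondent mean lies much closer to the full-sample mean than the unweighted one, because response is ignorable given `x`. With richer earlier-wave data, the stratum response rates would typically be replaced by a fitted response model.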
Dr Santiago Gómez (Vrije Universiteit Amsterdam) - Presenting Author
Professor Dimitris Pavlopoulos (Vrije Universiteit Amsterdam)
Dr Reinoud Stoel (Statistics Netherlands)
Dr Ton de Waal (Statistics Netherlands)
Dr Arnout van Delden (Statistics Netherlands)
Selection bias is one of the most prevalent concerns in sampling and the principal motivation behind the implementation of the different random sampling methods. However, non-probabilistic data are increasingly available with the advent of Big Data, and thus selection bias needs to be revisited. More specifically, there is a need for estimates that capture the degree of systematic error due to selection. Several authors have proposed sensible approaches to this problem, which have already been applied to issues such as the bias in voting polls in the 2016 United States election. However, several proposed estimators depend on unobserved parameters that are determined arbitrarily. Given this, in this study, we detail the approaches from Meng (2018) and Little et al. (2020) and estimate their sensitivity to data variations that are frequent in practice, like skewed distributions and high selectivity. To do so, we conduct a series of simulations and employ a case study to evaluate the performance of these novel selection bias estimators and some alternatives. Our analyses indicate that a high correlation between the selection variable and the target variable implies less precise estimates for most of the estimators, though this increased variance could be reduced considerably when employing highly informative auxiliary variables. In addition, the simulation results indicate that the leading predictor of the bias of the estimates is the skewness of the target distribution. We conclude by making some remarks on the current state of the literature on selection bias estimators and future research paths.
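As background to the Meng (2018) approach named above, the error of an unweighted sample mean can be written as the product of the correlation between the selection indicator and the target variable, a term depending on the sampling fraction f = n/N, and the population standard deviation: error = rho * sqrt((1 - f) / f) * sigma. The sketch below checks this identity numerically on simulated data. The population, the logistic selection mechanism, and all variable names are assumptions made for illustration and do not reproduce the simulation design of the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

N = 100_000
# Simulated target variable for a finite population.
y = rng.normal(size=N)
# Hypothetical selection mechanism: units with larger y are more likely selected.
p_select = 1.0 / (1.0 + np.exp(-(-2.0 + 0.5 * y)))
r = (rng.random(N) < p_select).astype(float)   # selection indicator (0/1)

f = r.mean()                                   # sampling fraction n / N
actual_error = y[r == 1].mean() - y.mean()     # sample mean minus population mean

rho = np.corrcoef(r, y)[0, 1]                  # correlation of selection and target
sigma = y.std()                                # population SD (divisor N)
identity_error = rho * np.sqrt((1.0 - f) / f) * sigma

print(f"sampling fraction f: {f:.4f}")
print(f"actual error: {actual_error:.6f}")
print(f"rho * sqrt((1-f)/f) * sigma: {identity_error:.6f}")
```

The two error values agree up to floating-point precision because the identity is algebraic. In practice rho is unobserved, which is why estimators built on it need assumptions or auxiliary information.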