
ESRA 2025 Preliminary Program

All time references are in CEST

Item Nonresponse and Unit Nonresponse in Panel Studies

Session Organisers Dr Uta Landrock (LIfBi – Leibniz Institute for Educational Trajectories)
Dr Ariane Würbach (LIfBi – Leibniz Institute for Educational Trajectories)
Mr Michael Bergrab (LIfBi – Leibniz Institute for Educational Trajectories)
Time Wednesday 16 July, 13:30 - 15:00
Room Ruppert rood - 0.51

Panel studies face various challenges, starting with establishing a panel, ensuring panel stability, minimizing sample selectivity and, overall, achieving high data quality. All of these goals are threatened by nonresponse. Unit nonresponse may lead to small sample sizes, particularly if it occurs in the initial wave. It may also lead to panel attrition: besides active withdrawals, respondents might drop out for administrative reasons, for example, if (recurrent) nonrespondents are excluded from the sample. Item nonresponse reduces data quality, since excluding respondents with missing information from analyses decreases the statistical power of analyses based on the variables in question. In extreme cases, variables may even need to be excluded from analyses because of their high proportion of missing values. Both unit nonresponse and item nonresponse may introduce bias, either by increasing sample selectivity or by distorting the distribution of certain variables.
New and alternative data sources may shed new light on the issue of nonresponse, given that it is not yet entirely clear how these developments will affect longitudinal data collection.
We invite researchers to participate in this discussion, which may – among many others – include the following topics:
- Quantifying item and unit nonresponse, including resulting selectivity,
- Measuring the development of item and unit nonresponse across panel waves,
- Implications of item and unit nonresponse on data quality,
- Strategies for reducing item and unit nonresponse, e.g., by developing new question or response formats, introducing tailored incentive schemes, or offering different modes,
- Problems related to such measures, e.g., comparability across panel waves,
- Handling item and unit nonresponse, for example, by imputing missing values or weighting,
- Using contact and paradata to avoid item and unit nonresponse by monitoring fieldwork during data collection.

Keywords: item nonresponse, unit nonresponse, panel data

Papers

Nonresponse Trends in Register-based Sample Surveys: Evidence from the Icelandic Labour Force Survey 2003-2023

Dr Hafsteinn Einarsson (University of Iceland)
Dr Arndis Vilhjalmsdottir (Statistics Iceland) - Presenting Author
Mr Olafur Mar Sigurdsson (Statistics Iceland)

Response rates in labour force surveys have declined over time in most countries, possibly affecting sample composition. Comprehensive analyses of trends in labour force survey participation are therefore important for evaluating data quality and sample representativeness. In this study, we analyse cross-sectional response rate and sample composition trends in the Icelandic Labour Force Survey (IS-LFS), a register-based telephone sample survey of individuals conducted over a period of 21 years (2003-2023). Specifically, we decompose nonresponse into three parts: 1) those lacking a known telephone number (unreachable in the survey mode), 2) noncontacts among those who do have a known telephone number, and 3) refusals, to evaluate their relative contributions to response rate and nonresponse bias trends. Results show that average yearly response rates have declined from 81.5% in 2003 to 49.7% in 2023, a decline of about 1.5 percentage points per year, and that relative nonresponse bias relating to sample composition has doubled, on average, over the same period. These trends are attributed more to an increasing percentage of sampled individuals lacking a known telephone number and increasing noncontacts than to refusals, particularly since 2013. Having an immigration background is the strongest predictor of nonresponse throughout the entire period, while age and education have become increasingly predictive of nonresponse in recent years. Individuals not in the labour force are consistently less likely to participate than employed persons, while receipt of unemployment benefits was rarely predictive of survey participation. Based on these results, the sampling frame of the IS-LFS was revised to include “sign-of-life” indicators with the aim of excluding those unlikely to actually reside in the country. Moreover, the sampling method was changed from a simple random sample to a stratified sample based on background characteristics. Results of these revisions will also be presented.
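
The three-part decomposition above amounts to a simple tabulation of final disposition codes. Below is a minimal sketch, assuming a hypothetical case-level table with a year and a final disposition per sampled person; the column and category names are our illustration, not the IS-LFS data structure.

import pandas as pd

def decompose_nonresponse(cases: pd.DataFrame) -> pd.DataFrame:
    """Yearly response rate plus the share of the sample lost to each
    nonresponse component (no known phone number, noncontact, refusal)."""
    counts = (
        cases.groupby(["year", "disposition"])  # assumed disposition codes:
        .size()                                 # 'respondent', 'no_phone',
        .unstack(fill_value=0)                  # 'noncontact', 'refusal'
    )
    shares = counts.div(counts.sum(axis=1), axis=0)
    return shares.rename(columns={
        "respondent": "response_rate",
        "no_phone": "no_phone_rate",
        "noncontact": "noncontact_rate",
        "refusal": "refusal_rate",
    })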


Unveiling Hidden Patterns: A Flexible Approach to Understanding Survey Nonresponse

Mr Felix Süttmann (German Institute for Economic Research) - Presenting Author
Professor Sabine Zinn (German Institute for Economic Research)

Recent developments in mixed-mode survey designs and the growing availability of auxiliary and administrative data have opened new pathways for improving models that address unit nonresponse. While machine learning methods such as random forests have gained attention for their strong predictive performance, they often lack interpretability, limiting their utility for researchers seeking to understand the underlying selection mechanisms in their data. Traditional parametric generalized linear models (GLMs) with main effects, commonly used for this purpose, in turn impose rigid assumptions about variable relationships and fail to capture or explain interactions between covariates, reducing their descriptive power.

To overcome these limitations, we propose a logit model with grouped LASSO to detect interpretable interaction effects. This innovative approach accommodates larger sets of variables while ensuring that any identified interactions always include their corresponding main effects. Through a structured penalization technique, our method relaxes the restrictive functional form of traditional models, enhancing flexibility without compromising interpretability. Moreover, it leverages the LASSO’s favorable bias-variance tradeoff, offering a robust balance between predictive accuracy and explanatory clarity.
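
As a purely illustrative sketch of the core idea (not the authors' implementation), the following fits a logistic model with a group-lasso penalty on interaction blocks via proximal gradient descent. Main effects, including an intercept column, are left unpenalized, so any selected interaction automatically appears alongside its main effects; all names and tuning values are assumptions.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def group_lasso_logit(X_main, X_inter, groups, y, lam=0.1, lr=0.01, n_iter=2000):
    """X_main: unpenalized main effects (first column should be ones);
    X_inter: interaction columns; groups: list of index arrays into X_inter,
    one block per interaction term."""
    X = np.hstack([X_main, X_inter])
    n, p_main = X_main.shape
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        beta -= lr * X.T @ (sigmoid(X @ beta) - y) / n  # gradient step
        for g in groups:                                # block soft-threshold
            idx = p_main + np.asarray(g)
            norm = np.linalg.norm(beta[idx])
            if norm > 0:
                beta[idx] *= max(0.0, 1.0 - lr * lam / norm)
    return beta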

Using data from the German Socio-Economic Panel (SOEP), we construct a comprehensive dataset integrating past-wave information, regional data, and household fieldwork metadata. We then implement and benchmark our model against a random forest and traditional logit models to assess performance. Finally, we identify the most relevant interaction effects and discuss their implications for understanding nonresponse mechanisms.


Factors of attrition in online panels: Evidence from four waves of the Values in Crisis survey in Russia

Dr Boris Sokolov (HSE University) - Presenting Author
Dr Yuri Rykov (Neuroglee Therapeutics)
Dr Viyaleta Korsunava (HSE University)

Non-probability online surveys are increasingly used in numerous academic and practical applications, including longitudinal studies. While the strengths and weaknesses of online surveys are generally well-known, limited sample representativeness is a significant drawback. Furthermore, panel online surveys suffer from respondent attrition, which can further exacerbate bias in both univariate and multivariate estimates derived from such surveys.
We analyze factors of respondent attrition in online longitudinal surveys using Russian data from the "Values in Crisis" (VIC) project, an international investigation into the societal impact of the COVID-19 pandemic that initially involved 18 countries. Russia is the only country where four waves of data collection were conducted: June 2020 (N = 1527), April-May 2021 (N = 1169), November-December 2021 (N = 1203), and July-September 2022 (N = 1205). All data were collected through a commercial opt-in panel. Despite replenishing the sample with new participants to maintain quotas, attrition remained significant. Only 606 individuals (39.7% of the initial wave) participated in all four waves. While dropped-out respondents were replaced, they were also allowed to re-enter the study in subsequent waves.
We find that female representation in the balanced panel decreased significantly compared to the baseline sample. Furthermore, attrition rates were higher among participants with lower levels of education, those residing in rural areas, those with lower socioeconomic status, and those who were childless or unmarried. In addition, regularized logistic regression, random forest, and XGBoost models suggest that greater age, longer interview length, and higher scores on Schwartz's Conservation and Self-Transcendence values have the highest relative importance in predicting respondent survival.
Our findings reveal that certain demographic and attitudinal variables are non-trivially associated with respondent survival. This information can be valuable for survey practitioners in designing more effective longitudinal online surveys.
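
For illustration, a model comparison of this kind could be set up as follows; the data below are synthetic stand-ins with invented column names and arbitrary relationships, not the VIC data.

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from xgboost import XGBClassifier

rng = np.random.default_rng(1)
n = 1527
X = pd.DataFrame({
    "age": rng.integers(18, 80, n),
    "interview_minutes": rng.normal(25.0, 8.0, n),
    "conservation": rng.normal(0.0, 1.0, n),
    "self_transcendence": rng.normal(0.0, 1.0, n),
})
# Arbitrary toy outcome (1 = participated in all waves), not the study's data.
logit = -1.0 + 0.02 * X["age"]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

models = {
    "l1_logit": LogisticRegression(penalty="l1", solver="liblinear", C=0.5),
    "random_forest": RandomForestClassifier(n_estimators=500, random_state=0),
    "xgboost": XGBClassifier(n_estimators=300, max_depth=3),
}
for name, model in models.items():
    model.fit(X, y)

# Relative importances: |coefficients| for the logit, impurity-based
# feature_importances_ for the tree ensembles.
print(models["random_forest"].feature_importances_)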


Attrition Patterns and Warning Signs in a Long-Term, High-Frequency Probability Online Panel

Dr Tobias Rettig (University of Mannheim) - Presenting Author
Dr Anne Balz (University of Mannheim)

Longitudinal and panel studies face a gradual loss of respondents, i.e., panel attrition. Over time, lower respondent numbers lead to a loss of statistical power and, if attrition is nonrandom, to a systematic loss of respondents and bias in the remaining sample. Using data from a long-term, high-frequency probability online panel of the German population, we analyze respondents’ participation in over 70 panel waves spanning 12 years and investigate (1) which respondents are disproportionately lost over time, (2) warning signs of attrition, and (3) which survey features are associated with higher attrition and thus present opportunities for optimization.
Descriptively, we investigate how the panel composition changes over time and which respondents are disproportionately lost. We observe high attrition over the first panel waves, followed by a slower but steady loss of respondents in the long term. Over time, the sample becomes more highly educated, more likely to be married, and slightly more male. Using a survival model, we investigate risk factors for attrition at the respondent level, in participation patterns, and in survey features. Attrition risk is lower for younger, more highly educated, and full-time employed respondents. Higher attrition risk is associated with patterns of infrequent participation (breakoffs, missed panel waves, participating late in the field period, item nonresponse), but not with interrupting the survey to continue later. Higher attrition risk is also associated with longer and poorly rated survey waves.
Better understanding attrition patterns may aid panel practitioners in accurately predicting how many respondents to expect in future waves, when and how many respondents to recruit, and which groups should be specifically targeted or oversampled. We identify several groups that are at higher risk for attrition, warning signs that may be used to counteract attrition with targeted interventions, and opportunities to optimize surveys for continued participation.
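
A minimal sketch of a survival analysis of this kind, using the lifelines library on synthetic placeholder data (all variable names and values are our assumptions, not the panel's data):

import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(2)
n = 500
df = pd.DataFrame({
    "age": rng.integers(18, 80, n),
    "higher_education": rng.integers(0, 2, n),
    "late_in_field": rng.integers(0, 2, n),  # assumed warning-sign flag
    "waves": rng.integers(1, 71, n),         # waves until dropout/censoring
    "dropped_out": rng.integers(0, 2, n),    # 1 = attrition event observed
})

cph = CoxPHFitter()
cph.fit(df, duration_col="waves", event_col="dropped_out")
cph.print_summary()  # hazard ratios per covariate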


Handling constraints in automated statistical data editing via full conditional distributions

Professor Christian Aßmann (Leibniz Institute for Educational Trajectories; Chair of Survey Statistics and Data Analysis, Otto-Friedrich-University Bamberg)
Dr Ariane Würbach (Leibniz Institute for Educational Trajectories) - Presenting Author
Ms Katja-Verena Bürk (German Federal Statistical Office)
Mr Florian Dumpert (German Federal Statistical Office)

Reported survey data are prone to inaccuracies due to respondent error: reported values may be missing or implausible, i.e., they do not satisfy logical constraints. When such logical constraints arise from the interaction of multiple variables, it is also unclear which variable or variables are actually erroneous. A standard method used by statistical offices to correct data and ensure data consistency is the edit-imputation routine following the Fellegi-Holt paradigm. However, such an easily computable heuristic does not necessarily exploit all the information available in the observed data. An alternative that incorporates all available information is to apply Bayesian methods in the form of full conditional distributions of missing values, which properly account for the uncertainty that arises when replacing erroneous values. While Bayesian approaches based on parametric models are available in the literature for categorical and continuous data, this paper presents a method for specifying full conditional distributions using classification and regression trees (CART) instead, while taking into account nested balance constraints, i.e., linked constraints involving multiple variables. The CART algorithm was chosen because it provides flexible univariate approximations to the full conditional distributions of the variables while reducing the computational intensity of the overall Bayesian approach. Simulation results suggest that, compared to complete-case analysis, the average root mean squared error of moment estimates can typically be reduced by 20 to 30 percent when using the nonparametric Bayesian approach with CART-based full conditional distributions.
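
To make the tree-based step concrete, here is a rough sketch of a single CART-based draw from an approximate full conditional (our illustration, not the paper's implementation): fit a tree of the target variable on the remaining variables using complete cases, then, for each incomplete case, sample a donor value at random from the leaf it falls into. Constraint handling and the surrounding Gibbs-style iteration are omitted.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def cart_conditional_draw(X_obs, y_obs, X_mis, rng, min_leaf=5):
    """Fit y ~ X on complete cases, then draw, for each incomplete case,
    an observed y value at random from its terminal leaf."""
    tree = DecisionTreeRegressor(min_samples_leaf=min_leaf, random_state=0)
    tree.fit(X_obs, y_obs)
    leaves_obs = tree.apply(X_obs)  # leaf id of each complete case
    leaves_mis = tree.apply(X_mis)  # leaf id of each incomplete case
    draws = np.empty(len(X_mis))
    for i, leaf in enumerate(leaves_mis):
        donors = y_obs[leaves_obs == leaf]
        draws[i] = rng.choice(donors)  # empirical draw within the leaf
    return draws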