
ESRA 2023 Glance Program


All time references are in CEST

Item Nonresponse and Unit Nonresponse in Panel Studies

Session Organisers: Dr Uta Landrock (LIfBi – Leibniz Institute for Educational Trajectories)
Dr Ariane Würbach (LIfBi – Leibniz Institute for Educational Trajectories)
Mr Michael Bergrab (LIfBi – Leibniz Institute for Educational Trajectories)
Time: Tuesday 18 July, 09:00 - 10:30

Panel studies face various challenges, starting with establishing a panel, ensuring panel stability, minimizing sample selectivity and, overall, achieving high data quality. All of these efforts are undermined by nonresponse. Unit nonresponse may lead to small sample sizes, particularly if it occurs in the initial wave. It may also lead to panel attrition: besides active withdrawals, respondents might drop out for administrative reasons, for example, if (recurrent) non-respondents are excluded from the sample. Item nonresponse reduces data quality, since excluding respondents with missing information from analyses decreases the statistical power for analyses based on the variables in question. In extreme cases, variables may need to be excluded from analyses altogether because of their high proportion of missing values. Both unit nonresponse and item nonresponse may introduce bias, either by increasing sample selectivity or by affecting the distribution of certain variables.
New and alternative data sources may shed new light on the issue of nonresponse, although it is not yet entirely clear how these developments will affect longitudinal data collection.
We invite researchers to participate in this discussion, which may – among many others – include the following topics:
- Quantifying item and unit nonresponse, including resulting selectivity,
- Measuring the development of item and unit nonresponse across panel waves,
- Implications of item and unit nonresponse on data quality,
- Strategies for reducing item and unit nonresponse, e.g., by developing new question or response formats, introducing tailored incentive schemes, or offering different modes,
- Problems related to such measures, e.g., comparability across panel waves,
- Handling item and unit nonresponse, for example, by imputing missing values or weighting,
- Using contact and paradata to avoid item and unit nonresponse by monitoring fieldwork during data collection.

Keywords: item nonresponse, unit nonresponse, panel data

Papers

Do talk money – Reducing income nonresponse in surveys

Ms Melanie Koch (Oesterreichische Nationalbank) - Presenting Author
Ms Katharina Allinger (Oesterreichische Nationalbank)

Item nonresponse can be a pervasive issue in surveys. Questions about monetary values, such as income, are especially prone to nonresponse. We implement an experiment to reduce nonresponse to income questions in an international household survey, looking at four countries where income nonresponse is very common.

In the experiment, survey respondents are always asked to report their exact household income first. Then, we randomize those who refuse to answer into two groups. In a follow-up question, the control group is asked to choose their income from a very granular list of at least 20 brackets. The treatment group is simply asked if their income falls into the first, second or third pre-defined income tercile. With the treatment, we want to test to what extent nonresponse to income questions can be reduced by lowering the number of brackets.

We expect nonresponse on exact income amounts to have two main causes: either the person does not know the exact amount or is not willing to share it because of privacy concerns. In both cases, fewer brackets should be a remedy.

Indeed, in all four countries, the treatment leads to a significant decrease in nonresponse of between 11 and almost 28 percentage points. Moreover, the treatment seems to be especially effective for respondents who were unwilling to report the exact amount, as opposed to those who were unable to answer. We do not find large heterogeneous effects across population subgroups, meaning nonresponse is reduced in all gender, age and educational groups. There is no positive spillover effect on the willingness to answer subsequent questions on exact amounts for personal income.

Thus, when condensed income data are sufficient, fewer answer options are a cost-effective way to reduce nonresponse.
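
The kind of comparison behind these figures can be illustrated with a simple two-proportion test. The sketch below is purely illustrative and is not the authors' code; the counts and group sizes are invented, not study results.

# Illustrative only: comparing follow-up nonresponse between the tercile
# (treatment) and granular-bracket (control) arms; all counts are invented.
from statsmodels.stats.proportion import proportions_ztest

n_treat, n_control = 400, 400              # refusers randomized to each arm (assumed)
nonresp_treat, nonresp_control = 80, 160   # still not reporting income (assumed)

# percentage-point reduction in nonresponse under the treatment
diff_pp = 100 * (nonresp_control / n_control - nonresp_treat / n_treat)

# two-sided z-test for equal nonresponse proportions in the two arms
stat, pval = proportions_ztest(count=[nonresp_treat, nonresp_control],
                               nobs=[n_treat, n_control])
print(f"reduction: {diff_pp:.1f} percentage points, z = {stat:.2f}, p = {pval:.4f}")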


Unveiling Hidden Patterns: A Flexible Approach to Understanding Survey Nonresponse

Mr Felix Süttmann (German Institute for Economic Research) - Presenting Author
Professor Sabine Zinn (German Institute for Economic Research)

Recent developments in mixed survey mode designs and the growing availability of auxiliary and administrative data have opened new pathways for improving models that address unit nonresponse. While machine learning methods, like random forests, have gained attention for their strong predictive performance, they often lack interpretability, limiting their utility for researchers seeking to understand the underlying selection mechanisms in their data. On the other hand, traditional parametric Generalized Linear Models (GLMs) with main effects, commonly used for this purpose, impose rigid assumptions about variable relationships and fail to capture or explain interactions between covariates, reducing their descriptive power.

To overcome these limitations, we propose a logit model with grouped LASSO to detect interpretable interaction effects. This innovative approach accommodates larger sets of variables while ensuring that any identified interactions always include their corresponding main effects. Through a structured penalization technique, our method relaxes the restrictive functional form of traditional models, enhancing flexibility without compromising interpretability. Moreover, it leverages the LASSO’s favorable bias-variance tradeoff, offering a robust balance between predictive accuracy and explanatory clarity.
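
As a rough illustration of the estimation idea (not the authors' implementation), a group-lasso-penalized logit can be fitted by proximal gradient descent with block soft-thresholding. In the sketch below, the grouping of each interaction with its main effects, the penalty level, and the data are all assumptions.

# Sketch of a group-lasso logit via proximal gradient descent (illustrative).
import numpy as np

def fit_group_lasso_logit(X, y, groups, lam=0.1, n_iter=1000):
    """X: (n, p) design matrix holding main-effect and interaction columns.
    y: (n,) binary indicator, e.g. 1 = unit nonresponse.
    groups: list of index arrays; one group bundles an interaction term
            together with its corresponding main effects (assumed grouping)."""
    n, p = X.shape
    beta = np.zeros(p)
    # step size from the Lipschitz constant of the average logistic loss
    step = 4.0 * n / (np.linalg.norm(X, 2) ** 2)
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-X @ beta))   # predicted response propensities
        grad = X.T @ (prob - y) / n              # gradient of the logistic loss
        beta = beta - step * grad
        for g in groups:                         # proximal step: block soft-thresholding
            norm_g = np.linalg.norm(beta[g])
            if norm_g > 0.0:
                shrink = max(0.0, 1.0 - step * lam * np.sqrt(len(g)) / norm_g)
                beta[g] = shrink * beta[g]
    return beta

Whole groups are either retained or shrunk exactly to zero, which is what keeps any selected interaction interpretable alongside its main effects.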

Using data from the German Socio-Economic Panel (SOEP), we construct a comprehensive dataset integrating past-wave information, regional data, and household fieldwork metadata. We then implement and benchmark our model against a random forest and traditional logit models to assess performance. Finally, we identify the most relevant interaction effects and discuss their implications for understanding nonresponse mechanisms.


Attrition Patterns and Warning Signs in a Long-Term, High Frequency Probability Online Panel

Dr Tobias Rettig (University of Mannheim) - Presenting Author
Dr Anne Balz (University of Mannheim)

Longitudinal and panel studies face a gradual loss of respondents, i.e., panel attrition. Over time, shrinking respondent numbers lead to a loss of statistical power and, if attrition is nonrandom, to a systematic loss of respondents and biases in the remaining sample. Using data from a long-term, high-frequency panel, we investigate (1) which respondents are disproportionately lost over time, (2) warning signs of attrition, and (3) which survey features are associated with higher attrition and thus present opportunities for optimization. Using data from a probability online panel of the German population, we analyze respondents’ participation in over 70 panel waves spanning 12 years.
Descriptively, we investigate how the panel composition changes over time and which respondents are disproportionately lost. We observe high attrition over the first panel waves and a slower but steady loss of respondents in the long term. Over time, the sample becomes more highly educated, more likely to be married, and slightly more male. Using a survival model, we investigate risk factors for attrition at the respondent level, in participation patterns, and in survey features. Attrition risk is lower for younger, more highly educated, and full-time employed respondents. Higher attrition risk is associated with patterns of infrequent participation (breakoffs, missed panel waves, participating late in the field period, item nonresponse), but not with interrupting the survey to continue later. Higher attrition risk is also associated with longer and poorly rated survey waves.
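
One simple way to operationalize such a survival analysis on panel data is a discrete-time hazard model estimated on a person-wave file. The sketch below is only a schematic stand-in for the authors' model; the file and variable names are assumptions.

# Schematic discrete-time hazard model for attrition (person-wave data assumed).
import pandas as pd
import statsmodels.formula.api as smf

# one row per respondent and wave while still in the panel;
# dropped_out = 1 in the last wave before the respondent attrites
pw = pd.read_csv("person_wave.csv")  # hypothetical file

model = smf.logit(
    "dropped_out ~ age_group + education + employment"
    " + n_breakoffs + n_missed_waves + late_fieldwork + item_nonresponse_rate"
    " + survey_length + survey_rating + C(wave)",
    data=pw,
).fit()
print(model.summary())  # positive coefficients indicate covariates that raise attrition risk
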
Better understanding attrition patterns may aid panel practitioners in accurately predicting how many respondents to expect in future waves, when and how many respondents to recruit, and which groups should be specifically targeted or oversampled. We identify several groups that are at higher risk for attrition, warning signs that may be used to counteract attrition with targeted interventions, and opportunities to optimize surveys for continued participation.


Backing up a Panel with Piggybacking – Does Recruitment with Piggybacking lead to more nonresponse bias?

Mr Bjoern Rohr (GESIS Leibniz Institute for the Social Sciences) - Presenting Author

Sampling and recruiting respondents for (online) probability-based panels can be very expensive. One cost-intensive aspect of the process is drawing a separate sample and recruiting the respondents offline. To reduce this cost, some mixed-mode or online panels (e.g., the GESIS Panel, the German Internet Panel, and the NatCen Panel) have relied on piggybacking for some recruitments or refreshments. Piggybacking means that participants for the panel are recruited at the end of another probability survey, so that no additional sample has to be drawn. Though this might reduce costs, it might also introduce additional nonresponse. Whether this additional nonresponse also translates into more bias in practical applications of a piggybacking survey is analyzed in this research. To answer the research question, we use the GESIS Panel, a panel survey that was initially recruited in 2013 (n = 4961) from a separate sample but later refreshed three times with the help of piggybacking (n = 1710, 1607, 764). This setting allows us to compare the bias of both survey types against each other and, with the help of German Microcensus benchmarks, to disentangle the nonresponse bias introduced by piggybacking. Bias is measured as relative bias for demographic and job-related variables, as well as the difference in Pearson’s r between benchmark and survey. Initial results indicate that, directly after recruitment from the parent survey, univariate estimates from piggybacking recruitments are more often biased than estimates from a separate recruitment. However, these differences diminish over subsequent panel waves, indicating that piggybacking samples are less affected by panel attrition. Regarding Pearson’s r estimates, our analyses show mixed results.
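
The two bias measures mentioned above can be written down compactly. The sketch below is only an illustration of those definitions, with assumed column names and an invented example value rather than GESIS Panel results.

# Illustration of the two bias measures (relative bias and difference in Pearson's r).
import pandas as pd

def relative_bias(survey_estimate, benchmark_value):
    # relative deviation of a survey estimate from its Microcensus benchmark
    return (survey_estimate - benchmark_value) / benchmark_value

def pearson_r_difference(survey_df, benchmark_df, var_x, var_y):
    # survey correlation minus benchmark correlation for the same variable pair
    return survey_df[var_x].corr(survey_df[var_y]) - benchmark_df[var_x].corr(benchmark_df[var_y])

# e.g., share of full-time employed in a piggybacking refreshment vs. the benchmark
print(relative_bias(survey_estimate=0.46, benchmark_value=0.50))  # invented values: -0.08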


Handling constraints in automated statistical data editing via full conditional distributions

Professor Christian Aßmann (Leibniz Institute for Educational Trajectories; Chair of Survey Statistics and Data Analysis, Otto-Friedrich-University Bamberg)
Dr Ariane Würbach (Leibniz Institute for Educational Trajectories) - Presenting Author
Ms Katja-Verena Bürk (German Federal Statistical Office)
Mr Florian Dumpert (German Federal Statistical Office)

Reported survey data are prone to inaccuracies due to respondent error: reported values may be missing or implausible, i.e., they do not satisfy logical constraints. When such logical constraints involve the interaction of multiple variables, it is also unclear which variable or variables are actually erroneous. A standard approach used by statistical offices to correct data and ensure consistency is edit-imputation routines following the Fellegi-Holt paradigm. Using such an easily computable heuristic, however, does not necessarily exploit all the information available in the observed data. An alternative that incorporates all available information is to apply Bayesian methods in the form of full conditional distributions of missing values, which properly account for the uncertainty that arises when replacing erroneous values. While Bayesian approaches based on parametric models are available in the literature for categorical and continuous data, this paper presents a method for specifying full conditional distributions using classification and regression trees instead, while taking into account nested balance constraints, i.e., linked constraints involving multiple variables. The CART algorithm was chosen because it provides flexible univariate approximations to the full conditional distributions of the variables while reducing the computational intensity of the overall Bayesian approach. Simulation results suggest that, compared to complete case analysis, the average root mean squared error of moment estimates can typically be reduced by 20 to 30 percent when using the nonparametric Bayesian approach and the corresponding specification of full conditional distributions via the CART algorithm.
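
The core building block, drawing a missing value from a CART-based approximation to its full conditional distribution, can be sketched as follows. This simplified version ignores the nested balance constraints and the full Bayesian iteration described in the paper; all data and column names are assumptions.

# Simplified sketch: impute one variable by sampling donor values from the
# leaf of a fitted regression tree (constraint handling omitted).
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

def cart_conditional_draw(df, target, predictors, rng, min_leaf=20):
    # predictors are assumed complete at this step of the sequential scheme
    obs = df[target].notna()
    tree = DecisionTreeRegressor(min_samples_leaf=min_leaf, random_state=0)
    tree.fit(df.loc[obs, predictors], df.loc[obs, target])
    leaf_obs = tree.apply(df.loc[obs, predictors])    # leaf of each observed case
    leaf_mis = tree.apply(df.loc[~obs, predictors])   # leaf of each missing case
    donors = df.loc[obs, target].to_numpy()
    draws = [rng.choice(donors[leaf_obs == leaf]) for leaf in leaf_mis]
    out = df.copy()
    out.loc[~obs, target] = draws
    return out

# hypothetical use: draw household income given complete covariates
# imputed = cart_conditional_draw(df, "income", ["age", "hh_size", "edu_years"],
#                                 rng=np.random.default_rng(1))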


Impact of Increased Survey Frequency on the Participation of Older Respondents in Longitudinal Surveys

Dr Michael Bergmann (SHARE Berlin Institute (SBI)) - Presenting Author
Mrs Magdalena Quezada (SHARE Berlin Institute (SBI))

To increase flexibility and to be able to respond quickly to new developments, the Survey of Health, Ageing and Retirement in Europe (SHARE) envisages supplementary surveys with additional thematic modules in the year between the biennial core survey waves. This approach will accommodate new contributions to questionnaire content, ensuring that SHARE remains interesting and relevant to researchers across Europe and beyond. At the same time, there is a concern that these additional in-between surveys will increase respondent burden and thus reduce respondents' willingness to participate in future panel waves. Previous studies are inconclusive in this regard, particularly in relation to older people, who are more likely to have health problems and may therefore be more sensitive to the increased burden of more frequent survey invitations.
To address the question of whether increased survey frequency has a negative impact on the future participation of older respondents in longitudinal surveys, we analyze SHARE data from Waves 8 and 9. Between these face-to-face waves, SHARE conducted two SHARE Corona Surveys by telephone, which could not be fully implemented in all countries. Because in these countries a regionally stratified random sub-sample was drawn to select participants for the SHARE Corona Surveys, this provides a quasi-experimental setting for comparing the Wave 9 participation rates of randomly selected respondents who took part in the in-between surveys with those of respondents who did not.
Preliminary results show that more surveys per se do not lead to higher attrition rates in an ongoing panel study. Rather, it appears that surveys on topics of interest to respondents are not perceived as overly burdensome. The results of our study extend beyond SHARE and can provide valuable information for the design of panel surveys (of older people) in general.