ESRA 2025 Preliminary Program

All time references are in CEST

Assessing the Quality of Survey Data

Session Organiser: Professor Jörg Blasius (University of Bonn)
Time: Tuesday 15 July, 09:00 - 10:30
Room: Ruppert paars - 0.44

This session will provide a series of original investigations into data quality in both national and international contexts. The starting premise is that all survey data contain a mixture of substantive and methodologically induced variation. Most current work focuses primarily on random measurement error, which is usually treated as normally distributed. However, there are many kinds of systematic measurement error, or more precisely, many different sources of methodologically induced variation, and all of them may have a strong influence on the “substantive” solutions. These sources include response sets and response styles, misunderstandings of questions, translation and coding errors, uneven standards across the research institutes involved in data collection (especially in cross-national research), item and unit nonresponse, and faked interviews. We consider data to be of high quality when the methodologically induced variation is low, i.e. when differences in responses can be interpreted on the basis of theoretical assumptions in the given area of research. The aim of the session is to discuss different sources of methodologically induced variation in survey research, how to detect them, and the effects they have on substantive findings.

Keywords: Quality of data, task simplification, response styles, satisficing

Papers

Agree or Disagree? The Effect of Different Scaling Options on Life Satisfaction

Dr Martina Kroher (Leibniz University Hannover) - Presenting Author
Dr Sebastian Lang (Leibniz Institute for Educational Trajectories (LIfBi))

In the social sciences, scales are a common tool for collecting data on attitudes, beliefs, agreement and more. Not only are there different scales to choose from, but the same scale can also be implemented in different ways, for example by reversing the endpoints: from very satisfied to very dissatisfied, or from very dissatisfied to very satisfied. It can be assumed that respondents’ answers depend in part on which pole is offered first.
In addition, numbers are sometimes placed next to the scale points to identify the different responses on the scale. These numbers can also influence the answers given. Some respondents will not notice these small numbers, but others will, and this may lead to altered responses.
Overall, we are interested in whether there are any consequences when scales are implemented differently. Are they still measuring the same construct in the same way?
We analyze data from self-administered paper-and-pencil questionnaires randomly assigned to 6,000 households in Hanover, Germany, using an improved form of random route design. We test different scales by randomly varying several of these scaling options: (1) positive to negative response options with numbers, e.g. 1 to 11, (2) positive to negative response options without numbers, (3a) positive to negative response options with numbers, e.g. -5 to 5, and (3b) negative to positive response options with numbers, e.g. -5 to 5.
In our contribution, we will show whether there are effects on respondent behavior due to different scale designs in terms of labeling and direction. Initial (preliminary) results suggest that there is little effect of scale design on respondent behavior with respect to questions on life satisfaction.
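
To make the comparison concrete, here is a minimal sketch (not the authors' code) of how answers from the different scale variants could be placed on a common metric and compared; the file name, variant codes, column names, and recoding rules are assumptions for illustration only.

# Minimal sketch in Python: harmonize life-satisfaction answers from the four
# scale variants onto a common 1-11 metric and test for differences by design.
import pandas as pd
from scipy import stats

df = pd.read_csv("scale_experiment.csv")  # hypothetical columns: variant, life_sat

def to_common_scale(row):
    # Assumed coding: variants 1 and 2 stored as 1..11, variants 3a/3b as -5..5.
    # (If stored codes followed presentation order rather than valence, the
    # direction would additionally have to be harmonized.)
    if row["variant"] in ("3a", "3b"):
        return row["life_sat"] + 6
    return row["life_sat"]

df["life_sat_common"] = df.apply(to_common_scale, axis=1)

# One-way ANOVA: does mean life satisfaction differ across scale designs?
groups = [g["life_sat_common"].dropna() for _, g in df.groupby("variant")]
print(stats.f_oneway(*groups))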


Item reversal in practice: how reversing strategies and respondent characteristics impact item scores

Dr Fernanda Alvarado-Leiton (University of Costa Rica) - Presenting Author

The use of oppositely worded items in measurement scales is ubiquitous in survey research. Although controversial, mixing the direction of item wording to create balanced scales is still advised to address measurement errors such as response styles and straight-lining or to achieve scale validity.

Best practices for reversing items to create balanced scales remain up for debate; however, there is consensus that reversed items in balanced scales should be semantically equivalent to the unreversed items.

Extant literature suggests that achieving semantic equivalence is a non-trivial task and may depend on multiple factors. One of these factors is the use of negations (e.g., satisfied/not satisfied) or polar opposite concepts (e.g., satisfied/unsatisfied) to reverse items. Empirical evidence suggests that, although similar, negations and polar opposite wordings do not convey the same meaning and are not exact opposites of the unreversed item.

To date, however, most of the available evidence on these differences comes from small experiments that do not represent real survey scenarios. In this paper we explore the differences in meaning between negated, polar opposite, and unreversed items using data from a web survey about subjective well-being with n = 3,600 participants. In addition, we investigate respondent demographics and item characteristics as possible sources of semantic differences, both of which are missing from the previous literature.

Data were collected through opt-in online panels in the United States, with quotas by gender, education, race/ethnicity and age in place to ensure representation of different demographic groups. Participants were randomly assigned to negated, polar opposite, or unreversed wording for five measurement scales using an agree-disagree rating scale. Data are analyzed with multilevel models to account for both respondent and item variables.
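
As an illustration of the multilevel approach mentioned above, the following sketch fits a mixed-effects model of item scores with a random intercept per respondent. It is not the author's code; the data file, the column names (score, wording, item, respondent_id, demographics) and the decision to treat items as fixed effects are assumptions made for the example.

# Minimal sketch in Python: multilevel model of item scores with respondent-
# level random intercepts (hypothetical long-format data, one row per
# respondent x item).
import pandas as pd
import statsmodels.formula.api as smf

long_df = pd.read_csv("reversal_experiment_long.csv")  # hypothetical file

# Fixed effects: wording condition (unreversed / negated / polar opposite),
# item identifier, and respondent demographics; random intercept: respondent.
# Crossed random effects for items would be a possible extension.
model = smf.mixedlm(
    "score ~ C(wording) + C(item) + age + C(gender) + C(education)",
    data=long_df,
    groups=long_df["respondent_id"],
)
print(model.fit().summary())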


Disparities in PHQ-9 Item Sensitivity: A Potential Explanation for the Two-Factor Structure

Ms Kristín Hulda Kristófersdóttir (University of Iceland) - Presenting Author
Dr Vaka Vésteinsdóttir (University of Iceland)
Dr Hafrún Kristjánsdóttir (Reykjavík University)
Dr Þorlákur Karlsson (brandr)
Professor Fanney Þórsdóttir (University of Iceland)

The Patient Health Questionnaire-9 (PHQ-9) is one of the most widely used tools for screening and assessing depression. However, previous research has yielded inconsistent results regarding its factor structure, with most studies suggesting either a one- or two-factor model. One possible explanation for the emergence of a two-factor structure is that certain items may be more sensitive than others and, therefore, more likely to lead to socially desirable responding (SDR). This study explores this possibility by assessing the sensitivity of the PHQ-9 items. A total of 273 participants completed 36 paired comparisons of the PHQ-9 items, indicating which symptoms they would find more uncomfortable to disclose. Additionally, absolute judgments were collected, where participants rated each item as either uncomfortable or not uncomfortable to disclose. Data were analyzed using a model for pair comparisons rooted in Thurstone's law of comparative judgment to estimate the relative sensitivity of each item and whether they were more or less likely to be judged as (not) uncomfortable to disclose. Kendall's coefficients of consistence and agreement were calculated to evaluate the internal consistency of participants' responses and the level of agreement between them. Results showed that cognitive/affective symptoms, such as feelings of worthlessness and depressed mood, were perceived as more sensitive than somatic symptoms like fatigue and sleep disturbances. Notably, the sensitivity estimates obtained in this study align closely with prior factor analytic findings that have supported a two-factor model distinguishing cognitive/affective and somatic symptoms. These findings suggest that SDR may contribute to the underreporting of certain depression symptoms, particularly cognitive/affective ones, potentially accounting for the inconsistent factor structures observed in previous research. Consequently, both researchers and clinicians should consider the impact of SDR when interpreting PHQ-9 scores to ensure more accurate assessments.
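
For readers less familiar with Thurstonian scaling, the sketch below shows how relative sensitivity values can be derived from a matrix of pairwise choices under Case V of the law of comparative judgment. It is not the authors' code; the toy count matrix and the clipping of extreme proportions are assumptions made purely for illustration.

# Minimal sketch in Python: Thurstone Case V scaling from pairwise-choice counts.
# wins[i, j] = number of respondents who judged item i more uncomfortable to
# disclose than item j (toy numbers, not the study's data).
import numpy as np
from scipy.stats import norm

def thurstone_case_v(wins):
    totals = wins + wins.T                            # judgments per pair
    with np.errstate(divide="ignore", invalid="ignore"):
        p = np.where(totals > 0, wins / totals, 0.5)  # P(i judged "more" than j)
    np.fill_diagonal(p, 0.5)
    p = np.clip(p, 0.01, 0.99)                        # avoid infinite z-scores
    z = norm.ppf(p)                                   # unit-normal deviates (Case V)
    scale = z.mean(axis=1)                            # scale value per item
    return scale - scale.min()                        # anchor least sensitive item at 0

wins = np.array([[0, 40, 55],
                 [20, 0, 35],
                 [5, 25, 0]])
print(thurstone_case_v(wins))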


Nonresponse Trends in Surveys of the Highly Qualified: Evidence from Germany

Dr Thorsten Euler (German Centre for Higher Education Research and Science Studies) - Presenting Author
Mrs Ulrike Schwabe (German Centre for Higher Education Research and Science Studies)

Respondents’ willingness to answer surveys strongly determines data quality. However, research has shown that survey nonresponse is a reason for concern in household surveys in many countries (de Leeuw & de Heer, 2022; de Leeuw et al., 2018; Luiten et al., 2020). The public perceives low response rates as a sign that survey results are less meaningful. For researchers, they increase recruitment costs and the effort needed to achieve targeted sample sizes.
While household surveys cover the whole population, we focus on surveys of the highly qualified, defined as individuals holding at least a higher education entrance qualification. The highly qualified are a special group in that they are particularly often invited to participate in surveys within their educational institutions. As a result, they experience a higher survey burden and are more likely to be the source of scientific research findings. Covering the period from the late 1980s to 2022, we map nonresponse trends for students, graduates, PhD candidates and holders, and professors in selected voluntary surveys in Germany. Further, we analyze how modes of administration, modes of contact, and incentivization influence nonresponse rates in one-off and panel surveys.
Generally, response rates in surveys of the highly qualified decline over time, as the target group suffers from a growing response burden while the overall number of surveys being conducted has increased. Our investigation shows that paper-and-pencil surveys achieve higher response rates, while online-only administration leads to the lowest commitment. Monetary incentives are most attractive to students. Professors’ willingness to participate, however, seems to be boosted by being informed about the results.
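
As a hedged illustration of the analysis of mode, contact and incentive effects described in this abstract, one could model survey-level response rates with a weighted binomial GLM; the file name, variable names, and model specification below are assumptions, not the authors' actual analysis.

# Minimal sketch in Python: survey-level response rates as a function of field
# year, mode of administration, mode of contact, incentives, and target group.
# All column names and the data file are hypothetical.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

surveys = pd.read_csv("survey_level_response_rates.csv")  # one row per survey/wave

model = smf.glm(
    "response_rate ~ year + C(mode) + C(contact_mode) + C(incentive) + C(target_group)",
    data=surveys,
    family=sm.families.Binomial(),        # fractional outcome in [0, 1]
    freq_weights=surveys["n_invited"],    # weight surveys by invited sample size
)
print(model.fit().summary())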


Socially Desirable Responding in Panel Studies - How Does Repeated Interviewing Affect Responses to Sensitive Questions?

Mrs Fabienne Kraemer (GESIS - Leibniz Institute for the Social Sciences) - Presenting Author

Social desirability (SD) bias (the tendency to report socially desirable opinions and behaviors instead of true ones) is a widely known threat to the validity of self-reports. Previous studies investigating socially desirable responding (SDR) in a longitudinal context provide mixed evidence on whether SD bias increases or decreases with repeated interviewing and how these changes affect response quality in subsequent waves. However, most studies were non-experimental and only suggestive of the mechanisms of change in SD bias over time. This study investigates SDR in panel studies using a longitudinal survey experiment comprising six waves. The experiment manipulates the frequency of answering identical sensitive questions (target questions) and assigns respondents to one of three groups: the first group received the target questions in each wave, the second group received them in the last three waves, and the control group received them only in the last wave. The experiment was conducted within a German non-probability panel (n = 1,946) and a probability-based panel (n = 4,660). The analysis focuses on between- and within-group comparisons to investigate changes in answer refusal and responses to different sensitive measures. To further examine the underlying mechanisms of change, I conduct moderator and mediator analyses on the effects of respondents’ privacy perceptions and trust towards the survey (sponsor). First results show a decrease in answer refusal and SDR with repeated interviewing for most of the analyzed sensitive measures. However, these decreases were non-significant for both between-group comparisons and comparisons over time. Altogether, this study provides experimental evidence on the impact of repeated interviewing on changes in SD bias and contributes to a deeper understanding of the underlying mechanisms by examining topic-specific vs. general survey experience and incorporating measures of privacy perceptions and trust towards the survey (sponsor).
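
The between-group logic of the design can be illustrated with a short sketch: in the final wave all three groups answer the target questions, so groups differing only in prior exposure can be compared directly. The code below is not the author's analysis; the data file, column names, and the Kruskal-Wallis test are assumptions chosen for illustration.

# Minimal sketch in Python: compare refusal rates and answers to a sensitive
# target item across the three experimental groups in the final wave
# (hypothetical long-format data, one row per respondent x wave).
import pandas as pd
from scipy import stats

panel = pd.read_csv("panel_experiment_long.csv")
last_wave = panel[panel["wave"] == 6]

# Item-nonresponse (refusal) rate per experimental group in the final wave.
print(last_wave.groupby("group")["target_item"].apply(lambda s: s.isna().mean()))

# Do groups with more prior exposure to the target questions answer differently?
groups = [g["target_item"].dropna() for _, g in last_wave.groupby("group")]
print(stats.kruskal(*groups))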


Assessing survey quality with replication surveys: theory and case study

Dr Blanka Szeitl (HUN-REN Center for Social Sciences, University of Szeged) - Presenting Author
Dr Tamás Rudas (Eötvös Loránd University)

In traditional theories, the accuracy of estimates is assessed relative to the true theoretical value that characterizes the population. In survey practice, however, the true population value is rarely known, and the assessment may thus become illusory. In this study, a new way of describing the precision of survey results is discussed: how much a value observed in a survey differs from the theoretical value that could be obtained in a replication of the survey. The first finding is theoretical: the difference between two replications of a survey can be decomposed into nonresponse uncertainty (NU) and measurement uncertainty (MU). NU is the sample component; it depends on who chooses to respond to, or refuses, the survey. MU is the measurement component; it depends on how respondents answer the questions. The second and third findings are empirical and are based on a case study of the European Social Survey (ESS): in contrast to the general importance attributed to nonresponse problems, the magnitude of NU is of little relevance. Both NU and MU affect multivariate analyses, which is in line with previous findings from measurement theories in survey research.
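
A hedged sketch of the decomposition described above, in notation chosen for illustration (not necessarily the authors'): writing the estimate from each replication as the value its respondent set would yield under error-free measurement plus a measurement deviation, the difference between two replications splits into an NU term and an MU term.

% LaTeX sketch; notation assumed for illustration, not necessarily the authors':
% \hat\theta^{(k)} = statistic observed in replication k,
% \theta(R_k)      = value respondent set R_k would yield under error-free measurement.
\begin{align*}
\hat\theta^{(1)} - \hat\theta^{(2)}
  &= \underbrace{\theta(R_1) - \theta(R_2)}_{\text{NU: who responds}}
   + \underbrace{\bigl[\hat\theta^{(1)} - \theta(R_1)\bigr]
               - \bigl[\hat\theta^{(2)} - \theta(R_2)\bigr]}_{\text{MU: how respondents answer}}.
\end{align*}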