All time references are in CEST
Assessing the Quality of Survey Data 4
Session Organiser | Professor Jörg Blasius (University of Bonn)
Time | Thursday 20 July, 14:00 - 15:30 |
Room | U6-06 |
This session will provide a series of original investigations on data quality in both national and international contexts. The starting premise is that all survey data contain a mixture of substantive and methodologically induced variation. Most current work focuses primarily on random measurement error, which is usually treated as normally distributed. However, there are many kinds of systematic measurement error, or more precisely, many different sources of methodologically induced variation, all of which may strongly influence the "substantive" solutions. These sources include response sets and response styles, misunderstandings of questions, translation and coding errors, uneven standards between the research institutes involved in data collection (especially in cross-national research), item and unit nonresponse, as well as faked interviews. We consider data to be of high quality when the methodologically induced variation is low, i.e. when differences in responses can be interpreted on the basis of theoretical assumptions in the given area of research. The aim of the session is to discuss different sources of methodologically induced variation in survey research, how to detect them, and the effects they have on substantive findings.
Keywords: Quality of data, task simplification, response styles, satisficing
Dr Thomas Krause (University of Stuttgart)
Professor Susanne Vogl (University of Stuttgart) - Presenting Author
Professor Christine Sälzer (University of Stuttgart)
Self-reports are an indispensable tool for empirical social research. However, this form of data collection presupposes respondents' willingness and ability to cooperate; where these prerequisites are absent, data quality is threatened. Respondents often aim to complete the interview as fast as possible and therefore invest little (cognitive) effort in the question-answer process. "Insufficient Effort Responding" (IER) refers to arbitrary, inattentive, or inconsistent response behavior: a form of response bias in which respondents are unwilling or unable to follow question prompts or to provide adequate responses to survey questions.
Surveys in school settings are particularly challenging in this respect: on the one hand, adolescents are a special target group; on the other hand, the institutional setting creates a very specific interaction situation. We therefore assess the extent of IER among adolescents in a general classroom survey and test how IER can be measured efficiently. Our results are based on the Youth Study 2022 Baden-Württemberg, in which we surveyed 9th-grade pupils online. We quantify the extent of IER, identify structural determinants, and attempt to approximate reactive measures using non-reactive metrics. The reactive measures include so-called Instructed Response Items (IRI), Infrequent Items, and Bogus Items. The non-reactive measures consist of established metrics such as response time, long-string analysis, Mahalanobis distance, and intra-individual response variability. The goal is not only a comparative analysis of the IER measures, but also an attempt to predict reactive measures from non-reactive measures using machine learning techniques. Furthermore, using different scenarios, we analyze the loss of precision when predicting explicitly measured (reactive) IER from non-reactive metrics.
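The non-reactive metrics listed above are standard indicators from the careless-responding literature. Purely as an illustration (not the authors' implementation), the following Python sketch computes three of them on an invented item-response matrix; the data, function names, and five-point scale are assumptions made for the example.

```python
# Minimal sketch of three non-reactive IER metrics on a hypothetical
# respondent-by-item matrix (invented data, 1-5 rating scale).
import numpy as np

rng = np.random.default_rng(0)
X = rng.integers(1, 6, size=(200, 20)).astype(float)  # 200 respondents, 20 items

def long_string(responses):
    """Longest run of identical consecutive responses per respondent (straightlining)."""
    runs = np.ones(responses.shape[0], dtype=int)
    best = np.ones(responses.shape[0], dtype=int)
    for j in range(1, responses.shape[1]):
        same = responses[:, j] == responses[:, j - 1]
        runs = np.where(same, runs + 1, 1)
        best = np.maximum(best, runs)
    return best

def mahalanobis_distance(responses):
    """Distance of each response vector from the sample centroid (multivariate outlyingness)."""
    centered = responses - responses.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(responses, rowvar=False))
    return np.sqrt(np.einsum("ij,jk,ik->i", centered, cov_inv, centered))

def intra_individual_sd(responses):
    """Within-person standard deviation across items; very low values suggest uniform responding."""
    return responses.std(axis=1)

print(long_string(X)[:5])
print(mahalanobis_distance(X)[:5])
print(intra_individual_sd(X)[:5])
```

In a prediction setting like the one described in the abstract, such metrics would form the feature set from which reactive IER indicators (e.g. failed Instructed Response Items) are predicted.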
Dr Tobias Rettig (University of Mannheim) - Presenting Author
Dr Bella Struminskaya (Utrecht University)
If respondents recognize repeated survey questions and remember their previous responses, they may use this information in processing the repeated question instead of undergoing the response process independently. Respondents may for example edit their later response for consistency or use their previous response as an anchor to adjust from. This may be undesirable for certain questions where researchers are interested in a current unbiased judgement. In contrast to dependent interviewing, where researchers choose to present respondents with their previous responses to certain questions, respondents may remember their responses to any question accurately, inaccurately, or not at all, leading to different levels of accuracy in their later response. Most studies to date have investigated memory effects in the context of repeated measurements within cross-sectional surveys. We extend this research to a longitudinal context by investigating whether respondents remember their responses to different types of questions (beliefs, attitudes, and behaviors) from a previous wave in a probability-based online panel in Germany. We find evidence that some respondents remember their responses even after four months, but at a considerably lower rate than within cross-sectional surveys. Respondents who could not remember their response were most commonly off by only a single scale point. Respondents remembered their responses to different types of questions at different rates and were more likely to remember an extreme response. Female respondents were more likely to remember their responses, but we find no link to age, education, perceived response burden, survey enjoyment or online panel experience. As respondents could not remember their previous responses in most cases and we find little evidence for a systematic variation of memory effects across groups of respondents, we conclude that the potential for measurement error due to memory effects across panel waves is low after four months or longer.
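As a rough illustration only (not the authors' analysis), memory accuracy in such a panel design can be summarized by comparing the answer a respondent recalls with the answer actually recorded in the earlier wave; all values below are invented.

```python
# Hypothetical sketch: classify recall accuracy across panel waves.
# wave1 = response originally given; recall = response the respondent later
# states as what they remember having given (both on a 1-5 scale).
import pandas as pd

df = pd.DataFrame({
    "wave1":  [1, 3, 5, 2, 4, 5, 3],
    "recall": [1, 4, 5, 2, 2, 5, None],  # None = "don't remember"
})

df["status"] = "inaccurate"
df.loc[df["recall"].isna(), "status"] = "no recall"
df.loc[df["recall"] == df["wave1"], "status"] = "accurate"

# Share of accurate / inaccurate / no-recall cases, and for inaccurate
# recalls, how many scale points the remembered answer is off by.
print(df["status"].value_counts(normalize=True))
off_by = (df["recall"] - df["wave1"]).abs()
print(off_by[df["status"] == "inaccurate"].value_counts())
```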
Ms Julia C. Post (University of Potsdam) - Presenting Author
Nonresponse bias occurs if the variables of interest are correlated with the response probability. Since nonresponse is the rule rather than the exception, researchers are concerned with the question of which conditions could cause such a correlation. One possible factor is the survey topic (Groves/Singer/Corning 2000; Groves/Presser/Dipko 2004). If the topic has an effect on the response probability, this could lead to nonresponse bias, especially in statistics connected to the survey topic. As it stands, our knowledge of whether and how the survey topic affects nonresponse stems primarily from experiments on special populations, which raises the question of external validity.
To put the focus on real-world consequences, I propose a research design that uses existing large population surveys. The analysis involves around 30 studies conducted in Germany between 2010 and March 2020 that use a probability sample of the German residential population and provide sufficient methodological documentation. Bias is operationalized as the deviation of variable means, such as the percentage of people with diabetes or the share of political party members, from external benchmarks (e.g. official statistics). I then examine whether the disclosure of the survey topic affects this deviation from the benchmark. A second approach is to study whether concealing the survey topic increases item nonresponse on variables connected to that topic. The presentation focuses on the design of the study and the data selection and gives insights into initial findings.
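As a minimal sketch of the operationalization described above (not the author's code), bias can be expressed as the signed deviation of a survey estimate from an external benchmark, relative to that benchmark; the figures below are invented.

```python
# Hypothetical illustration of benchmark deviation for a survey statistic.

def relative_bias(survey_estimate: float, benchmark: float) -> float:
    """Signed deviation of a survey estimate from an external benchmark,
    expressed relative to the benchmark value."""
    return (survey_estimate - benchmark) / benchmark

# e.g. share of respondents reporting diabetes vs. an official statistic
surveys = {
    "Survey A (topic disclosed)": 0.092,
    "Survey B (topic concealed)": 0.081,
}
benchmark_diabetes = 0.077  # invented "official" prevalence

for name, estimate in surveys.items():
    print(f"{name}: relative bias = {relative_bias(estimate, benchmark_diabetes):+.1%}")
```

Comparing such deviations between surveys that disclosed the topic and surveys that concealed it is the core of the proposed design.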
Mr Yannick Markhof (United Nations University - MERIT and Development Data Group, World Bank) - Presenting Author
Mr Philip Wollburg (Development Data Group, World Bank)
Dr Alberto Zezza (Development Data Group, World Bank)
Have COVID-19 vaccination campaigns been misinformed by inaccurate survey data? This study investigates the alignment of administrative vaccination data with survey data from national high-frequency phone surveys and face-to-face data collection. In the context of COVID-19, administrative statistics are the primary resource informing the progress of vaccination campaigns, but survey data are used for information on vaccine hesitancy, barriers to access, and other ways to expedite vaccination efforts. Past research from before the pandemic and anecdotal evidence from COVID-19 have indicated that both data sources are subject to a number of potential sources of measurement error that threaten their ability to provide accurate insights for vaccination campaigns. We study the extent of this issue in the context of Sub-Saharan Africa, a region that is trailing the rest of the world in reported vaccination rates. We find that vaccination rates estimated from survey data consistently exceed administrative figures across our study countries. Based on this, we set out to investigate sampling- and non-sampling-related sources of this misalignment. Using a series of survey experiments, we explore five potential sources of measurement error in the survey data: (i) sampling and coverage biases, (ii) proxy reporting, (iii) survey mode, (iv) panel conditioning, and (v) experimenter demand effects. Based on our findings, we develop recommendations for survey design. As such, our contribution is relevant beyond the context of COVID-19 and matters for a large body of methodological research on survey data quality as well as applied research on vaccine uptake and vaccination campaigns.
Dr Jessica Daikeler (GESIS Leibniz Institute for the Social Sciences, Mannheim, Germany) - Presenting Author
Miss Indira Sen (GESIS Leibniz Institute for the Social Sciences, Mannheim, Germany)
Mr Lukas Birkenmaier (GESIS Leibniz Institute for the Social Sciences, Cologne, Germany)
Mr Leon Fröhling (GESIS Leibniz Institute for the Social Sciences, Mannheim, Germany)
Dr Tobias Gummer (GESIS Leibniz Institute for the Social Sciences, Mannheim, Germany)
Dr Clemens Lechner (GESIS Leibniz Institute for the Social Sciences, Mannheim, Germany)
Dr Henning Silber (GESIS Leibniz Institute for the Social Sciences, Mannheim, Germany)
Dr Bernd Weiss (GESIS Leibniz Institute for the Social Sciences, Mannheim, Germany)
Dr Katrin Weller (GESIS Leibniz Institute for the Social Sciences, Mannheim, Germany)
Relevance & Research Question: Only 30 years ago, a few could anticipate the possibilities in data collection offered by devices such as computers and smartphones. Today, new technologies allow social scientists to track “ordinary behavior” by clustering activities and opinions on online platforms (e.g., social media), and have opened new avenues for analyzing, understanding, and addressing social science research questions. To target social science data quality within this new era of computational social science it is essential to link quality concepts of the information and computer sciences with those in the social sciences. Consequently, the present study aims to systematize social science data quality concepts in the light of old and new social science research data.
To guide researchers in questions on data quality, our study aims to facilitate interdisciplinary exchange by providing a comprehensive and systematic review of existing frameworks on data quality. By investigating our research question, we will provide answers to practical questions such as: Is the association between data quality concepts of the information and computer sciences and the social sciences already mapped out in the existing data quality concepts? Which quality dimensions, design decisions, and quality indicators are currently represented in existing quality concepts, where are conceptual gaps, and which quality concept is most appropriate given the researcher's data and research questions?
Methods: We develop and present our results with the help of a systematic review, relying on text mining methods for the systematic literature search and coding.
Added Value: Results from our study will contribute to the identification of relevant data quality frameworks for social scientists with both traditional and new data types. Additionally, our study will facilitate interdisciplinary exchange between the computer and social sciences.