Assessing the Quality of Survey Data 6
Session Organiser | Professor Jörg Blasius (University of Bonn) |
Time | Thursday 18th July, 14:00 - 15:30 |
Room | D11 |
This session will present a series of original investigations into data quality in both national and international contexts. The starting premise is that all survey data contain a mixture of substantive and methodologically-induced variation. Most current work focuses primarily on random measurement error, which is usually treated as normally distributed. However, there are many kinds of systematic measurement error, or more precisely, many different sources of methodologically-induced variation, all of which may strongly influence the “substantive” solutions. These sources include response sets and response styles, misunderstanding of questions, translation and coding errors, uneven standards among the research institutes involved in data collection (especially in cross-national research), item and unit non-response, and faked interviews. We consider data to be of high quality when methodologically-induced variation is low, i.e. when differences in responses can be interpreted on the basis of theoretical assumptions in the given area of research. The aim of the session is to discuss different sources of methodologically-induced variation in survey research, how to detect them, and the effects they have on substantive findings.
Keywords: Quality of data, task simplification, response styles, satisficing
Dr Dimitris Pavlopoulos (Vrije Universiteit Amsterdam) - Presenting Author
Ms Paulina Pankowska (Vrije Universiteit Amsterdam)
Dr Daniel Oberski (Utrecht University)
Professor Bart Bakker (Statistics Netherlands)
The aim of this paper is to assess the effect of dependent interviewing on measurement error in reports of workers’ employment contract type in the Netherlands.
Dependent interviewing (DI) is a data collection technique that uses information from prior interviews in subsequent interview rounds. Longitudinal surveys often rely on this method to reduce respondent burden and achieve higher longitudinal consistency of responses. The latter is also supposed to reduce (random) measurement error. However, DI has also been shown to encourage cognitive satisficing: when confronted with their previous responses, interviewees are tempted to confirm that no changes have occurred and that the answers provided previously still hold. Such behavior increases the probability of systematic measurement error: if a respondent made an error in the first interview round, it is highly likely that this error will be carried over to subsequent rounds.
The study of measurement error in longitudinal categorical data is typically done with a special family of latent class models, Hidden Markov Models (HMMs). However, studying non-random, systematic measurement error, such as the error that DI may cause, entails relaxing the local independence assumption of HMMs, which requires measurement errors to be uncorrelated over time.
Therefore, in our study, we apply an extended, two-indicator HMM to linked data from the Labour Force Survey (LFS) and the Dutch Employment Register. The use of two indicators allows relaxing the local independence assumption while maintaining model identifiability; this enables modelling auto-correlated (systematic) errors in the LFS. We use data from periods during which DI was used fully or partially, as well as from periods during which it was not used. Our results show that the overall effect of DI is negligible: in line with theory, DI lowers random error and increases systematic error, but neither effect is significant.
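The carry-over mechanism can be illustrated with a small simulation. The sketch below is not the authors’ model: the two-state Markov chain for contract type, the error rates, and the carry-over probability are all invented for illustration. It shows how independent random errors inflate the observed change rate, whereas carried-over (systematic) errors suppress that inflation while correlating errors across waves, which is precisely the violation of local independence that the two-indicator HMM is designed to handle.

```python
import numpy as np

rng = np.random.default_rng(42)

# Invented parameters, not taken from the paper.
P = np.array([[0.95, 0.05],   # true transition matrix:
              [0.20, 0.80]])  # state 0 = permanent, 1 = temporary
e_fresh = 0.05                # probability of a fresh (random) error
p_carry = 0.90                # probability an existing error is carried over

def simulate(n=10_000, waves=4, carryover=False):
    """Simulate true contract states and error-prone survey reports."""
    true = np.empty((n, waves), dtype=int)
    obs = np.empty_like(true)
    true[:, 0] = rng.integers(0, 2, n)
    err = rng.random(n) < e_fresh
    obs[:, 0] = np.where(err, 1 - true[:, 0], true[:, 0])
    for t in range(1, waves):
        # draw the next true state from the Markov chain
        true[:, t] = (rng.random(n) < P[true[:, t - 1], 1]).astype(int)
        if carryover:
            # systematic error: a previous error persists with p_carry
            err = np.where(err, rng.random(n) < p_carry,
                           rng.random(n) < e_fresh)
        else:
            err = rng.random(n) < e_fresh  # independent random errors
        obs[:, t] = np.where(err, 1 - true[:, t], true[:, t])
    return true, obs

for carry in (False, True):
    true, obs = simulate(carryover=carry)
    print(f"carryover={carry}: "
          f"true change rate {(true[:, 1:] != true[:, :-1]).mean():.3f}, "
          f"observed change rate {(obs[:, 1:] != obs[:, :-1]).mean():.3f}")
```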
Ms Paulina Pankowska (Vrije Universiteit Amsterdam) - Presenting Author
Dr Dimitris Pavlopoulos (Vrije Universiteit Amsterdam)
Dr Daniel Oberski (Utrecht University)
Researchers from many disciplines employ a variety of clustering techniques, such as K-means, DBSCAN, PAM, Ward's method, and Gaussian mixture models (GMMs), to separate survey data into interesting groups for further analysis or interpretation.
Surveys, however, are well known to contain measurement errors. Such errors may adversely affect clustering, for instance by producing spurious clusters or by obscuring clusters that would have been detectable without errors. Furthermore, measurement error might reduce intra-cluster homogeneity and lower the degree of inter-cluster separation. Yet, to date, the concrete effects that such errors may exert on commonly used clustering techniques have rarely been investigated. The few existing studies in the field suggest adaptations to specific clustering algorithms to make them "error-aware", but they focus predominantly on random measurement error and make no mention of systematic errors that may exist in the data. In addition, these studies often assume that the extent of measurement error is known a priori, an assumption that is rarely fulfilled in practice.
In our simulation study, we investigate the sensitivity of commonly used model- and density-based clustering algorithms (GMMs and DBSCAN) to differing rates and magnitudes of random and systematic measurement error. We look at the effects of error on the number of clusters and on the similarity of these clusters to those obtained in the absence of measurement error. Our analysis shows that, when only one variable is affected, random error substantially biases the clustering results only in rather extreme scenarios, whereas even a moderate level of systematic error leads to significant bias. When all (three) variables contain measurement error, however, both types of error lead to non-ignorable bias. We also find that, overall, GMM results are more robust to measurement error than DBSCAN results.
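A sensitivity check of this kind can be sketched in a few lines. The setup below is invented for illustration, not the paper's design: three Gaussian clusters in three variables, a 20% contamination share, and arbitrary eps and scale values. It contrasts zero-mean random error with a constant systematic shift and scores each clustering against the error-free labels using the adjusted Rand index.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture
from sklearn.cluster import DBSCAN
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)

# Error-free data: three clusters in three variables (hypothetical setup).
X, labels = make_blobs(n_samples=900, centers=3, n_features=3,
                       cluster_std=1.0, random_state=0)

def contaminate(X, kind, share=0.2, scale=2.0):
    """Add random (zero-mean) or systematic (constant-shift) error
    to a share of the observations in all variables."""
    Xe = X.copy()
    idx = rng.choice(len(X), size=int(share * len(X)), replace=False)
    if kind == "random":
        Xe[idx] += rng.normal(0.0, scale, size=(len(idx), X.shape[1]))
    elif kind == "systematic":
        Xe[idx] += scale  # constant shift, correlated across variables
    return Xe

for kind in ("random", "systematic"):
    Xe = contaminate(X, kind)
    gmm = GaussianMixture(n_components=3, random_state=0).fit_predict(Xe)
    dbs = DBSCAN(eps=1.2, min_samples=10).fit_predict(Xe)
    print(kind,
          "GMM ARI:", round(adjusted_rand_score(labels, gmm), 3),
          "DBSCAN ARI:", round(adjusted_rand_score(labels, dbs), 3))
```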
Mr Michael Blohm (GESIS - Leibniz Institute for the Social Sciences) - Presenting Author
Ms Oshrat Hochman (GESIS - Leibniz Institute for the Social Sciences)
Mr Sebastian Stier (GESIS - Leibniz Institute for the Social Sciences)
Ms Jessica Walter (GESIS - Leibniz Institute for the Social Sciences)
The aim of data collection is to get an understanding of what individuals think about different topics. To do this as well as we can, we need to reduce survey errors. One potential error component is associated with the information respondents gather from their immediate environment during the data collection period. Substantial variation in exposure to different topics in the mass media might result in variation in political knowledge in a society. We examine the influence of information that was present in the mass media during the data collection period on respondents’ answers to political knowledge items. We discuss how responses to political knowledge items in a population are associated, across the data collection period, with the prevalence of political topics in the mass media.
We use political knowledge items and individual characteristics of the respondents, collected in CASI mode for the German General Social Survey, as well as media data scraped from the online presences of the most important newspapers, political magazines and public broadcasters in parallel with fieldwork. Over a data collection period of 25 weeks, we examine whether the degree of political knowledge in the population remains stable over time or whether the share of correct answers varies with the salience of related topics in the mass media. Besides individual characteristics such as socio-demographics, political interest, and media and internet use, we analyze whether the difficulty of the political knowledge questions influences changes in knowledge over time.
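An analysis of this kind can be sketched as a regression of item-level correctness on weekly media salience plus respondent characteristics. Everything below is invented for illustration: the variable names, the 25-week salience series, and the coefficients used to simulate responses are hypothetical, not the authors’ data or model.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)

# Hypothetical weekly salience of a political topic over 25 fieldwork weeks.
salience = rng.uniform(0, 1, 25)

# Simulated respondents: fieldwork week, political interest (1-5),
# and whether they answered an easy (0) or hard (1) knowledge item.
n = 5_000
df = pd.DataFrame({
    "week": rng.integers(0, 25, n),
    "pol_interest": rng.integers(1, 6, n),
    "hard_item": rng.integers(0, 2, n),
})
df["salience"] = salience[df["week"]]

# Invented data-generating coefficients for the simulated outcome.
logit = (-0.5 + 1.0 * df["salience"]
         + 0.3 * df["pol_interest"] - 0.8 * df["hard_item"])
df["correct"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Does the share of correct answers vary with media salience,
# net of individual characteristics and item difficulty?
model = smf.logit("correct ~ salience + pol_interest + hard_item",
                  data=df).fit(disp=0)
print(model.params)
```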
Dr Vida Beresneviciute (EU Agency for Fundamental Rights (FRA))
Dr Rossalina Latcheva (EU Agency for Fundamental Rights (FRA)) - Presenting Author
The FRA’s second Survey on discrimination and hate crime against Jews is an important evidence base, providing a wealth of information on the prevalence of antisemitism across the EU. The survey was conducted in 2018 in 13 EU Member States (Austria, Belgium, Denmark, France, Germany, Hungary, Italy, Latvia, the Netherlands, Poland, Spain, Sweden and the United Kingdom) with a total of 16,660 survey completions. The first survey was conducted in 2012 to address the lack of comparable evidence on the experiences of Jewish people in relation to antisemitism, hate crime and discrimination. The first survey filled an important gap in knowledge about the everyday experiences of Jewish people in nine EU Member States. The aim of the second study is to build on the findings of the 2012 survey and help understand how these issues have changed over time given the changing climate in Europe. The survey was delivered, on behalf of FRA, through a consortium partnership between Ipsos UK and the Institute for Jewish Policy Research (JPR).
The approach involved an open web survey with a device-agnostic design, disseminated via Jewish communal organisations, groups and media outlets, together with an extensive programme of community engagement to give the survey as wide a reach as possible. Given the opt-in online approach and the lack of comprehensive Jewish population statistics in some countries, there are limits to the extent to which the quality of the sample can be assessed. Looking at both surveys, the presentation will critically reflect on methodology, comparability and weighting issues, with a special focus on the difficulties researchers face when they must rely on non-probability sampling methods, and on possible solutions.
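One standard (partial) remedy for opt-in samples, where population margins are known, is calibration weighting. The sketch below implements simple raking (iterative proportional fitting); it is a generic illustration with invented margins, not the weighting procedure actually used for the FRA surveys.

```python
import numpy as np
import pandas as pd

def rake(df, margins, max_iter=100, tol=1e-6):
    """Calibrate weights so weighted sample margins match known
    population margins (iterative proportional fitting)."""
    w = np.ones(len(df))
    for _ in range(max_iter):
        max_shift = 0.0
        for col, target in margins.items():
            # current weighted share of each category of this variable
            cur = pd.Series(w).groupby(df[col].to_numpy()).sum() / w.sum()
            factors = {cat: share / cur[cat] for cat, share in target.items()}
            max_shift = max(max_shift, *(abs(f - 1.0) for f in factors.values()))
            w = w * df[col].map(factors).to_numpy()
        if max_shift < tol:
            break
    return w * len(df) / w.sum()  # normalise to mean weight 1

# Invented opt-in sample, skewed towards women and younger respondents.
rng = np.random.default_rng(2)
sample = pd.DataFrame({
    "sex": rng.choice(["f", "m"], 500, p=[0.7, 0.3]),
    "age": rng.choice(["<40", "40+"], 500, p=[0.6, 0.4]),
})
# Invented population margins to calibrate against.
w = rake(sample, {"sex": {"f": 0.50, "m": 0.50},
                  "age": {"<40": 0.45, "40+": 0.55}})
print(sample.assign(w=w).groupby("sex")["w"].sum() / w.sum())  # ~0.5 each
```

Raking adjusts only for the margins it is given; with an opt-in sample, self-selection on unmeasured characteristics remains uncorrected, which is the core difficulty the presentation discusses.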