



Wednesday 19th July, 16:00 - 17:30 Room: Q2 AUD1 CGD


Assessing the Quality of Survey Data 3

Chair: Professor Jörg Blasius (University of Bonn)

Session Details

This session will provide a series of original investigations on data quality in both national and international contexts. The starting premise is that all survey data contain a mixture of substantive and methodologically induced variation. Most current work focuses primarily on random measurement error, which is usually treated as normally distributed. However, there are many kinds of systematic measurement error or, more precisely, many different sources of methodologically induced variation, and all of them may have a strong influence on the "substantive" solutions. These sources include response sets and response styles, misunderstandings of questions, translation and coding errors, uneven standards between the research institutes involved in data collection (especially in cross-national research), item and unit non-response, as well as faked interviews. We consider data to be of high quality when the methodologically induced variation is low, i.e. when differences in responses can be interpreted on the basis of theoretical assumptions in the given area of research. The aim of the session is to discuss different sources of methodologically induced variation in survey research, how to detect them, and the effects they have on substantive findings.

Paper Details

1. Models for wording effects on the Rosenberg self-esteem scale, their nomological network and implications for scoring
Dr Michalis Michaelides (University of Cyprus)
Ms Chrystalla Koutsogiorgi (University of Cyprus)
Dr Georgia Panayiotou (University of Cyprus)

Response styles in self-report surveys introduce systematic, construct-irrelevant variance. The inclusion of negatively worded items has been recommended as a remedy for acquiescence. This strategy rests on the assumptions that positively and negatively worded items measuring the same construct are interchangeable, and that negatively worded items engage participants in more careful responding. However, evidence from factorial validity studies across scales suggests that reverse-worded items influence the factorial structure by forming separate factors driven by methodological rather than conceptual variability.

Rosenberg’s Self-Esteem Scale (RSES) is a popular measure of global self-esteem. It is a balanced scale with 5 positively and 5 negatively worded items, designed to be unidimensional. However, research has repeatedly illustrated a non-unidimensional factorial structure that is contaminated by method effects due to item wording. Recently, bifactor modeling - a substantive self-esteem factor plus 2 additional method factors linked to the positively and negatively worded items - has been applied to various samples with good fit. The present research aimed to investigate (a) the factorial structure of a Greek adaptation of the RSES, examining 8 competing confirmatory factor analytic models, (b) the nomological network of the method effects, utilizing traits related to behavioral inhibition, and (c) differences in the correlations of self-esteem with external variables depending on how self-esteem was modeled.
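As a rough point of reference, the bifactor structure described above can be written as a generic set of measurement equations (a sketch only; the indices and factor labels are illustrative and not taken from the paper):

x_i = \lambda_i^{G}\,\eta_G + \lambda_i^{P}\,\eta_P + \varepsilon_i   (positively worded items)
x_j = \lambda_j^{G}\,\eta_G + \lambda_j^{N}\,\eta_N + \varepsilon_j   (negatively worded items)
\mathrm{Cov}(\eta_G,\eta_P) = \mathrm{Cov}(\eta_G,\eta_N) = \mathrm{Cov}(\eta_P,\eta_N) = 0

Here \eta_G denotes the substantive self-esteem factor, \eta_P and \eta_N the wording-specific method factors, and the \lambda terms the factor loadings; all factors are specified as orthogonal.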

Two different samples from Cyprus were utilized: a sample of 205 young adults and a more diverse community sample of 144 adults. The bifactor model that accounted for both positively and negatively worded items fit better than the alternative models in both samples. Moreover, measures of experiential avoidance, social anxiety and private self-consciousness were associated with the method effects. Regarding the third aim, differences in the correlations of self-esteem with external variables were small irrespective of how self-esteem was modeled.

The findings illustrated that the bifactor model described the data well and enhanced the interpretation of the factorial structure of the RSES. The specification of method effects appears to be a more accurate representation of the scale’s factorial structure. However, for the purpose of extracting scores from the RSES, using raw data composites versus latent scores, with or without controlling for method effects, did not lead to differential correlations with external variables; therefore, applied researchers and clinicians need not worry about complex modeling or alternative scoring procedures for the RSES.

More complex models can nevertheless be useful for identifying and controlling methodological effects that bias responses, providing a more accurate representation of the factorial structure. Bifactor modeling has enhanced the interpretation of factorial structures by resolving issues of dimensionality that often arise with psychological scales. It permits the psychometric examination of item responses that do not behave as strictly unidimensional. Finally, when methodological effects are specified as group factors, they can be examined as stable, reliable response tendencies with predictable associations, improving our understanding of the sources of variance that come into play in self-reports.


2. Interviewer effects in real and falsified survey data
Mrs Uta Landrock (University of Kaiserslautern)

In face-to-face interviews the interviewer plays a central role. He or she can, for example, convince respondents to participate in the survey. But there is also the risk that the interviewer does not follow the instructions, which may lead to interviewer effects: distortions of survey responses due to the presence of an interviewer, which can bias the data and affect substantive findings. This paper analyzes interviewer effects in real survey data and in data falsified by interviewers. We use (quasi-)experimental data consisting of three data sets. For the first data set our interviewers conducted 710 real face-to-face interviews. For the second data set the same interviewers fabricated survey data in the lab: the falsifying interviewers were briefed about the sex, age, place of residence, and other socio-demographic characteristics of the 710 real survey participants, and were then instructed to fill in the questionnaire as a person with the given characteristics would probably do. Additionally, the interviewers filled in the survey questionnaire for themselves, as respondents; these data are stored in a third data set.
We estimate multi-level models, separately for the real and the falsified data, to identify interviewer effects on substantive findings. We choose dependent variables that are known to be prone to interviewer effects. These include, among others, income, as a sensitive (and open-ended) question, and political participation, measured with a list of 12 political activities. The explanatory variables on the respondent level in the case of income are age, gender, and living situation; in the case of political participation they include, for example, political efficacy and political and economic dissatisfaction. The independent variables on the interviewer level are variables known to cause interviewer effects, for example the interviewer’s gender, the interviewer’s experience and the payment scheme (payment per hour vs. per interview). We also consider the interviewers’ personality traits, such as extraversion, conscientiousness or perceived self-efficacy, as relevant for analyzing interviewer effects. Additionally, we include the interviewers’ own answers to the same questions of the questionnaire.
As preliminary results we can report: 1) In the real data we do not find evidence for interviewer effects (max ICC: .025). 2) In the falsified data we find strong interviewer effects (max ICC: .218). 3) In the case of falsified data we find significant effects of the interviewers’ own answers to the same questions of the questionnaire and of some other interviewer characteristics. These results lead to the recommendation to collect as much information as possible (and feasible) about the interviewers. For example, as part of the interviewer training, the interviewers could answer the survey questionnaire themselves: on the one hand the interviewers get to know the questionnaire, and on the other hand we get to know the interviewers. In suspicious cases the presented multi-level analyses can be conducted to check the data for interviewer effects, which may indicate interviewer falsifications.
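As a minimal sketch of how an interviewer-level ICC of this kind could be estimated from a respondent-by-interviewer data set (the file name, column names and the statsmodels-based approach are illustrative assumptions, not the authors' actual code):

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data set: one row per respondent, with the interviewer ID,
# a substantive outcome (e.g. income) and respondent-level covariates.
df = pd.read_csv("interviews.csv")  # assumed file and column layout

# Random-intercept model with respondents nested in interviewers.
model = smf.mixedlm("income ~ age + gender + living_situation",
                    data=df, groups=df["interviewer_id"])
result = model.fit()

# Intraclass correlation: share of the total variance located
# at the interviewer level.
between_var = result.cov_re.iloc[0, 0]      # interviewer-level variance
within_var = result.scale                   # residual (respondent-level) variance
icc = between_var / (between_var + within_var)
print(f"Interviewer-level ICC: {icc:.3f}")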


3. Response Quality and Ideological Dispositions
Ms Alice Barth (University of Bonn)
Mr Andreas Schmitz (University of Bonn)

Social science surveys are prominently referred to as a means of representing public opinion. In this context, the data’s scientific mode of production suggests that respondents’ attitudes and opinions are captured in a neutral and unbiased way. Accordingly, respondents’ differential reactions to being surveyed (such as response styles, item non-response, satisficing, or refusal) are usually not seen as substantive information, but rather treated as technical problems or emanations of individual cognitive factors. This perspective entails that research mainly concentrates on singular aspects of response quality, which are explained by individual variables such as age, education or personality. It is seldom discussed, however, whether the assumption of the ideological neutrality of standardized surveys actually holds true. Factors jeopardizing this assumption might include, among others, biases in the selection of topics and question wording arising from institutions’ and actors’ positions in the political field, or respondents’ differential evaluation of official surveys.
In our presentation, we argue that the assumption of the survey instrument as a neutral means of representing public opinion misses the point that the technical quality of response data is not independent of respondents’ general world view and, more specifically, their ideological disposition. Data quality inevitably reflects respondents’ understanding of the available categories, as well as their perception of survey research in general. This implies, on the one hand, that the investigation of singular aspects of response quality does not adequately capture the complexity of reactions towards surveys, as response practices may be constituted by the interrelation of multiple response biases. On the other hand, the explanation of differences in response quality by variables such as age, education or personality falls short of taking into account the ideological distance between the instrument and the respondent. Taken together, research has largely ignored the possibility that survey response patterns are a consequence of a respondent’s ideological background.
Using data from large-scale international surveys, we first assess the interrelations of multiple indicators of non-substantive responding (acquiescence, item non-response, midpoint responding, etc.) by applying finite mixture models. We then relate the resulting classes to spatial positions via geometric data analysis, a technique that allows the structure of a society and of its sub-spheres, such as the political field, to be constructed empirically. Our findings imply (1) that there are different combinations of response patterns and thus different empirical types, and (2) that these types vary systematically with respondents’ (dis)positions. We conclude with a reflection on the widespread assumption that survey instruments are equally valid for all population strata.
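A minimal sketch of the finite-mixture step under stated assumptions (a Gaussian mixture as one concrete finite-mixture model; the indicator names, the number of classes and the scikit-learn-based approach are illustrative, not the authors' specification):

import pandas as pd
from sklearn.mixture import GaussianMixture

# Hypothetical per-respondent indicators of non-substantive responding.
df = pd.read_csv("response_quality.csv")  # assumed file and column layout
indicators = ["acquiescence", "item_nonresponse", "midpoint_share"]

# Fit a finite mixture with, e.g., 4 latent classes and assign each
# respondent to the most probable class.
gmm = GaussianMixture(n_components=4, covariance_type="full", random_state=0)
df["response_class"] = gmm.fit_predict(df[indicators])

# Class profiles: mean indicator values per latent class, which could then
# be related to positions obtained from a geometric data analysis.
print(df.groupby("response_class")[indicators].mean())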


4. Old Friends Re-Visited: What Do We Know about the Golden Rules of Questionnaire Design?
Dr Natalja Menold (GESIS)

Double-barreled questions, negations and hypothetical questions are regarded by standard textbooks on questionnaire design as problematic, with strong advice to avoid them in questionnaires. However, in surveys we can often find questions that are problematic in this regard but that function “well” in the sense of measurement models or reliability metrics. This presentation addresses the empirical evidence available so far on the effect of these aspects of question wording on measurement quality. While there is some research on several aspects, such as negations, there is a lack of empirical research on double-barreled and hypothetical questions. The effect of double-barreled questions on measurement quality, with regard to response sets, reliability and validity, is presented using an experimental study as an example.
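For reference, a minimal sketch of one common reliability metric of the kind mentioned above, Cronbach's alpha, computed from a respondents-by-items score matrix (the function and the example data are illustrative assumptions, not material from the paper):

import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a respondents-by-items matrix of scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                          # number of items
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the sum score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Example: 5 respondents answering 3 items on a 5-point scale.
responses = np.array([[4, 5, 4],
                      [2, 2, 3],
                      [5, 5, 5],
                      [3, 4, 3],
                      [1, 2, 2]])
print(f"alpha = {cronbach_alpha(responses):.2f}")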