Measurement Invariance: Testing for It and Explaining Why It Is Absent

Chair: Dr Katharina Meitinger (GESIS Leibniz Institute for the Social Sciences)
Coordinator 1: Professor Eldad Davidov (University of Cologne and University of Zurich)
This project is concerned with explaining the absence of measurement invariance across countries. I revisited a well-established alienation scale originally introduced by Middleton (1963), which has been treated as measuring either alienation (Brannen & Peterson, 2009; Seeman, 1975) or anomie (Austin & Stack, 1988; Huschka & Mau, 2005, 2006). Theoretical arguments, however, suggest two competing models. According to the first, proposed by Seeman (1959), alienation comprises five characteristics: powerlessness, meaninglessness, isolation, normlessness, and self-estrangement (the last of which, drawing upon Marx, was substituted in later versions with "job dissatisfaction"). The second suggests that the scale measures two distinct phenomena: "anomie", standing for feelings of normlessness connected to powerlessness (an external locus of control; Merton, 1968), and "alienation" proper, comprising the cognitive components of loss of social networks and of meaning in life, together with job dissatisfaction (e.g. Srole, 1956; Dean, 1961).
I used data from the World Values Survey (2011; Russia, N = 2,500; Kazakhstan, N = 1,500) and from the Euromodule research project (Slovenia (1999), Germany (1999), Hungary (1999), Spain (2000), Switzerland (2000), Austria (2002), Turkey (2001-2002), and South Korea (2001-2002)). Applying confirmatory factor analysis and multigroup comparisons, I found that the scale functions differently across countries. For Russia and Kazakhstan, full metric invariance was achieved for both the one-factor and the two-factor model, with the latter showing better model fit and less than 50% of shared variance between the factors. For the Euromodule data as a whole, full metric invariance was not found for either model. Based on exploratory factor analysis followed by confirmatory factor analysis and multigroup comparisons, the two-factor model was preferable in Slovenia and Switzerland, whereas the one-factor model was applicable in Germany, Austria, Turkey, and South Korea. Moreover, unlike in the WVS data, powerlessness and normlessness had small factor loadings (<0.35) in the non-post-Communist countries. In South Korea, in contrast to the other countries, normlessness correlated negatively with the other indicators. I also checked the discriminant validity of the models applied.
The explanations for the absence of measurement invariance addressed in the presentation are cross-country differences, question order (which differs between the two datasets), and the presence of an additional reverse-coded item in the Euromodule project.
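In multigroup CFA, the step from a configural to a metric (equal-loadings) model is typically evaluated with a chi-square difference test between the nested models. A minimal sketch in Python; the fit statistics below are illustrative placeholders, not values from either dataset:

```python
from scipy.stats import chi2

def chi2_diff_test(chi2_free, df_free, chi2_constrained, df_constrained):
    """Likelihood-ratio (chi-square difference) test for nested CFA models.

    The constrained model (e.g. loadings held equal across groups for
    metric invariance) must be nested in the free (configural) model.
    """
    d_chi2 = chi2_constrained - chi2_free
    d_df = df_constrained - df_free
    p = chi2.sf(d_chi2, d_df)  # survival function = upper-tail probability
    return d_chi2, d_df, p

# Illustrative values: configural vs. metric model across two groups
d_chi2, d_df, p = chi2_diff_test(chi2_free=120.4, df_free=52,
                                 chi2_constrained=131.0, df_constrained=58)
print(f"delta chi2 = {d_chi2:.1f}, delta df = {d_df}, p = {p:.3f}")
```

A non-significant p-value indicates that constraining the loadings does not significantly worsen fit, i.e. metric invariance is tenable; in practice this test is usually complemented by changes in CFI or RMSEA.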
Mixed mode data collection is becoming increasingly popular in survey research in response to pressure to reduce fieldwork budgets. Mode choice in survey design can influence who is able to participate in a survey (coverage); who chooses to participate (nonresponse); and how respondents answer questions (measurement), meaning that the structure of errors affecting an estimate varies as a function of how the data were collected. The use of a combination of modes can, therefore, potentially hinder analysts’ interpretation of differences between subgroups of interest, if selection effects are confounded with measurement effects. In multi-nation studies, the use of different modes in different countries is similar to the use of multiple modes in a single-nation study in terms of its confounding effect on the comparability of estimates across subpopulations. Where multiple modes are envisaged both within and between countries, the question of how to assess measurement equivalence and manage the additional data complexity poses an important challenge for the field of comparative survey methodology.
The literature on mixed mode data collection has predominantly focused on whether differences can be observed between modes, yet a more relevant question for data users is how important those differences are in terms of their potential impact on point estimates and on the relationships between variables of interest. This question of whether mode effects ‘matter’ has remained largely unanswered, partly because of the technical complexity of disentangling measurement differences from selection effects, which makes simply detecting mode effects problematic to begin with. One solution, however, appears to lie in a total survey error approach, which attempts to evaluate the relative amount of error from different sources in survey estimates. In a comparative context, for example, it makes sense to ask whether the impact of differences between modes is greater or smaller than the impact of other methodological differences between countries that may affect measurement equivalence.
In this paper, we address this challenge by presenting the results of an analysis of measurement invariance across countries and across modes of data collection using confirmatory factor analysis. Using data collected as part of a methodological experiment conducted during Round 3 of the European Social Survey, we assess the implications of mode choice (face-to-face versus telephone interviewing) for the measurement equivalence of multi-item measures of subjective wellbeing in four participating countries (Germany, Hungary, Poland, Switzerland).
We study whether the language in which a survey is administered affects the answers of bilingual respondents to questions measuring political dimensions. This is done in two steps. In the first step, we test whether the measurement instruments are equivalent for the same individual in two languages. Once measurement invariance is established, we test whether latent mean differences across the two languages are significant. We also test whether the correlation between the same concept measured in two languages equals one. Results show evidence of language effects: the latent correlation was below one, although mean differences were not significant. We use data from the LISS migration panel in a within-subject design: respondents answered a questionnaire twice, first in Dutch and then in their second language (Arabic, English, German, Papiamento, or Turkish). Studying bilingual samples is important for identifying language effects. We also discuss implications of the results for deciding which language should be administered in surveys of migrant populations.
Survey response latencies are used to gain a better understanding of cognitive processes and measurement errors (cf. Kreuter 2010). With this paper, we hope to gain a better understanding of how paradata can be used in international comparative studies and to identify possible difficulties surrounding their application.
We look at the use of response latencies as a proxy for attitude strength in attitude models and the appropriateness of their usage in international comparative studies. The research question we hope to answer with this study is: Are response latencies a valid proxy for attitude strength in international comparative studies?
We test a causal model of attitudes towards military interventions for measurement equivalence across five European countries. The model is then elaborated by testing the moderating effect of attitude strength and whether causal equivalence holds across countries. Furthermore, we examine the use of response latencies as a proxy for attitude strength and the appropriateness of their usage in comparative analyses by testing a moderator effect in the relationship between specific and general attitudes in the five countries of analysis.
The causal model predicts respondents’ latent attitudes towards the NATO-led International Security Assistance Force in Afghanistan (ISAF) using generalized attitudes towards aspects of the respective national militaries, specifically attitudes towards various foreign activities such as reconstruction work, democracy building and the combatting of international terrorism. The causal model is based on a hierarchical model of generalized and specific attitudes (cf. Ajzen and Fishbein 1980, Rokeach 1968).
We use data collected in a project in collaboration with the Universität der Bundeswehr München (University of the Armed Forces in Munich) on attitudes towards the mission in Afghanistan and their determinants. The data were collected in September and October 2016 in a CATI survey with random samples of citizens of Germany, Italy, Poland, the Netherlands and the UK.
To test the measurement equivalence of the model constructs, a multiple group confirmatory factor analysis is carried out. Following that, the causal equivalence of the model is tested with multiple group structural equation modelling, looking specifically at the cross-national comparison of the moderating power of attitude strength. Response latencies, as a proxy for attitude strength towards the model constructs, were measured using passive time measurement for all items in the survey. The raw reaction times are adjusted for baseline speed, question characteristics, interviewers and other factors. The stability of the findings will be investigated by comparing results at different levels of measurement equivalence.
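A baseline-speed adjustment of raw reaction times of the kind described above is commonly implemented by residualizing (log) latencies on respondent- and item-level covariates. A minimal sketch using ordinary least squares; the data and the single covariate are simulated for illustration, not taken from the project:

```python
import numpy as np

def adjust_latencies(log_rt, covariates):
    """Residualize log reaction times on covariates (e.g. respondent
    baseline speed, question length, interviewer dummies) via OLS.
    Returns the residuals: latency net of the adjustment factors."""
    X = np.column_stack([np.ones(len(log_rt)), covariates])
    beta, *_ = np.linalg.lstsq(X, log_rt, rcond=None)
    return log_rt - X @ beta

# Simulated data: 100 responses, one covariate (respondent baseline speed)
rng = np.random.default_rng(0)
baseline = rng.normal(size=100)
log_rt = 0.5 * baseline + rng.normal(scale=0.1, size=100)
resid = adjust_latencies(log_rt, baseline.reshape(-1, 1))
# By construction, OLS residuals are orthogonal to the covariate
print(abs(np.corrcoef(resid, baseline)[0, 1]) < 1e-8)
```

In practice the adjustment model would include many more covariates (question characteristics, interviewer effects), but the residualization logic is the same.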
With this paper, we wish to contribute to a better understanding of the use of response latencies in international contexts to improve overall measurement quality in international comparative studies.
References

Ajzen, I., & Fishbein, M. (1980). Understanding Attitudes and Predicting Social Behavior. Englewood Cliffs, NJ: Prentice-Hall.
Kreuter, F. (2010). Paradata: Previous Developments and Recent Discussions. RatSWD Working Paper No. 136.
Rokeach, M. (1968). Beliefs, Attitudes, and Values. San Francisco, CA: Jossey-Bass.
The concept of political efficacy has played an important role in studies of political behaviour. Since the seminal studies of Campbell, Gurin and Miller (1954) and Campbell, Converse, Miller and Stokes (1960), the political efficacy construct has been regarded as an important predictor of political participation (Abramson and Aldrich, 1982) and as an outcome of participation (Finkel, 1985). For the sake of democratic stability, high levels of efficacy among citizens are regarded as desirable: individuals who are confident in their ability to influence the actions of their government are more likely to support the democratic system. It was Easton’s (1965) work that first integrated the construct of political efficacy into the concept of political support. However, it soon became clear that political efficacy has a two-dimensional structure: internal efficacy, or subjective competence, defined as an individual’s confidence in his or her own ability to understand politics and to act politically, and external efficacy, or system responsiveness, the individual’s belief in the responsiveness of the political system (Lane 1959; Converse 1972; Balch 1974). More recently, it has been shown that these two dimensions are very different and carry different relationships with other variables. This theoretical debate, however, carries the implicit assumption that the measurement of political efficacy is equivalent across different cultural contexts. But is that really the case?
The most common way to examine this is to test for measurement equivalence. Over the last decades, different data analysis tools have been developed to assess measurement equivalence. Even though it is widely known that measurement error can seriously attenuate the relationships between variables, most previous work on measurement equivalence has focused only on random error and largely ignored the potential effect of systematic error, including method effects. This is an important issue, because it is often claimed that equivalence analysis is too strict, without mentioning that some of the differences that lead a group to be classified as non-invariant can be corrected for in the measurement process. The main reason correction for measurement error is not addressed is the difficulty of obtaining quality estimates for the indicators. However, with the Survey Quality Predictor (SQP), such quality estimates are now readily available, which makes correction for measurement error considerably easier.
This paper has two goals. First, we aim to assess the measurement equivalence of the political efficacy construct. Second, we aim to determine the impact of full correction for measurement error on measurement equivalence analysis. For this purpose, we use data from Round 7 (2014) of the European Social Survey. The results will help to clarify whether political efficacy can be compared across European countries, and whether equivalence analysis after correction for measurement error is another viable avenue for ensuring maximum comparability between countries and groups.
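Correction for measurement error with quality estimates of the kind SQP provides typically follows the classic disattenuation formula: the observed correlation is divided by the square root of the product of the two indicators' quality estimates. A minimal sketch; the quality values below are illustrative, not actual SQP output:

```python
import math

def correct_correlation(r_observed, quality_x, quality_y):
    """Disattenuate an observed correlation using quality estimates,
    where quality is the proportion of variance in the observed
    indicator that reflects the construct of interest."""
    return r_observed / math.sqrt(quality_x * quality_y)

# Illustrative: observed r = 0.30 between two items with
# quality estimates of 0.65 and 0.70
r_true = correct_correlation(0.30, 0.65, 0.70)
print(round(r_true, 3))  # → 0.445
```

The corrected correlation is always larger in magnitude than the observed one, which is why ignoring measurement error tends to understate relationships between constructs.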