ESRA logo




Wednesday 19th July, 16:00 - 17:30 Room: Q4 ANF2


Measurement Invariance: Testing for it and Explaining Why It Is Absent 3

Chair Dr Katharina Meitinger (GESIS Leibniz Institute for the Social Sciences)
Coordinator 1 Professor Eldad Davidov (University of Cologne and University of Zurich)

Session Details

Measurement invariance tests are a popular approach to assessing the cross-national comparability of data. However, researchers often have difficulty establishing the highest level of measurement invariance, scalar invariance (Davidov et al. 2012).
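
For readers less familiar with the terminology, the usual hierarchy of invariance levels can be written out in the standard multigroup confirmatory factor analysis notation. This is a textbook sketch, not a formulation taken from the session description; the symbols (item intercepts \tau, loadings \Lambda, latent variable \xi, residuals \delta, group index g) are assumptions introduced here for illustration.

```latex
% Measurement model for the item vector x of respondent i in group g
\begin{align}
  x_{ig} &= \tau_g + \Lambda_g \, \xi_{ig} + \delta_{ig} \\
  % Configural invariance: same pattern of zero and nonzero loadings in every group
  % Metric invariance: equal loadings across groups
  \Lambda_g &= \Lambda \quad \text{for all } g \\
  % Scalar invariance: equal loadings and equal intercepts across groups
  \Lambda_g &= \Lambda, \quad \tau_g = \tau \quad \text{for all } g
\end{align}
```

Under scalar invariance, differences in observed item means across groups can be attributed to differences in latent means, which is why this level is required for latent mean comparisons.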

In recent years, the predominant approach to “fix” this issue has been to opt for more statistical sophistication and to relax certain requirements when testing for measurement invariance. Approaches such as Bayesian structural equation modelling (BSEM) (Muthén and Asparouhov 2012; van de Schoot 2015) or alignment (Asparouhov and Muthén 2014) fall into this category.

However, these approaches cannot provide reasons as to why measurement invariance cannot be found. An alternative approach in this context is to view the lack of measurement invariance as a source of information on cross-group differences and to try to explain the individual, societal, or historical sources of measurement nonequivalence (Davidov et al. 2014). On the one hand, quantitative approaches, such as the multiple indicators multiple causes (MIMIC) model (Davidov et al. 2014) and multilevel structural equation models (MLSEMs) (Davidov et al. 2012), aim to substantively explain cases of noninvariance. On the other hand, there is an increasing awareness of the potential of mixed methods approaches to explain instances of measurement noninvariance (e.g., Latcheva 2011; Panyusheva & Efremova 2012; Meitinger 2016). These studies mostly use results from cognitive interviewing or online probing to explain why measurement invariance was not found. In contrast to the purely quantitative approaches, the mixed methods approaches often reveal previously unknown and surprising causes for the incomparability of data.

This session aims to present studies that either test for measurement invariance or examine the reasons why tests for measurement invariance failed in certain research situations. We welcome (1) presentations that take a purely quantitative approach to test measurement invariance or explain noninvariance, and (2) presentations that apply a mixed methods approach to explain instances of missing measurement invariance.

Paper Details

1. The Middleton Alienation Scale: Explaining Measurement Invariance Absence
Ms Ekaterina Lytkina (National Research University Higher School of Economics)

The project deals with explaining the absence of measurement invariance across countries. I revisited a highly institutionalized alienation scale originally introduced by Middleton (1963), which has been treated as a scale measuring either alienation (Brannen & Peterson, 2009; Seeman, 1975) or anomie (Austin & Stack 1988; Huschka and Mau 2005, 2006). However, the theoretical arguments suggest that two different models are possible. According to the first one, as proposed by Seeman (1959), alienation comprises five characteristics, namely powerlessness, meaninglessness, isolation, normlessness and self-estrangement (with the latter, which draws upon Marx, substituted in later versions with "job dissatisfaction"). The second one suggests that the scale measures two different phenomena: "anomie", standing for feelings of normlessness in connection to the low level of external locus of control / powerlessness (Merton, 1968), and "alienation" itself, comprising its cognitive components of loss of social networks and of meaning of life, together with job dissatisfaction (e.g. Srole 1956, Dean 1961).
I used data from the World Values Survey (2011, Russia (N=2500) and Kazakhstan (N=1500)) and from the Euromodule research project (Slovenia (1999), Germany (1999), Hungary (1999), Spain (2000), Switzerland (2000), Austria (2002), Turkey (2001-2002), and South Korea (2001-2002)). Applying confirmatory factor analysis and multigroup comparisons, I found that the scale functions differently across countries. For Russia and Kazakhstan, full metric invariance was achieved for both the one-factor and the two-factor model, with the latter showing better model fit and less than 50% of common variance between the factors. Full metric invariance was found for neither of the two models across the whole Euromodule database. Given the exploratory factor analysis results followed by confirmatory factor analysis and multigroup comparisons, the two-factor model was preferable in Slovenia and Switzerland, whereas for Germany, Austria, Turkey, and South Korea the one-factor model was applicable. Moreover, unlike in the WVS data, powerlessness and normlessness had small factor loadings (<0.35) in countries other than the post-Communist ones. In comparison to other countries, in South Korea normlessness had negative correlations with other indicators. Further, I also check the discriminant validity of the models applied.
The explanations for the absence of measurement invariance that I address in the presentation are cross-country differences, question order (different in the two datasets), and the existence of an additional reverse-coded item in the Euromodule project.


2. Do mode effects matter in cross-national surveys? An assessment of measurement invariance across data collection methods and countries
Dr Michèle Ernst Stähli (FORS, Swiss Centre of Expertise in the Social Sciences)
Dr Oriane Sarrasin (University of Lausanne)
Professor Caroline Roberts (University of Lausanne)

Mixed mode data collection is becoming increasingly popular in survey research in response to pressure to reduce fieldwork budgets. Mode choice in survey design can influence who is able to participate in a survey (coverage); who chooses to participate (nonresponse); and how respondents answer questions (measurement), meaning that the structure of errors affecting an estimate varies as a function of how the data were collected. The use of a combination of modes can, therefore, potentially hinder analysts’ interpretation of differences between subgroups of interest, if selection effects are confounded with measurement effects. In multi-nation studies, the use of different modes in different countries is similar to the use of multiple modes in a single-nation study in terms of its confounding effect on the comparability of estimates across subpopulations. Where multiple modes are envisaged both within and between countries, the question of how to assess measurement equivalence and manage the additional data complexity poses an important challenge for the field of comparative survey methodology.

The literature on mixed mode data collection has predominantly focused on whether differences can be observed between modes of data collection, yet a more relevant question for data users concerns how important differences between modes are in terms of their potential impact on point estimates and on the relationships between variables of interest. This question of whether mode effects ‘matter’ has remained largely unanswered, partly because of the technical complexity of disentangling measurement differences from selection effects, which makes simply detecting mode effects problematic to begin with. One solution, however, appears to lie in a total survey error approach, which attempts to evaluate the relative amount of error from different sources in survey estimates. In a comparative context, for example, it makes sense to ask whether the impact of differences between modes is greater or smaller than the impact of other methodological differences between countries that may affect measurement equivalence.

In this paper, we attempt to address this challenge by presenting the results of an analysis of measurement invariance across countries and across modes of data collection using Confirmatory Factor Analysis. Using data collected as part of a methodological experiment conducted during round 3 of the European Social Survey, we will assess the implications of mode choice (face-to-face versus telephone surveys) on measurement equivalence of multi-item measures of subjective wellbeing in four participating countries (Germany, Hungary, Poland, Switzerland).


3. Exploring language effects in cross-cultural survey research: Does the language of administration affect answers about politics?
Miss Diana Zavala-Rojas (Universitat Pompeu Fabra)

We study whether the language in which a survey is administered has an effect on the answers of bilingual respondents to questions measuring political dimensions. This is done in two steps. In the first, we test whether the measurement instruments are equivalent for the same individual in two languages. After measurement invariance is established, we test whether latent mean differences are significant across the two languages. We also test whether the correlation of the same concept measured in two languages is equal to one. Results show evidence for language effects: the latent correlation was below one, although mean differences were not significant. We use data from the LISS migration panel in a within-subject design: respondents answered a questionnaire twice, first in Dutch and then in their (second) language, one of Arabic, English, German, Papiamento and Turkish. Studying bilingual samples is important to identify language effects. Implications of the results are presented, in particular how to decide in which language a survey should be administered to migrant populations.


4. Causal equivalence of moderator effects of attitude accessibility in comparative international studies
Mr Henrik Andersen (Technische Universität Kaiserslautern)
Mr Christoph Giehl (Technische Universität Kaiserslautern)
Dr Jochen Mayerl (Technische Universität Kaiserslautern)

Survey response latencies are used to gain a better understanding of cognitive processes and measurement errors (cf. Kreuter 2010). With this paper, we hope to gain a better understanding of how paradata can be used in international comparative studies and to identify possible difficulties surrounding its application in international studies.
We look at the use of response latencies as a proxy for attitude strength in attitude-models and the appropriateness of their usage in international comparative studies. The research question we hope to answer with this study is: Are response latencies a valid proxy for attitude strength in international comparative studies?
We test a causal model of attitudes towards military interventions for measurement equivalence across five European countries. The model is then elaborated on by testing the moderating effect of attitude strength and whether causal equivalence holds across countries. Furthermore, we look at the use of response latencies as a proxy for attitude strength and the appropriateness of their usage in comparative analyses by testing a moderator effect in the relationship between specific and general attitudes in the five countries of analysis.
The causal model involves predicting respondents’ latent attitudes towards the NATO-led International Security Assistance Force in Afghanistan (ISAF) using generalized attitudes towards aspects of the federal militaries in the respective countries, specifically attitudes towards various foreign activities such as reconstruction work, democracy building and the combatting of international terrorism. The causal model is based on a hierarchical model of generalized and specific attitudes (cf. Ajzen and Fishbein 1980, Rokeach 1968).
We use data collected in a project in collaboration with the Universität der Bundeswehr München (University of the Armed Forces in Munich) to look at attitudes towards the mission in Afghanistan and their determinants. The data was collected over the course of September-October 2016 in a CATI survey with random samples of citizens of Germany, Italy, Poland, the Netherlands and the UK.
To test the measurement equivalence of the model constructs, a multiple group confirmatory factor analysis is carried out. Following that, the causal equivalence of the model is tested with multiple group structural equation modelling, looking specifically at the cross-national comparison of the moderating power of attitude strength. Response latencies, as a proxy for attitude strength towards the model constructs, were measured using passive time-measurement for all items in the survey. The raw reaction times are adjusted for baseline speed, question characteristics, interviewers and other factors. The stability of the findings will be investigated by comparing results of different levels of measurement equivalence.
With this paper, we wish to contribute to a better understanding of the use of response latencies in international contexts to improve overall measurement quality in international comparative studies.
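
The adjustment of raw reaction times for baseline speed mentioned in this abstract can be illustrated with a small sketch. The abstract does not say which adjustment the authors use; the version below is only one common option, assumed here for illustration: regress each respondent's raw latency on their baseline speed and keep the residual. The function name and the example values are hypothetical, and adjustments for question characteristics or interviewers are left out.

```python
from statistics import mean


def residual_latencies(raw, baseline):
    """Return raw latencies with the linear effect of baseline speed removed.

    raw      -- raw response latencies for one item, one value per respondent
    baseline -- each respondent's baseline speed (e.g. mean latency on
                unrelated items); hypothetical input, not from the abstract
    """
    if len(raw) != len(baseline):
        raise ValueError("raw and baseline must have the same length")
    mean_b, mean_r = mean(baseline), mean(raw)
    sxx = sum((b - mean_b) ** 2 for b in baseline)
    if sxx == 0:
        raise ValueError("baseline speed has no variance")
    sxy = sum((b - mean_b) * (r - mean_r) for b, r in zip(baseline, raw))
    slope = sxy / sxx                  # ordinary least squares slope
    intercept = mean_r - slope * mean_b
    # The residual is the part of the latency not predicted by baseline speed
    return [r - (intercept + slope * b) for r, b in zip(raw, baseline)]


# Hypothetical latencies in seconds for five respondents
adjusted = residual_latencies([2.1, 3.4, 1.8, 4.0, 2.9], [1.0, 1.6, 0.9, 1.5, 1.4])
```

A positive residual marks a respondent who answered more slowly than their general speed would predict, which under the abstract's assumption would indicate a less accessible attitude.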
References
Ajzen, I.; Fishbein, M. (1980): Understanding Attitudes and Predicting Social Behavior. Englewood Cliffs: Prentice-Hall.
Rokeach, M. (1968): Beliefs, Attitudes, and Values. San Francisco, CA: Jossey-Bass.
Kreuter, F. (2010): Paradata: Previous Developments and Recent Discussions. RatSWD Working Paper No. 136.


5. Assessing Political Efficacy comparability: measurement invariance and correction for measurement error
Dr Andre Pirralha (Universitat Pompeu Fabra)
Dr Wiebke Weber (Universitat Pompeu Fabra)

The concept of political efficacy has played an important role in studies of political behaviour. Since the seminal studies of Campbell, Gurin and Miller (1954) and Campbell, Converse, Miller and Stokes (1960), the political efficacy construct has been regarded as an important predictor of political participation (Abramson and Aldrich, 1982) and as an outcome of participation (Finkel, 1985). For the sake of democracy’s stability, high levels of efficacy among citizens are regarded as desirable. Individuals who are confident about their ability to influence the actions of their government are more likely to support the democratic system. It was Easton’s (1965) work that first integrated the construct of political efficacy into the concept of political support. However, it was also soon revealed that political efficacy has a two-dimensional structure: internal efficacy or subjective competence, which can be defined as the confidence of the individual in his or her own abilities to understand politics and to act politically, and external efficacy or system responsiveness, the individual’s belief in the responsiveness of the political system (Lane 1959; Converse 1972; Balch 1974). More recently, it has been shown that these two dimensions are distinct and relate differently to other variables. However, this theoretical debate implicitly assumes that the measurement of political efficacy is equivalent across different cultural contexts. But is that really the case?
The most common way to check this is to test for measurement equivalence. Over the last decades, different data analysis tools have been developed to assess measurement equivalence. Even though it is widely known that measurement error can seriously attenuate the relationships between variables, most previous work on measurement equivalence has focused only on random error and largely ignored the potential effect of systematic error, including method effects. This is an important issue because it is often claimed that equivalence analysis is too strict, without mentioning that some of the differences that cause a group to be noninvariant can be corrected in the measurement process. The main reason for not addressing correction for measurement error is the difficulty of obtaining quality estimates for the indicators. However, with the Survey Quality Predictor (SQP), quality estimates are now easily available, which makes correction for measurement error considerably easier.
This paper has two goals. First, we aim to assess the measurement equivalence of the political efficacy construct. Second, we aim to determine the impact of full correction for measurement error on measurement equivalence analysis. For this purpose, we use data from the European Social Survey 2014 Round 7. The results will help to clarify whether political efficacy can be compared across European countries and whether equivalence analysis after correction for measurement error is another viable avenue to ensure maximum comparability between countries and groups.
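
To make the idea of correcting for measurement error concrete, the sketch below applies a commonly used correction to a single observed correlation. It is not the authors' procedure, which the abstract does not spell out. It assumes each indicator has a reliability coefficient r and a validity coefficient v, defines the quality coefficient as q = r * v and the method effect as m = sqrt(1 - v^2), and removes common method variance when both indicators share a method. The function name and all numeric values are hypothetical.

```python
import math


def corrected_correlation(r_obs, rel1, val1, rel2, val2, shared_method=True):
    """Correct an observed correlation for random and systematic error.

    rel1, rel2 -- reliability coefficients (not squared) of the two indicators
    val1, val2 -- validity coefficients (not squared) of the two indicators
    shared_method -- whether both indicators were measured with the same
                     method, in which case common method variance is removed
    """
    q1 = rel1 * val1                    # quality coefficient of indicator 1
    q2 = rel2 * val2                    # quality coefficient of indicator 2
    cmv = 0.0
    if shared_method:
        m1 = math.sqrt(1 - val1 ** 2)   # method effect of indicator 1
        m2 = math.sqrt(1 - val2 ** 2)   # method effect of indicator 2
        cmv = rel1 * m1 * m2 * rel2     # common method variance
    return (r_obs - cmv) / (q1 * q2)


# Hypothetical values: observed correlation 0.40 between two efficacy items
estimate = corrected_correlation(0.40, rel1=0.85, val1=0.95, rel2=0.80, val2=0.90)
```

With perfectly valid indicators the method term vanishes and the expression reduces to the classical correction for attenuation, the observed correlation divided by the product of the two reliability coefficients.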