Measurement Invariance: Testing for It and Explaining Why It Is Absent 2
Chair: Dr Katharina Meitinger (GESIS Leibniz Institute for the Social Sciences)
Coordinator 1: Professor Eldad Davidov (University of Cologne and University of Zurich)
This study applies the alignment method (Muthén and Asparouhov, 2014) to assess the comparability of survey measures of attitudes toward immigrants’ rights across countries, cohorts and genders. In a previous analysis of attitudes toward immigration among European adults using data from the European Social Survey (ESS), Billiet and Meuleman (2012) assessed the cross-cohort comparability of their measures using multiple-group confirmatory factor analysis (MGCFA). In a series of confirmatory factor analyses testing for configural, metric and scalar invariance, they presented a stepwise, partly data-driven procedure guided by modification indices, with measurement invariance tested in LISREL (Jöreskog 1971; Jöreskog and Sörbom 1993). They adopted a ‘bottom-up’ testing strategy, starting with the weakest level of invariance (configural invariance) and imposing invariance constraints one item at a time. As pointed out by Davidov, Meuleman, Cieciuch, Schmidt, and Billiet (2014), this approach is cumbersome and, because it relies on data-driven modification indices, prone to arriving at the wrong model, especially when many groups are compared. The alignment method offers new directions and solutions for the assessment of measurement equivalence (Davidov et al. 2014). It starts from the configural MGCFA model, in which no parameters are constrained to be equal across groups, and attempts to find as much invariance as possible by choosing group-specific factor means and variances that minimize the amount of non-invariance (Muthén and Asparouhov, in press). This makes it possible to base the invariance analysis on the less restrictive configural model rather than striving for full scalar equivalence. The measurement model can then be selected on theoretical grounds, which is preferable to arriving at a measurement model through data-driven stepwise procedures.
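To make the alignment idea more concrete, the following is a minimal Python sketch of the loss that alignment minimizes, written from the published description of the method (Asparouhov and Muthén, 2014). Configural estimates (factor mean 0 and variance 1 in every group) are re-expressed for candidate group means α_g and variances ψ_g, and the remaining loading and intercept differences between every pair of groups are summed through the component loss f(x) = (x² + ε)^(1/4), weighted by sqrt(N_g1 · N_g2), with ε a small constant such as 0.01. All function names and toy parameters are hypothetical. The sketch only evaluates the loss; it does not perform the numerical minimization over α and ψ (with one reference group fixed) that full implementations such as Mplus carry out.

```python
import math
from itertools import combinations


def component_loss(x, eps=0.01):
    # f(x) = (x^2 + eps)^(1/4): grows roughly like sqrt(|x|) for large |x|, so the
    # total loss favours a few large deviations over many medium-sized ones
    return math.sqrt(math.sqrt(x * x + eps))


def configural_from_invariant(lam, nu, alpha, psi):
    # Hypothetical helper: the loadings/intercepts a configural model (factor mean 0,
    # variance 1) would recover in one group if the true parameters were invariant
    s = math.sqrt(psi)
    return [l * s for l in lam], [n + l * alpha for n, l in zip(nu, lam)]


def align(lam0, nu0, alpha, psi):
    # Re-express one group's configural estimates for a candidate factor mean
    # alpha and factor variance psi
    s = math.sqrt(psi)
    lam1 = [l / s for l in lam0]
    nu1 = [n - alpha * l / s for n, l in zip(nu0, lam0)]
    return lam1, nu1


def alignment_loss(lam0, nu0, alphas, psis, sizes, eps=0.01):
    # lam0[g][p], nu0[g][p]: configural estimates for group g and item p
    aligned = [align(l, n, a, v) for l, n, a, v in zip(lam0, nu0, alphas, psis)]
    total = 0.0
    for g1, g2 in combinations(range(len(aligned)), 2):
        w = math.sqrt(sizes[g1] * sizes[g2])  # sample-size weight for this pair
        (l1, n1), (l2, n2) = aligned[g1], aligned[g2]
        for p in range(len(l1)):
            total += w * component_loss(l1[p] - l2[p], eps)
            total += w * component_loss(n1[p] - n2[p], eps)
    return total


if __name__ == "__main__":
    lam, nu = [0.8, 0.7, 0.6], [3.0, 2.5, 2.8]  # toy invariant item parameters
    alphas, psis, sizes = [0.0, 0.5, -0.3], [1.0, 1.44, 0.81], [100, 200, 150]
    est = [configural_from_invariant(lam, nu, a, v) for a, v in zip(alphas, psis)]
    lam0, nu0 = [e[0] for e in est], [e[1] for e in est]
    print(alignment_loss(lam0, nu0, alphas, psis, sizes))          # at the true values
    print(alignment_loss(lam0, nu0, [0.0] * 3, [1.0] * 3, sizes))  # without alignment
```

When the toy parameters are exactly invariant, every aligned loading and intercept coincides at the true means and variances, so each term takes its smallest possible value f(0); leaving all groups at mean 0 and variance 1 produces a larger loss.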
The alignment method will be applied to two data collections of the IEA studies of civic and citizenship education: the 1999 Civic Education Study and the 2009 International Civic and Citizenship Education Study. The full-scale illustration analyzed the pooled 1999/2009 dataset: responses to five Likert-scale items measuring Support of Immigrants’ Rights from approximately 80,000 European native-born 14-year-olds in 28 countries. We examined measurement invariance across a 92-group design (country by cohort by gender); the results indicate that the scale is statistically well-grounded for unbiased group comparisons despite the presence of non-invariance (scalar model RMSEA = 0.097). The consequences of the scalar model's poor fit are examined in a post-processing analysis that compares the aligned scores with the factor scores obtained from the scalar model, using correlations and group rankings. In the aligned solution, the misfit could be traced to just a few groups, specifically female students from Cyprus in the 1999 study and female students from Latvia in the 2009 study. However, retaining the groups with more severe misfit in the reported results had only marginal effects. Additional analyses to be presented will focus on potential explanations for the observed misfit. Overall, the alignment method makes a comprehensive assessment of measurement invariance feasible in large datasets.
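The post-processing comparison of aligned and scalar-model results can be sketched in Python as follows, assuming it is carried out on group-level means (one aligned mean and one scalar-model mean per group). The sketch computes a Pearson correlation of the two sets of means, a Spearman correlation of the group rankings, and a list of the groups whose rank changes most. The function names are hypothetical, and any values passed in would come from the fitted models, not from this sketch.

```python
import math


def pearson(x, y):
    # Product-moment correlation between two equally long sequences
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    # Rank 1 = lowest value; tied values share the average of their positions
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    # Spearman correlation = Pearson correlation of the ranks
    return pearson(ranks(x), ranks(y))


def rank_shifts(labels, aligned_means, scalar_means):
    # Rank change (scalar minus aligned) per group, largest absolute shifts first
    ra, rs = ranks(aligned_means), ranks(scalar_means)
    shifts = [(lab, s - a) for lab, a, s in zip(labels, ra, rs)]
    return sorted(shifts, key=lambda item: abs(item[1]), reverse=True)
```

A high Pearson correlation together with a Spearman correlation close to 1 would indicate that the scalar model's misfit hardly changes substantive conclusions about group differences; the top of the `rank_shifts` list points to the groups worth inspecting.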
When comparing multiple groups it is important to establish measurement invariance (MI), meaning that the latent construct under investigation is measured in the same way across groups. Traditionally, MI is tested using multiple group confirmatory factor analysis (MGCFA) with certain restrictions on the model. The goal is often to attain scalar invariance, which sets the loadings and intercepts equal across groups, so that factor means can be meaningfully compared. In practice, however, scalar invariance is often an unattainable ideal. Therefore, several alternative methods have been proposed to test for MI, such as partial MI, Bayesian approximate MI, and the alignment method. Although these techniques relax the restrictions imposed by the scalar invariance model, they do impose specific assumptions about the underlying structure of MI. Both the alignment method and approximate MI assume many small deviations from invariance, while partial MI requires at least two invariant items.
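In the usual one-factor MGCFA notation (symbols chosen here for illustration: item p, person i, group g, intercept ν, loading λ, latent mean α and variance ψ, residuals with mean zero), the invariance levels can be written as constraints on the measurement parameters. Configural invariance only requires the same pattern of loadings in every group; the equations below show why comparing factor means additionally requires equal intercepts:

```latex
\begin{aligned}
x_{pig} &= \nu_{pg} + \lambda_{pg}\,\eta_{ig} + \varepsilon_{pig},
  \qquad \mathrm{E}(\eta_{ig}) = \alpha_g,\quad \mathrm{Var}(\eta_{ig}) = \psi_g \\
\mathrm{E}(x_{pg}) &= \nu_{pg} + \lambda_{pg}\,\alpha_g \\
\text{metric } (\lambda_{pg} = \lambda_p):\quad
  \mathrm{E}(x_{pg_1}) - \mathrm{E}(x_{pg_2}) &= (\nu_{pg_1} - \nu_{pg_2}) + \lambda_p\,(\alpha_{g_1} - \alpha_{g_2}) \\
\text{scalar } (\lambda_{pg} = \lambda_p,\ \nu_{pg} = \nu_p):\quad
  \mathrm{E}(x_{pg_1}) - \mathrm{E}(x_{pg_2}) &= \lambda_p\,(\alpha_{g_1} - \alpha_{g_2})
\end{aligned}
```

Under metric invariance alone, an intercept difference cannot be distinguished from a latent mean difference; only under scalar invariance does every observed item mean difference reflect the difference in factor means.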
In this presentation, the different methods for MI will be unified by considering them as specific regularization approaches. Regularization methods (e.g. lasso, ridge) are popular in sparse regression problems where the number of predictor variables is (much) greater than the number of observations. Traditionally, these approaches minimize a loss function subject to a norm constraint or penalty on some parameters, where different norm constraints lead to different shrinkage behaviors. We will show how the problem of MI resembles the sparse regression problem and how the existing methods for MI relate to regularization approaches.
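To illustrate how different penalties lead to different shrinkage behavior, the following Python sketch uses the simplest possible case, an assumption made here purely for illustration: a single group-specific deviation δ (for example, of one intercept from its cross-group value) with an unpenalized estimate d̂ and a quadratic loss. Under this assumption the lasso (L1) penalty yields soft-thresholding and the ridge (L2) penalty yields proportional shrinkage. The function names and the penalty scaling are hypothetical.

```python
def lasso_shrink(d_hat, lam):
    # argmin over delta of 0.5*(d_hat - delta)**2 + lam*abs(delta):
    # soft-thresholding, so deviations smaller than lam become exactly zero
    if d_hat > lam:
        return d_hat - lam
    if d_hat < -lam:
        return d_hat + lam
    return 0.0


def ridge_shrink(d_hat, lam):
    # argmin over delta of 0.5*(d_hat - delta)**2 + 0.5*lam*delta**2:
    # every deviation is shrunk by the same factor and none becomes exactly zero
    return d_hat / (1.0 + lam)


if __name__ == "__main__":
    deviations = [-0.9, -0.1, 0.05, 0.2, 1.2]  # toy deviations from a common value
    for d in deviations:
        print(f"{d:+.2f}  lasso {lasso_shrink(d, 0.25):+.3f}  ridge {ridge_shrink(d, 0.25):+.3f}")
```

One way to read this in MI terms: the lasso behaves somewhat like a data-driven partial-invariance search (small deviations are set exactly to zero, large ones survive), whereas ridge keeps all deviations but pulls them toward zero, closer in spirit to approximate invariance.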
We adopt a Bayesian approach, which provides more flexibility. Bayesian analysis combines the likelihood of the data with a prior distribution to obtain a posterior distribution that is used to make inferences. It has been shown that, under certain prior distributions, the mode of the posterior distribution coincides with the estimate obtained from popular regularization approaches. Employing this Bayesian regularization framework therefore allows us to 1) unify the existing methods for MI and 2) extend the current toolbox by considering different priors. Specifically, we will consider prior distributions that make less stringent assumptions about the structure of MI, thereby allowing additional forms of MI to be modeled. Several penalties and their corresponding prior distributions will be discussed in relation to MI, and their behavior will be investigated through multiple illustrations. Finally, we will provide recommendations on how to choose among the different possible prior distributions.
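The correspondence between posterior modes and penalized estimates can be checked numerically. The Python sketch below assumes a single deviation δ with a normal likelihood d̂ ~ N(δ, σ²), an illustrative simplification. It finds the posterior mode by brute-force grid search and compares it with the closed-form solutions: a Laplace(0, b) prior gives lasso soft-thresholding with λ = σ²/b, and a Normal(0, τ²) prior gives ridge shrinkage with λ = σ²/τ². All names are hypothetical.

```python
def neg_log_posterior(delta, d_hat, sigma, prior, scale):
    # Normal likelihood d_hat ~ N(delta, sigma^2) plus a zero-centred prior on delta;
    # additive constants are dropped, so only the location of the minimum matters
    nll = (d_hat - delta) ** 2 / (2 * sigma ** 2)
    if prior == "laplace":  # Laplace(0, b = scale)
        return nll + abs(delta) / scale
    if prior == "normal":  # Normal(0, tau = scale)
        return nll + delta ** 2 / (2 * scale ** 2)
    raise ValueError(f"unknown prior: {prior}")


def posterior_mode_grid(d_hat, sigma, prior, scale, lo=-5.0, hi=5.0, steps=100_000):
    # Brute-force posterior mode on an evenly spaced grid (precision (hi - lo) / steps)
    grid = (lo + (hi - lo) * k / steps for k in range(steps + 1))
    return min(grid, key=lambda t: neg_log_posterior(t, d_hat, sigma, prior, scale))


def lasso_solution(d_hat, sigma, b):
    # Soft-thresholding with penalty lambda = sigma^2 / b
    lam = sigma ** 2 / b
    return max(abs(d_hat) - lam, 0.0) * (1 if d_hat >= 0 else -1)


def ridge_solution(d_hat, sigma, tau):
    # Proportional shrinkage with penalty lambda = sigma^2 / tau^2
    return d_hat / (1 + sigma ** 2 / tau ** 2)


if __name__ == "__main__":
    for d_hat in (1.2, 0.3):
        print(d_hat,
              posterior_mode_grid(d_hat, 0.5, "laplace", 0.5), lasso_solution(d_hat, 0.5, 0.5),
              posterior_mode_grid(d_hat, 0.5, "normal", 0.5), ridge_solution(d_hat, 0.5, 0.5))
```

Swapping in other priors only requires another branch in `neg_log_posterior`, which is the kind of flexibility the Bayesian formulation offers over fixed penalty functions.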
Cross-cultural research is steadily gaining relevance in empirical criminology. However, awareness of the consequences of measurement non-equivalence in key concepts is still lacking. To address this shortcoming, we aim to test self-control, one of the most prominent explanatory factors in crime causation (see, e.g., Gottfredson & Hirschi 1990), for measurement invariance (MI) across different cultural settings. Specifically, we plan to apply three different approaches to test shortened 12- and 9-item versions of the self-control scale proposed by Grasmick et al. (1993), as used in the International Self-Report Delinquency Study (ISRD): (i) confirmatory factor analyses based on classical test theory, (ii) item response theory (IRT) methods (within IRT, the MI issue is better known as differential item functioning, DIF), and (iii) Procrustes rotation toward the factor structure of a reference group with subsequent comparisons of congruence measures (linearity, proportionality, additivity, and identity). The second and third waves of the ISRD study, conducted in 29 and 27 European, American, and Asian countries respectively, serve as the database. The ISRD project surveys 12- to 16-year-old juveniles in school classes, using self-administered questionnaires and samples representative of the cities or regions of the participating countries. To ensure analytical feasibility, we confine the data to countries grouped into cultural spheres according to language, development indices, welfare regime, and geographic location. Besides testing for MI, we will examine the empirical consequences of measurement non-equivalence for the explanation of cross-cultural differences in crime rates when the MI assumptions are violated.
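Approach (iii) can be sketched in Python with NumPy. The sketch assumes orthogonal Procrustes rotation of a country's loading matrix (items by factors) toward the reference group's loadings, followed by the usual definitions of the four congruence coefficients, computed per factor between rotated and target loadings: identity, additivity, proportionality (Tucker's phi) and linearity (Pearson r). Function names are hypothetical, and the ISRD analyses may use a different rotation or software.

```python
import numpy as np


def procrustes_rotate(A, B):
    # Orthogonal Procrustes: the orthogonal T minimizing ||A @ T - B|| is U @ Vt,
    # where U, S, Vt is the SVD of A.T @ B
    U, _, Vt = np.linalg.svd(A.T @ B)
    T = U @ Vt
    return A @ T, T


def proportionality(x, y):
    # Tucker's phi: insensitive to a common rescaling of the loadings
    return float(x @ y / np.sqrt((x @ x) * (y @ y)))


def linearity(x, y):
    # Pearson correlation: insensitive to both shifts and rescaling
    return float(np.corrcoef(x, y)[0, 1])


def identity(x, y):
    # Equals 1 only if the two loading vectors are identical
    return float(2 * (x @ y) / (x @ x + y @ y))


def additivity(x, y):
    # Identity coefficient on mean-centred loadings: insensitive to a common shift
    xc, yc = x - x.mean(), y - y.mean()
    return float(2 * (xc @ yc) / (xc @ xc + yc @ yc))


def congruence(A_rot, B):
    # One dictionary of coefficients per factor (column): rotated loadings vs target
    out = []
    for j in range(B.shape[1]):
        x, y = A_rot[:, j], B[:, j]
        out.append({"identity": identity(x, y), "additivity": additivity(x, y),
                    "proportionality": proportionality(x, y), "linearity": linearity(x, y)})
    return out
```

The coefficients differ in what they tolerate: linearity ignores differences in both the level and the scale of the loadings, proportionality only in scale, additivity only in level, and identity in neither, which makes identity the strictest criterion.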
Several repeated cross-national surveys include measures of attitudes towards gender roles, aimed at investigating individuals’ beliefs about the appropriate roles of men and women in a given context. When these measures are used to compare attitudes across countries, it should be noted that they have critical features that could cause a lack of equivalence across cultural contexts and therefore lead to misleading results.
Cross-national data collections are exposed to method bias, mainly due to translation errors, modes of data collection and differences in sampling procedures, as well as to social desirability and acquiescence, which can vary by cultural context (Heath et al., 2009; van de Vijver & Tanzer, 2004). Beyond such method bias, the measurement equivalence of gender role attitudes appears particularly sensitive to construct bias. This is because different ways of defining gender roles are established across cultural contexts (Constantin & Voicu, 2014; Lomazzi, 2016). Institutional factors such as welfare regimes, religious traditions, and labor market dynamics have historically contributed to the development of different gender cultures across societies, prescribing gender roles accordingly (André et al., 2013; Pfau-Effinger, 2004; Sjöberg, 2004). This is reflected not only in the shaping of gender beliefs but also in the meaning given to questions investigating these concepts (Braun, 1998, 2009; Braun et al., 1994).
Despite these potentially critical aspects, these measures are widely used in comparative studies, and only recently have studies begun to evaluate the quality of the measurement instruments in this field: Constantin and Voicu (2014) tested the gender role scale included in ISSP 2002 and WVS 2005, while Weziak-Bialowolska (2015) evaluated a measure of gender equality based on a combination of different items from WVS 1994 that were not originally designed as a scale.
Informed by the most recent developments in the assessment of measurement invariance (Asparouhov & Muthén, 2009, 2014; Cieciuch et al., 2014; van de Schoot et al., 2013; Davidov et al., 2015), this paper aims to test the measurement equivalence of the gender role attitudes scale included in the sixth wave of the World Values Survey (2010-2014) across 58 countries.
Unlike the ‘exact’ approaches, which constrain loadings and intercepts to be equal across groups, approximate procedures allow means and variances to be estimated without such constraints, so these new approaches could be helpful in assessing the measurement invariance of gender role attitudes. Specifically, after testing country by country whether the model fits the data, I will employ alignment optimization (Asparouhov & Muthén, 2014), possibly combined with the Bayesian approach, to assess measurement equivalence. However, given how recently these approaches have appeared, the results will be compared with those obtained from the traditional “exact” approach (MGCFA) to highlight possible advantages of the novel procedure.