Contemporary Issues in the Assessment of Measurement Invariance 1
|
Session Organisers | Dr Daniel Seddig (University of Cologne & University of Zurich), Professor Eldad Davidov (University of Cologne & University of Zurich), Professor Peter Schmidt (University of Giessen) |
Time | Tuesday 16th July, 14:00 - 15:30 |
Room | D23 |
The assessment of the comparability of cross-national and longitudinal survey data is a prerequisite for meaningful and valid comparisons of substantive constructs across contexts and time. A powerful tool to test the equivalence of measurements is multiple-group confirmatory factor analysis (MGCFA). Although measurement invariance (MI) testing is increasingly used by applied researchers, several issues remain under discussion and are not yet resolved. For example:
(1) Can we trust models with small deviations (approximate MI)? Is partial MI sufficient? How should one deal with the lack of scalar MI, as is the case in many large-scale cross-national surveys?
(2) How should one decide whether a model with a higher level of MI is to be preferred over a model with a lower level of MI? Which fit indices should be used?
(3) Is MI needed at all, or would it be better to start with a robustness analysis?
Recent approaches have tackled the issues subsumed under (1) and aimed at relaxing certain requirements when testing for measurement invariance (Bayesian approximate MI, Muthén and Asparouhov 2012; van de Schoot et al. 2013) or at using the alignment method (Asparouhov and Muthén 2014). Furthermore, researchers have addressed the issues subsumed under (2) and recommended the use of particular fit statistics (e.g., CFI, RMSEA, SRMR) to decide among competing models (Chen 2007). The question raised under (3) is a more general one and concerns the contemporary uses of the concept of MI. Researchers (Welzel and Inglehart 2016) have argued that variations in measurements across contexts can be ignored, for example in the presence of theoretically reasonable associations of a construct with external criteria.
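In MGCFA terms, the invariance levels at issue form a nested hierarchy of constraints on the measurement model; a schematic one-factor formulation, in standard notation and intended only for orientation, is:

x_{ipg} = \nu_{pg} + \lambda_{pg}\,\eta_{ig} + \varepsilon_{ipg}

configural MI: the same pattern of loadings in every group, with \lambda_{pg} and \nu_{pg} freely estimated;
metric MI: \lambda_{pg} = \lambda_{p} for all g, permitting comparisons of covariances and regression coefficients;
scalar MI: \lambda_{pg} = \lambda_{p} and \nu_{pg} = \nu_{p} for all g, permitting comparisons of latent means.

Partial MI relaxes these equality constraints for a subset of items, while approximate MI replaces exact equality with small, prior-bounded differences between groups.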
This session aims at presenting studies that assess measurement invariance and/or address one of the issues listed above or related ones. We welcome (1) applied presentations that make use of empirical survey data, and/or (2) presentations that take a methodological approach to measurement invariance testing and its use, for example by employing Monte Carlo simulations to study the above-mentioned issues.
Keywords: measurement invariance, comparability, cross-cultural research, structural equation modeling
Dr Vera Lomazzi (GESIS-Leibniz Institute for the Social Sciences) - Presenting Author
Dr Daniel Seddig (University of Cologne & University of Zurich)
Differences in societal views on the roles of men and women have been addressed in many large-scale comparative studies in recent years. A prerequisite for valid comparisons of attitudes towards gender roles, however, is that the measures are comparable across countries. Thus, measurement equivalence must be assessed before drawing substantive conclusions. The current study has three main goals. First, we show that the comparability of gender role attitudes is limited when traditional methods (multiple-group confirmatory factor analysis) are used to test measurement invariance with data from the International Social Survey Programme 2012. However, the recently established alignment optimization procedure suggests that comparability is given. Second, we correlate the national mean levels of gender role attitudes obtained with the alignment method with cultural values to show that societal views on the roles of men and women vary with respect to shared goals and views about what is desirable. Societies that emphasize the importance of the collective and the status quo (embeddedness), as well as those with a strong preference for the maintenance of societal roles (hierarchy), tend to show more traditional gender role attitudes. Societies with more egalitarian values (egalitarianism) also display more egalitarian attitudes towards gender roles. While new methods such as the alignment procedure can alleviate the risks of drawing invalid conclusions, the reasons for noninvariance often remain unexplained. The third aim of this study is therefore to investigate possible sources of noninvariance with multilevel structural equation modeling. We use two country-level variables to explain the absence of measurement invariance: the cultural value embeddedness explains noninvariance to a considerable degree, while the Gender Inequality Index (from the UNDP) does not. Thus, the issues of comparability of gender role attitudes are related to cultural rather than structural differences between countries.
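The alignment method referred to in this abstract avoids exact equality constraints; following Asparouhov and Muthén (2014), it estimates group factor means and variances by minimizing a total simplicity loss over pairwise loading and intercept differences, roughly of the form

F = \sum_{p}\sum_{g_1<g_2} w_{g_1,g_2}\, f\!\left(\lambda_{pg_1} - \lambda_{pg_2}\right) + \sum_{p}\sum_{g_1<g_2} w_{g_1,g_2}\, f\!\left(\nu_{pg_1} - \nu_{pg_2}\right), \qquad f(x) = \sqrt{\sqrt{x^{2} + \epsilon}},

where \epsilon is a small constant and the weights w_{g_1,g_2} reflect group sample sizes. The component loss function favours solutions in which non-invariance is concentrated in a few large parameter differences rather than spread evenly across groups.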
Mr Boris Sokolov (Higher School of Economics) - Presenting Author
Measurement invariance is an important prerequisite for cross-cultural studies, since it ensures that latent constructs of interest are comparable across countries. Strikingly, most applied invariance studies follow guidelines based on a few decade-old simulation studies of the two-group setting. The assumption that those guidelines are applicable to the much larger, more heterogeneous, and more complex samples typical of modern international surveys is unrealistic. One negative consequence of using these outdated and inappropriate guidelines for determining whether a particular MGCFA model satisfies invariance requirements is that researchers often find that popular sociological constructs lack cross-cultural comparability. Using Monte Carlo simulation experiments, this project examines how well popular SEM goodness-of-fit measures, such as CFI, TLI, RMSEA, and SRMR, perform in the context of measurement invariance testing in large samples. Its contribution to the existing methodological literature on cross-national survey research is threefold. First, it explores how sensitive the aforementioned fit measures are to various amounts of measurement non-invariance in large samples (10, 30, or 50 groups) under various conditions imitating typical features of such survey data. Second, it tests how other model misspecifications affect model fit in the multigroup setting, thus disentangling the impact of different fit-worsening factors. The results suggest that CFI and SRMR are superior to RMSEA and TLI as measures of model misfit due to non-invariance, but that the existing cut-off values for all these measures are too strict and should be relaxed somewhat. Finally, it examines how critical different levels of non-invariance are in terms of bias in the hierarchy of latent means. The results show that the danger of measurement non-invariance might be somewhat exaggerated, since even in the conditions with the highest levels of metric and scalar non-invariance the estimated latent means do not deviate strongly from the true population values.
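As a stylized illustration of the latent-mean question examined here (a simplified stand-in rather than the project's actual simulation design, with made-up group sizes, loadings, and shift values, and with naive composite-score means in place of model-based latent means), one can simulate scalar non-invariance and check how far group comparisons drift:

import numpy as np

rng = np.random.default_rng(42)

# Illustrative design choices; none of these numbers come from the project itself.
n_groups, n_items, n_per_group = 30, 4, 1000
loadings = np.array([0.8, 0.7, 0.6, 0.75])           # common loadings: metric invariance holds
true_latent_means = rng.normal(0.0, 0.5, n_groups)   # true group differences on the factor
intercept_shift = 0.4                                 # size of the scalar non-invariance
biased_items = 2                                      # number of items with shifted intercepts

naive_means = np.empty(n_groups)
for g in range(n_groups):
    eta = rng.normal(true_latent_means[g], 1.0, n_per_group)   # latent scores in group g
    intercepts = np.zeros(n_items)
    if g % 2 == 0:                                              # half the groups are non-invariant
        intercepts[:biased_items] += intercept_shift
    x = intercepts + np.outer(eta, loadings) + rng.normal(0.0, 0.5, (n_per_group, n_items))
    naive_means[g] = x.mean()                                   # composite-score mean for group g

# How well does the naive ranking of groups recover the true latent-mean ranking?
ranks_naive = np.argsort(np.argsort(naive_means))
ranks_true = np.argsort(np.argsort(true_latent_means))
print("Rank correlation, naive vs. true group means:",
      round(float(np.corrcoef(ranks_naive, ranks_true)[0, 1]), 2))

Replacing the composite-score means with MGCFA estimates and recording fit measures such as CFI, TLI, RMSEA, and SRMR at each invariance level would turn this sketch into the kind of experiment described in the abstract.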
Professor Nick Allum (University of Essex) - Presenting Author
Ms Kirby King (Government Statistical Service)
Dr Paul Stoneman (Goldsmiths College)
Background: The General Health Questionnaire (GHQ) is a widely used instrument for identifying minor psychiatric disorders in the general population. Notwithstanding its widespread use in social and epidemiological research, little is known about its validity as a comparative tool for measuring the mental health of adults from different ethnic groups. Our objective in this paper is to assess the GHQ’s suitability for this task by testing for measurement invariance with respect to five ethnic minority groups in the UK (Indian, Pakistani, Bangladeshi, Caribbean, and African), along with the white British majority. We investigate the extent to which the short-form version of the instrument, the GHQ-12, exhibits configural, metric, and scalar invariance across these six ethnic groups using the UK Household Longitudinal Study (N = 35,437).
Methods and results: We evaluate alternative factor structures for the GHQ that have been suggested in previous literature and show that a unidimensional structure with correlated errors for reverse-valenced items provides the best fit in all subgroups. We submit this model to tests for metric and scalar invariance across groups and find substantial equivalence in the measurement properties of the scale across all groups. We complement this with tests of association with criterion variables for both latent and summated scale versions of the instrument and find little difference.
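In model terms, the preferred structure corresponds to a single common factor for all twelve items, with residual covariances freed among the reverse-valenced items (a schematic rendering rather than the exact parameterization used):

x_{pi} = \nu_p + \lambda_p\,\eta_i + \varepsilon_{pi}, \qquad \operatorname{Cov}(\varepsilon_{pi}, \varepsilon_{qi}) = \theta_{pq} \neq 0 \ \text{for reverse-valenced item pairs } (p, q).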
Conclusions: We find that policy makers and scholars should not be overly concerned about the cultural sensitivity of the GHQ-12 and that valid comparisons across different ethnic groups can be made using the instrument in adult populations.