All time references are in CEST
The potential of survey and questionnaire design to achieve measurement invariance 1

Session Organisers: Dr Katharina Meitinger (Utrecht University), Professor Natalja Menold (TU Dresden), Dr Heinz Leitgöb (Leipzig University)
Time: Wednesday 19 July, 14:00 - 15:00
Room: U6-02
A common finding in measurement invariance testing is that the property of metric or scalar measurement invariance is difficult to achieve in cross-cultural survey data. Whereas approximate approaches to measurement invariance testing have received great interest, the impact of survey methodological decisions on the results of measurement invariance analysis has been relatively underemphasized. However, previous research has revealed the serious impact of various survey methodological aspects on measurement invariance, such as differences in question wording, translations, rating scale forms, visual presentation, modes, or devices. At the same time, survey methodology also provides us with a toolkit to improve the measurement invariance of survey questions. Optimal translation procedures (e.g., the TRAPD approach) and approaches at the development and pretesting stage (e.g., focus groups, expert reviews, cross-cultural cognitive interviewing, web probing) can potentially improve the comparability of survey items. Some of these approaches could also be implemented during or after the actual data collection (e.g., web probing). Careful conceptualization and operationalization can help to improve the factorial structure of indicators and therefore yield more promising measurement invariance results. Anchoring vignettes and similar approaches to controlling for differential item functioning could help to adjust data and improve their comparability, which should also improve the results of measurement invariance analysis.
This session aims to provide a platform for survey methodological evidence on improving measurement comparability, and to foster a discussion on survey methodological approaches that improve data comparability, as evaluated by measurement invariance analysis, before, during, or after data collection.
Keywords: Measurement Invariance, Survey Methodology, Comparability
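To make the session's core problem concrete, the following minimal Python sketch (not taken from any of the abstracts below; all parameter values are invented for illustration) simulates a violation of scalar invariance: two groups share identical loadings but one item's intercept differs, so a naive comparison of item means conflates the true latent difference with the measurement artefact.

```python
# Toy illustration of scalar non-invariance; all values are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Latent trait: a true group difference of 0.3 SD.
eta_a = rng.normal(0.0, 1.0, n)
eta_b = rng.normal(0.3, 1.0, n)

loadings = np.array([0.8, 0.7, 0.6])       # equal across groups: metric invariance holds
intercepts_a = np.array([0.0, 0.0, 0.0])
intercepts_b = np.array([0.0, 0.5, 0.0])   # item 2 shifted: scalar invariance violated

def items(eta, lam, tau):
    """Generate item responses y = lambda * eta + tau + noise."""
    noise = rng.normal(0.0, 0.5, (eta.size, lam.size))
    return eta[:, None] * lam + tau + noise

y_a = items(eta_a, loadings, intercepts_a)
y_b = items(eta_b, loadings, intercepts_b)

# Item 2 suggests a much larger group difference than items 1 and 3,
# purely because of the intercept shift, not the latent trait.
print("observed mean differences: ", y_b.mean(axis=0) - y_a.mean(axis=0))
print("expected from latent shift:", 0.3 * loadings)
```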
Professor Beatrice Rammstedt (GESIS Leibniz Institute for the Social Sciences) - Presenting Author
Dr Lena Roemer (GESIS Leibniz Institute for the Social Sciences)
Professor Daniel Danner (University of Applied Labour Studies, Mannheim)
Dr Clemens M. Lechner (GESIS Leibniz Institute for the Social Sciences)
Generally accepted rules for item formulation recommend keeping item wordings as simple as possible and avoiding double-barreled questions. However, the empirical basis for these recommendations is scarce. The aim of the present study was to systematically investigate, in an experimental design, whether simplifying personality items and avoiding double-barreled items markedly increases the psychometric quality of the scale. We compared the original item formulations of the BFI-2 with simplified item versions and tested both versions in a large, heterogeneous sample. In none of the analyses did the simplified version possess better psychometric quality. On the contrary, it showed weaker factorial validity than the original item formulations. These findings also replicated for lower-educated respondents, who can be regarded as more sensitive to complex item formulations. Our study thus indicates that simplifying items and avoiding double-barreled items does not improve the quality of questionnaires.
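The abstract does not specify how factorial validity was compared across the two item versions. One common way to quantify the similarity of two factor solutions is Tucker's congruence coefficient; the sketch below assumes hypothetical loading vectors for the same items under the original and simplified wordings, and is not the authors' actual analysis.

```python
# Hedged sketch: comparing two factor solutions with Tucker's congruence
# coefficient. The loading values below are hypothetical placeholders.
import numpy as np

def tucker_congruence(x: np.ndarray, y: np.ndarray) -> float:
    """Tucker's phi for two loading vectors: sum(xy) / sqrt(sum(x^2) * sum(y^2))."""
    return float(np.sum(x * y) / np.sqrt(np.sum(x**2) * np.sum(y**2)))

# Hypothetical loadings of the same five items on one factor, estimated
# separately for the original and the simplified item versions.
loadings_original   = np.array([0.72, 0.65, 0.70, 0.58, 0.61])
loadings_simplified = np.array([0.55, 0.60, 0.48, 0.52, 0.40])

phi = tucker_congruence(loadings_original, loadings_simplified)
print(f"Tucker's phi = {phi:.3f}")  # values >= .95 are often read as factor equivalence
```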
Professor Natalja Menold (Technische Universitaet Dresden) - Presenting Author
We evaluated cognitive pretests as a method to improve measurement invariance between refugee and host populations. Limited data comparability may result in the Othering of the respective minority populations. The study focused on measures relevant to health and participation in democratic societies. We adapted measurement instruments for physical and mental health, loneliness, quality of life, discrimination, and attitudes towards democracy into Arabic and Dari using the TRAPD approach, conducted cognitive pretests of the adaptations, revised the adaptations on the basis of the pretest results, and collected new data with the revised instruments. For one part of the data, we used probability individual register samples drawn for three Saxon cities and implemented a push-to-web study by means of postal contact and re-contact. The other part used samples of refugees recruited via Facebook. Measurement instruments before and after cognitive pretesting were randomly assigned to different language and refugee groups. On the basis of first analyses in the probability samples for the Arabic and German languages, we found that the problems identified in the cognitive pretests largely corresponded to the statistical incomparability uncovered by measurement invariance analysis. With the versions revised on the basis of the cognitive pretests, scalar invariance could be established for the General Health Screener (SF-12), and metric invariance could be improved for the Loneliness scale and the Refugee Health Screener (RHS-9) and established for the Quality of Life instrument. Thus, some adapted versions based on the results of the cognitive pretests showed improved measurement invariance. For the other two cognitively tested and revised instruments (discrimination experiences and attitudes towards democracy), however, measurement invariance was unaffected by the revisions. The results point to some limitations of cognitive pretests, e.g., the need for a more informed item selection and their limited ability to uncover problems that are difficult to verbalize.
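Decisions about establishing or improving metric and scalar invariance, as reported here, are typically made by comparing nested multi-group models. The abstract does not name its software or test; a generic sketch of the standard chi-square difference test follows, with hypothetical fit statistics standing in for output from any SEM package.

```python
# Generic nested-model comparison for measurement invariance testing
# (configural vs. metric vs. scalar). All fit statistics are hypothetical.
from scipy.stats import chi2

def chi2_difference_test(chi2_restricted, df_restricted, chi2_free, df_free):
    """Likelihood-ratio test of a constrained model against a freer one."""
    delta_chi2 = chi2_restricted - chi2_free
    delta_df = df_restricted - df_free
    p_value = chi2.sf(delta_chi2, delta_df)
    return delta_chi2, delta_df, p_value

# Hypothetical fit statistics: configural (free), metric (equal loadings),
# scalar (equal loadings and intercepts).
fits = {"configural": (120.4, 48), "metric": (131.9, 56), "scalar": (168.2, 64)}

for restricted, free in [("metric", "configural"), ("scalar", "metric")]:
    d, df, p = chi2_difference_test(*fits[restricted], *fits[free])
    # A non-significant p supports the stricter invariance level.
    print(f"{free} -> {restricted}: delta_chi2={d:.1f}, delta_df={df}, p={p:.4f}")
```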
Dr Ranjit K. Singh (GESIS - Leibniz Institute for the Social Sciences) - Presenting Author
There is little doubt that measurement invariance (MI) is crucial for comparative research. Yet a major obstacle to a wider adoption of MI testing is that all conventional statistical procedures require multi-indicator instruments. However, I am convinced that we can explore at least some aspects of the MI of single-item instruments across countries by comparing different instruments for the same concept in each country.
Specifically, I will make use of observed score equating in a random groups design (OSE-RG). OSE-RG is a method for transforming scores measured with one instrument such that they become numerically comparable to scores measured with another instrument for the same concept. Crucially, OSE-RG is a population-based approach in that it harmonizes the two instruments using data in which both were applied to the same population. However, there is much debate in the literature about whether an OSE-RG solution obtained in one population can be applied to measurement scores from another population. This desirable case is called population invariance.
My core argument is that the population invariance of OSE-RG and the (scalar) measurement invariance of the two instruments involved are two sides of the same mathematical coin. Based on this logic, we can apply OSE-RG to two instruments in one country and then assess whether the resulting harmonization solution holds in other countries. If it does not, population invariance is violated. This, in turn, implies that at least one of the two measurement instruments also violates measurement invariance across the two countries. In my talk, I will present a mathematical intuition alongside an empirical proof-of-principle drawn from existing cross-national survey data. I will also sketch out some further use cases for this approach.
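The abstract names the method but not an implementation, so the following is a hedged sketch of the underlying logic rather than the author's code: equipercentile observed score equating in a random groups design, followed by a simple check of whether the mapping derived in one country still aligns the two instruments' distributions in another. All distributions are simulated placeholders.

```python
# Hedged sketch of OSE-RG (equipercentile equating) plus a population
# invariance check across countries. All data are simulated placeholders.
import numpy as np

rng = np.random.default_rng(1)

def equate(scores_a, scores_b):
    """Map instrument-A scores onto instrument-B's scale via matching quantiles."""
    probs = np.linspace(0.01, 0.99, 99)
    qa = np.quantile(scores_a, probs)
    qb = np.quantile(scores_b, probs)
    return lambda x: np.interp(x, qa, qb)

# Country 1: random halves of one population answer instrument A or B.
a1 = rng.normal(5.0, 1.5, 5000)    # instrument A, narrow scale (hypothetical)
b1 = rng.normal(50.0, 12.0, 5000)  # instrument B, wide scale (hypothetical)
a_to_b = equate(a1, b1)

# Country 2: if the equating function "travels", equated A-scores should match
# the B-score distribution there as well. Here instrument A shifts in country 2
# (e.g., a simulated translation effect), so the mapping breaks down.
a2 = rng.normal(5.8, 1.5, 5000)
b2 = rng.normal(52.0, 12.0, 5000)

print("country 2, mean of equated A vs. mean of B:",
      round(float(a_to_b(a2).mean()), 1), "vs.", round(float(b2.mean()), 1))
# A clear mismatch flags a violation of population invariance, which in turn
# implies measurement non-invariance of at least one of the two instruments.
```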