All time references are in CEST
Use of Machine Learning in Questionnaire Development and Evaluation
Session Organiser: Professor Natalja Menold (Dresden University of Technology)
Time: Tuesday 18 July, 09:00 - 10:30
Room:
Development and validation of measurement instruments can pose challenges of item selection or instrument modification, which must meet various quality criteria, such as minimizing systematic measurement error and maximizing reliability, validity, and comparability across population groups. Machine learning methods have become relevant for solving such problems and can foster innovation in questionnaire development and evaluation. Moreover, machine learning methods can help to investigate numerous potential sources of bias in measurement and to improve questionnaire design with the aim of maximizing measurement quality and data comparability.
The session aims to foster discussion among researchers who use machine learning as a tool for optimizing the questionnaire design process, for evaluating systematic and non-systematic bias in measurement, and for validation purposes. Besides such application examples, papers are invited that provide a conceptual basis for the use of machine learning in measurement, compare machine learning methods with previously established methods, or discuss limitations and further developments.
Keywords: Machine Learning, Testing, Measurement, Questionnaire Design
Mr Halil Duran (GESIS - Leibniz Institute for the Social Sciences) - Presenting Author
According to Townsend (1979: 31), "Deprivation refers to the inability to afford a range of items and activities that are widely viewed as key conditions for participation in the society to which one belongs" (Lanau 2023: 335). In 2012, EU Member States adopted a 13-item scale to measure material and social deprivation for the whole population on an annual basis (Guio et al. 2012: 9; 111). The goal is to measure deprivation rates for all countries participating in the EU-SILC. A household is accordingly classified as deprived if it lacks at least 3 of the 13 items.
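As a purely illustrative sketch of this classification rule (simulated data and hypothetical column names, not the EU-SILC variables themselves):

```python
import pandas as pd

# Hypothetical data: 13 binary indicators, 1 = household cannot afford the item
items = [f"item_{i}" for i in range(1, 14)]
df = pd.DataFrame([[0] * 13,
                   [1, 1, 1] + [0] * 10], columns=items)

# A household is classified as deprived if it lacks at least 3 of the 13 items
df["n_lacked"] = df[items].sum(axis=1)
df["deprived"] = (df["n_lacked"] >= 3).astype(int)
print(df[["n_lacked", "deprived"]])
```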
One key challenge in measuring deprivation is selecting a suitable number and composition of items, especially in relation to respondent burden. Previous research has shown how deprivation rates could be measured more efficiently and with minimal information loss using adaptive testing, at least in the context of a single country (Bailey & Guio 2022; Bailey 2020). However, adaptive testing raises two issues: first, it can only be implemented in CAPI or CAWI; second, comparability of items across households is lost, because households do not all receive the same items but rather items selected according to their level of deprivation. This paper, in contrast, uses a different, data-driven approach, namely classification trees (Breiman et al. 1984). The descriptive goal is to determine how many items are needed to capture deprivation rates similar to those obtained with the full set of deprivation items. To this end, I use data from the German Panel Survey of Social Security (PASS) from 2006/07 to 2018 with the core set of 21 items. Additionally, I will present results from EU-SILC to compare across countries.
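A minimal, hypothetical sketch of this kind of classification-tree analysis (simulated data; not the author's code or the actual PASS items): a shallow tree is fit to reproduce the full-scale deprivation label, and the items it actually uses for splitting suggest how small a reduced item set might be.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)

# Simulated stand-in for 21 binary deprivation items in 5,000 households
X = rng.binomial(1, 0.15, size=(5000, 21))
y = (X.sum(axis=1) >= 3).astype(int)  # deprivation label from the full item set

# A shallow tree can only use a handful of items to reproduce that label
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)
used_items = sorted(set(tree.tree_.feature[tree.tree_.feature >= 0]))

print("items used in splits:", used_items)
print("agreement with the full-scale label:", round(tree.score(X, y), 3))
print(export_text(tree, feature_names=[f"item_{i + 1}" for i in range(21)]))
```

The depth cap is the knob that trades item burden against agreement with the full-scale deprivation rate.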
Mr Marco Fölsch (Dresden University of Technology) - Presenting Author
Professor Natalja Menold (Dresden University of Technology)
Measurement invariance is essential for measuring constructs such as values and attitudes consistently across different respondent groups and contexts. To interpret the effects of survey items in a comparative and meaningful way, social science researchers must ensure that the underlying concepts of the items are understood similarly across cultures, demographic groups, and time periods. Especially as datasets grow in size and complexity, covering longer time periods and increasingly diverse contexts, traditional approaches such as confirmatory factor analysis face limitations. In our study, we instead leverage machine learning techniques, such as random forests, to identify sources of non-invariance, including differential item functioning. This allows a more nuanced evaluation of how items and sets of items perform across diverse groups, contexts, and time periods. We establish patterns of varying levels of invariance in items and constructs and provide explanations for these differences. Our results offer valuable insights for improving questionnaire designs, such as refining item wording or controlling for specific cultural settings. Overall, our study contributes to the growing discussion of integrating machine learning into measurement research and presents a novel framework for ensuring the reliability and validity of survey measures across contexts.
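One way a random forest can be used to screen for differential item functioning, shown here as a purely illustrative sketch on simulated data (not the authors' procedure; all variable names are hypothetical): predict a focal item's responses from a rest score that proxies the latent trait plus group covariates, and check whether the covariates carry predictive importance beyond the rest score.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(1)
n = 4000

# Hypothetical survey data: five binary attitude items plus group/context covariates
latent = rng.normal(size=n)
country = rng.integers(0, 4, size=n)
cohort = rng.integers(0, 3, size=n)
items = {}
for j in range(5):
    bias = 0.8 * (country == 2) if j == 3 else 0.0  # item_4 functions differently in country 2
    items[f"item_{j + 1}"] = (latent + bias + rng.normal(size=n) > 0).astype(int)
df = pd.DataFrame(items).assign(country=country, cohort=cohort)

# For one focal item, predict responses from the rest score (proxy for the trait)
# plus group covariates; covariate importance beyond the rest score hints at DIF.
focal = "item_4"
rest = [c for c in df.columns if c.startswith("item_") and c != focal]
X = pd.DataFrame({"rest_score": df[rest].sum(axis=1),
                  "country": df["country"], "cohort": df["cohort"]})
rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, df[focal])
imp = permutation_importance(rf, X, df[focal], n_repeats=10, random_state=0)
print(dict(zip(X.columns, imp.importances_mean.round(3))))
```

In this toy setup, the country covariate should show non-negligible importance for the biased item, while cohort should not.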
Dr Daniel Seddig (Criminological Research Institute of Lower Saxony) - Presenting Author
Dr Heinz Leitgöb (Leipzig University)
Dr Ilka Kammigan (Helmut Schmidt University)
Professor Dirk Enzmann (University of Hamburg)
Mr Franz Classe (University of Munich)
This study introduces an integrated approach to exploring cross-national measurement invariance, combining confirmatory factor analysis and model-based recursive partitioning. The method examines global measurement invariance across countries and identifies noninvariant items, exploring group-based noninvariance patterns using partitioning variables. These variables help explain the absence of measurement invariance and contribute to theorizing cross-national applicability. The approach is applied to the self-control scale (Grasmick et al., 1993) using data from the Third International Self-Report Delinquency Study (ISRD-3). Results reveal systematic differences in how students respond to the self-control scale across countries, influenced by cultural, educational, and economic factors. Higher economic development corresponds with less variation in responses, while less developed countries exhibit more extreme patterns. Moreover, cultural values like egalitarianism, along with education systems, shape students’ item responses. The results provide insights into the causes of comparability issues, offering important implications for the theoretical validity of concepts that claim universal applicability.
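As a greatly simplified illustration of the partitioning idea on simulated data (the study itself combines confirmatory factor models with formal parameter-stability tests, typically implemented in SEM software; the sketch below merely screens candidate partitioning variables by comparing exploratory one-factor loadings across their groups, and all names are hypothetical):

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(2)
n = 3000

# Hypothetical self-control-style data: 6 items loading on one factor,
# with one loading that differs by a country-level covariate (GDP group).
gdp_group = rng.integers(0, 2, size=n)    # candidate partitioning variable with an effect
school_type = rng.integers(0, 3, size=n)  # candidate partitioning variable without an effect
eta = rng.normal(size=n)
loadings = np.full((n, 6), 0.7)
loadings[gdp_group == 1, 5] = 0.2         # item 6 loads weakly in one group
X = loadings * eta[:, None] + rng.normal(scale=0.5, size=(n, 6))

def group_loadings(X, groups):
    """Fit a one-factor model per subgroup and return sign-aligned loading vectors."""
    out = {}
    for g in np.unique(groups):
        lam = FactorAnalysis(n_components=1, random_state=0).fit(X[groups == g]).components_.ravel()
        out[g] = lam * np.sign(lam.sum())  # resolve the sign indeterminacy of the factor
    return out

# Screen candidate partitioning variables by how much the loadings diverge across their groups
for name, groups in {"gdp_group": gdp_group, "school_type": school_type}.items():
    lam = group_loadings(X, groups)
    spread = np.ptp(np.vstack(list(lam.values())), axis=0)  # per-item loading range
    print(name, "max loading difference across groups:", spread.max().round(2))
```

A recursive version of this screening, with proper invariance tests at each split, yields the kind of tree-structured noninvariance patterns described in the abstract.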