Advanced Statistical Methods for Survey Research
Session Organiser | Ms Zsofia S. Ignacz (research associate) |
Time | Tuesday 16th July, 16:00 - 17:00 |
Room | D30 |
This session includes papers that showcase advanced statistical methods for survey research.
Keywords: multivariate methods, latent structure models
Mrs Jiayun Jin (KU Leuven) - Presenting Author
Professor Geert Loosveldt (KU Leuven)
As the most widely used tools in statistical process control (SPC), control charts have been designed, and are mostly used, separately for numerical variables and for categorical variables. Yet, faced with the massive amounts of multiple correlated variables collected from modern processes, few researchers have addressed how control charts can monitor numerical and categorical indicators simultaneously.
In the present paper, we explore the possibilities of monitoring mixed types of survey data quality indicators simultaneously with control charts. We first propose a new use, in survey settings, of the “Principal Component Analysis (PCA) mix” procedure (Marie et al. 2017), which is designed to handle numerical and categorical variables jointly, to transform the mixed quality indicators into principal components. Because the principal component scores obtained from PCAmix are not multivariate normal, we apply a non-parametric bootstrap method to calculate the control limit and to define an outlier. Furthermore, we develop an iterative procedure to determine the in-control data: it removes outliers one at a time and re-estimates the PCAmix model and the bootstrap-based multivariate control chart with the outlier dropped, until no outlier is identified. The procedure is then applied to the Belgian data from the eighth round of the European Social Survey, which uses face-to-face interviews, to evaluate survey quality by monitoring six numerical and two categorical quality indicators and to obtain a set of in-control data.
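The bootstrap control limit can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it takes already-computed component scores as input (it does not reproduce PCAmix), uses a Hotelling-type T² statistic that assumes uncorrelated components (which holds for principal component scores), and sets the limit as the average of the (1 − α) percentiles of bootstrap resamples of the T² values, one common bootstrap recipe for T² charts. The names `hotelling_t2`, `bootstrap_limit` and `flag_outliers`, and the defaults α = 0.01 and 1,000 resamples, are hypothetical choices. The iterative procedure described above would wrap `flag_outliers`: drop one flagged case, re-estimate the model and the limit, and repeat.

```python
import random
import statistics


def hotelling_t2(scores):
    """T^2 per row of component scores.

    Assumes the components are uncorrelated (true for principal component
    scores), so T^2 = sum_k (s_k - mean_k)^2 / var_k.
    """
    cols = list(zip(*scores))
    means = [statistics.fmean(c) for c in cols]
    variances = [statistics.variance(c) for c in cols]
    return [
        sum((s - m) ** 2 / v for s, m, v in zip(row, means, variances))
        for row in scores
    ]


def percentile(values, q):
    """Linear-interpolation percentile of `values`, with q in [0, 1]."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def bootstrap_limit(t2, alpha=0.01, n_boot=1000, seed=0):
    """Average of the (1 - alpha) percentiles of n_boot resamples of T^2.

    No normality assumption: the limit comes from the empirical T^2 values.
    """
    rng = random.Random(seed)
    n = len(t2)
    limits = [percentile(rng.choices(t2, k=n), 1 - alpha) for _ in range(n_boot)]
    return statistics.fmean(limits)


def flag_outliers(scores, alpha=0.01, n_boot=1000, seed=0):
    """Return (indices of rows whose T^2 exceeds the bootstrap limit, limit)."""
    t2 = hotelling_t2(scores)
    limit = bootstrap_limit(t2, alpha, n_boot, seed)
    return [i for i, v in enumerate(t2) if v > limit], limit
```

Because every bootstrap percentile is at most the largest T² in the sample, a limit estimated this way never lies above that sample's own maximum; an iterative drop-and-refit loop built on it therefore needs a carefully chosen stopping rule, and this sketch does not claim to reproduce the one the authors use.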
Our study provides a new approach to monitoring a mixture of numerical and categorical quality indicators simultaneously without sacrificing precision, and yields a set of in-control data that can serve as reference data for the real-time monitoring of similar data in the future.
Ms Zsofia S. Ignacz (research associate) - Presenting Author
Mr Clemens Lechner (GESIS Leibniz Institute for the Social Sciences)
Latent structure modelling enables researchers to capture social factors and social phenomena that are not directly measurable with survey items. While the most widespread type of latent structure modelling is factor analysis, it is often theoretically untenable to treat the latent construct of interest as continuous. Here, latent class analysis (LCA) offers useful solutions. Most studies applying LCA are restricted to single-country analyses and do not differentiate between groups within the sample. However, with the increasing number of cross-national surveys available, the application of multi-group LCA has become important. Unfortunately, unlike factor analysis, where there is a substantial body of literature on measurement invariance, papers employing multi-group LCA do not follow a clear and transparent strategy showing how they arrived at a given model solution and which benchmarks were employed. Our paper therefore aims to fill this gap by setting out a clear analytical strategy that researchers can easily follow. We propose that multi-group LCA model development needs to rely on a multi-faceted approach to arrive at an optimal multi-group LCA. We argue that it is essential for the strategy to incorporate both single-group LCAs (one per group) and multi-group LCAs. Additionally, theoretical considerations related to the substantive interpretation of latent classes are an important cornerstone of the analytical strategy.
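One benchmark commonly used when comparing multi-group latent class models is an information criterion such as BIC, computed across models with increasing cross-group equality constraints. The sketch below assumes three standard levels (heterogeneous; partially homogeneous, with item-response probabilities equal across groups but class sizes free; completely homogeneous) and categorical indicators. The function names are hypothetical, and the log-likelihoods must come from whatever LCA software fits the models. It covers only the parameter counting and BIC bookkeeping, not the per-group single LCAs or the substantive interpretation that the strategy above also requires.

```python
import math


def n_params(model, groups, classes, items, categories=2):
    """Free parameters of a multi-group LCA with `items` indicators of `categories` levels.

    heterogeneous: class sizes and item-response probabilities free in every group
    partial:       item-response probabilities equal across groups, class sizes free
    complete:      everything equal across groups (a pooled single-group LCA)
    """
    class_sizes = classes - 1                        # class proportions sum to 1
    responses = classes * items * (categories - 1)   # response probs per class and item
    if model == "heterogeneous":
        return groups * (class_sizes + responses)
    if model == "partial":
        return groups * class_sizes + responses
    if model == "complete":
        return class_sizes + responses
    raise ValueError(f"unknown model: {model}")


def bic(log_likelihood, n_parameters, n_obs):
    """Bayesian information criterion, -2 LL + p ln(N); lower is better."""
    return -2 * log_likelihood + n_parameters * math.log(n_obs)


def compare(log_likelihoods, groups, classes, items, n_obs, categories=2):
    """Return ({model: BIC}, name of the model with the lowest BIC).

    `log_likelihoods` maps model names ("heterogeneous", "partial",
    "complete") to log-likelihoods obtained from external fitting software.
    """
    scores = {
        m: bic(ll, n_params(m, groups, classes, items, categories), n_obs)
        for m, ll in log_likelihoods.items()
    }
    return scores, min(scores, key=scores.get)
```

A lower BIC favours a model, but under the multi-faceted strategy argued for above it is one benchmark among several rather than the sole criterion.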
The application of multi-group latent class analysis in this presentation stems from substantive research on religious orientation. In our primer on multi-group LCA, we aim to identify different types of religious orientation and hence to distinguish religious believers, non-believers, and atheists. The analysis draws on a unique dataset from 12 countries surveyed in the Bertelsmann Religion Monitor 2012 (N = 13,037). We rely on a complex multi-dimensional item selection to identify the different religious types by applying latent class analysis across countries.
Miss Irina Zangieva (National Research University Higher School of Economics)
Miss Anna Suleymanova (National Research University Higher School of Economics) - Presenting Author
Despite the long and widespread application of factor analysis and the substantial number of factor extraction methods, it is common practice in the social sciences to use “factor analysis” and “principal components analysis” interchangeably. Most of the evaluated studies did not even specify which factor extraction method was used, or relied solely on PCA, with or without a stated rationale.
The research problem of the study was therefore the lack of structured and comprehensive guidance in published research on how to select a factor extraction method. The aim was to develop a theoretically and empirically supported algorithm for selecting an adequate factor extraction method depending on a combination of aspects such as sample size, number of indicators per factor, size and range of communalities, presence of model error, and the distribution of indicators. Seven factor extraction methods were studied: principal component analysis (PCA), unweighted and generalized least squares (ULS and GLS), maximum likelihood (ML), principal axis analysis (PAX), alpha factor analysis (AF), and image factoring (IF).
The theoretically substantiated algorithm was tested in a Monte Carlo simulation experiment and refined into four main recommendations: (1) if model error is suspected, use PAX or AF; (2) if the sample is large enough and the indicators are normally distributed, or, conversely, if the sample is not large enough and the distribution of indicators departs from normality, use ML or GLS; (3) if the sample is large enough but the indicators are not normally distributed, or if the indicators are normally distributed but the sample is not large enough and communalities are below 0.6, use ML exclusively; (4) if the indicators are normally distributed and communalities exceed 0.6 but the sample is not large enough, use GLS.
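The four recommendations can be read as a small decision rule. The sketch below encodes them as stated in the abstract; the function name and boolean inputs are hypothetical, checking model error first and treating communalities of exactly 0.6 as falling under the below-0.6 branch are our assumptions, and what counts as a “large enough” sample is left to the caller because the abstract does not define it.

```python
def recommend_extraction(model_error, large_sample, normal, communalities_above_06):
    """Suggest factor extraction method(s) following recommendations (1) to (4).

    All inputs are booleans judged by the analyst. Returns a tuple of the
    method abbreviations used in the text (PAX, AF, ML, GLS).
    """
    if model_error:                       # (1) model error suspected
        return ("PAX", "AF")
    if large_sample == normal:            # (2) large and normal, or small and non-normal
        return ("ML", "GLS")
    if large_sample and not normal:       # (3) first case: large but non-normal
        return ("ML",)
    # Remaining case: normal indicators, sample not large enough.
    if communalities_above_06:            # (4) communalities above 0.6
        return ("GLS",)
    return ("ML",)                        # (3) second case; exactly 0.6 lands here (assumption)
```

Every combination of the three distribution, sample-size and communality inputs falls into exactly one branch, which is a quick check that the four recommendations, read this way, do not leave gaps.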