Methodological advances in Latent Class Models for Surveys

Convenor: Dr Daniel Oberski (Tilburg University)
Coordinator 1: Dr Milos Kankaras (Tilburg University)
Latent class modeling (LCM) is a very general technique that encompasses many different statistical models as special cases; the common thread is a latent variable model in which the latent variable is treated as discrete. Examples of special cases include latent structure analysis, latent Markov models, mixture models, model-based clustering, diagnostic test evaluation, nonparametric IRT, and latent class factor models.
When applied to surveys, latent class models may be useful as a way of relaxing the sometimes stringent assumptions in survey error analysis. For example, the assumptions of the linear factor model used to estimate scale reliability, or of monotone systematic errors in cross-country invariance models, may be relaxed.
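For reference, the basic latent class model underlying these special cases writes the joint distribution of J categorical indicators $y_1, \dots, y_J$ as a finite mixture over $C$ classes (a standard formulation, stated here only to fix notation):

$$P(y_1, \dots, y_J) \;=\; \sum_{c=1}^{C} \pi_c \prod_{j=1}^{J} P(y_j \mid X = c),$$

where $\pi_c$ are the class proportions and the indicators are assumed to be locally independent given the discrete latent class variable $X$.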
At the same time, survey errors pose complications in latent class models that may be simplified or may not arise at all in other types of models. For example, the classical result that measurement error in the dependent variable does not affect regression coefficients does not hold for discrete latent variables.
LCM therefore offers both opportunities and challenges for survey researchers, and this session invites presentations on the following topics:
- Latent class modeling for the estimation or evaluation of survey errors;
- The effect of survey errors on latent class modeling;
- Methods of coping with survey error in latent class models.
Survey errors could include, but are not limited to: measurement error, nonresponse error, sampling error, and cross-country or group comparability.
We would be particularly interested in methodological innovations on these topics, and in substantive applications demonstrating such innovations.
Latent class models can be used to study attrition patterns in panel surveys. The advantage of using a latent class model over other models is that fewer assumptions are made about the pattern of attrition in the survey. People in a panel survey may drop out and never return (monotone attrition), but any other (non-monotone) attrition pattern is possible as well. Using a latent class model, respondents can be grouped into homogeneous classes that each follow a different attrition pattern.
The latent class variable can subsequently be used in several ways. For example, differences on substantive variables can be investigated by using the latent class variable to predict one or more dependent variables. Another possibility of interest to survey methodologists is the study of measurement errors within each attrition class. In this way, it is possible to estimate measurement errors for every class of attriters and, ultimately, to study whether there is any relation between attrition error and measurement error. For example, is it true that respondents who drop out quickly report with more measurement error than respondents who participate in every wave of a panel survey?
This presentation outlines different statistical methods to investigate attrition patterns, determine the number of latent classes, develop latent class indicators for attrition, and relate these to models that estimate measurement errors in panel surveys.
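As an illustration of the first step (grouping respondents by attrition pattern), a latent class model for binary wave-participation indicators can be fitted with a short EM routine. The sketch below is a minimal, generic implementation rather than the models used in the presentation; in practice the number of classes would be chosen by comparing information criteria such as BIC across fits.

```python
import numpy as np

def lca_em(R, n_classes, n_iter=200, seed=0):
    """EM for a latent class model on binary wave-participation indicators.
    R: (N, T) array with R[i, t] = 1 if respondent i took part in wave t."""
    rng = np.random.default_rng(seed)
    N, T = R.shape
    pi = np.full(n_classes, 1.0 / n_classes)           # class proportions
    rho = rng.uniform(0.2, 0.8, size=(n_classes, T))   # P(participate in wave t | class)

    for _ in range(n_iter):
        # E-step: posterior class membership given each observed attrition pattern
        log_p = (np.log(pi)[None, :]
                 + R @ np.log(rho).T + (1 - R) @ np.log(1 - rho).T)
        log_p -= log_p.max(axis=1, keepdims=True)
        post = np.exp(log_p)
        post /= post.sum(axis=1, keepdims=True)

        # M-step: update class proportions and wave-specific participation profiles
        pi = post.mean(axis=0)
        rho = (post.T @ R) / post.sum(axis=0)[:, None]
        rho = rho.clip(1e-6, 1 - 1e-6)

    return pi, rho, post
```

The posterior matrix returned by this sketch is what would subsequently be used to relate attrition classes to substantive variables or to measurement error models.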
This paper discusses the analysis of multiple Likert items in surveys, where individuals are required to respond to a number of items measured on the same underlying scale.
The relative importance of items is often more informative than the absolute rating, as there is evidence that different parts of society or different countries interpret Likert scales in different ways (Heine et al., 2002), leading to survey error in group comparability.
We consider an approach to such data which can be used when the interest is in the changing relative importance of the items in a complex covariate model. The method makes fewer assumptions about the distribution of the responses than the more usual approaches such as comparisons of means, MANOVA or ordinal data methods. The method involves modelling the multiple Likert items through a set of generated paired comparisons. The model can be formulated as a Poisson log-linear model, providing standard likelihood-based inference for model selection. The effect of covariates on the relative ordering of items can therefore be assessed and estimated straightforwardly (Dittrich et al., 2007). The model is also suitable for ranked items.
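As a rough illustration of the paired-comparison idea (not the pattern model of Dittrich et al. itself, which also accommodates ties, rankings and covariate effects), the sketch below generates paired comparisons from the Likert responses and estimates a Bradley-Terry type worth parameter for each item through a Poisson log-linear fit; the helper name relative_item_worths and the use of statsmodels are illustrative assumptions.

```python
from itertools import combinations

import numpy as np
import pandas as pd
import statsmodels.api as sm

def relative_item_worths(likert):
    """likert: (N, J) array of Likert ratings on a common scale.
    Derives paired comparisons and fits a Bradley-Terry type Poisson
    log-linear model; returns worths relative to the last item."""
    N, J = likert.shape
    rows = []
    for j, k in combinations(range(J), 2):
        wins_j = np.sum(likert[:, j] > likert[:, k])
        wins_k = np.sum(likert[:, k] > likert[:, j])   # ties are simply dropped here
        rows.append({"pair": f"{j}-{k}", "winner": j, "j": j, "k": k, "count": wins_j})
        rows.append({"pair": f"{j}-{k}", "winner": k, "j": j, "k": k, "count": wins_k})
    df = pd.DataFrame(rows)

    # design: one nuisance dummy per pair plus a +1/-1 column per item worth
    X_pair = pd.get_dummies(df["pair"], dtype=float)
    X_item = np.zeros((len(df), J))
    for r, row in df.iterrows():
        loser = row["k"] if row["winner"] == row["j"] else row["j"]
        X_item[r, row["winner"]] = 1.0
        X_item[r, loser] = -1.0
    X_item = X_item[:, :-1]                 # fix the last item's worth to 0 for identification
    X = np.column_stack([X_pair.values, X_item])

    fit = sm.GLM(df["count"].values, X, family=sm.families.Poisson()).fit()
    return fit.params[-(J - 1):]            # item worths relative to the reference item
```

Covariate effects on the relative ordering would enter such a design as interactions between covariate dummies and the item worth columns.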
A recent development has been the incorporation of latent classes into the relative Likert model described above. We describe the advantages of our approach compared with the more traditional latent class analysis of absolute scores.
The method will be illustrated on a collection of Likert items taken from the British Household Panel Survey.
HIV prevalence in China is currently less than one percent, but due to the large population this translates into a large number of people. The number of people living with HIV is growing and moving beyond high-risk groups to the general population. Ensuring adequate knowledge is important for successful HIV prevention, as adequate knowledge is an important component of a risk reduction behaviour framework. HIV knowledge cannot be observed or measured directly, but it can be measured through its observed components. There are two main approaches to measuring HIV knowledge in the literature: a simple score approach and a latent class variable approach. This paper aims to study the evolution of HIV knowledge in China using both approaches. It first compares the levels of HIV knowledge in China over time to study improvements in the levels of knowledge among different groups of women, and then it compares the levels of knowledge in China with the levels of knowledge in India, Kenya, Malawi and Ukraine in order to place China in context with other countries in the world. The following data sources are used for the analysis: the China National Population and Reproductive Health Survey 1997, the China National Family Planning and Reproductive Health Survey 2001, the UNFPA Reproductive Health and Family Planning Survey 2005, India DHS 2006, Kenya DHS 2003, Malawi DHS 2004 and Ukraine DHS 2007. This paper compares the two methodological approaches to measuring HIV knowledge and discusses the unique insights into the topic that each approach provides.
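To make the contrast between the two measurement approaches concrete, the sketch below (with purely synthetic stand-in data, not the survey data listed above) computes the simple additive score; under the latent class approach, respondents would instead be assigned to knowledge classes on the basis of posterior membership probabilities from a fitted latent class model.

```python
import numpy as np

rng = np.random.default_rng(1)
# synthetic stand-in for 0/1 answers of 1,000 women to 8 HIV knowledge items
K = rng.integers(0, 2, size=(1000, 8))

# Simple score approach: one additive knowledge score per respondent
simple_score = K.sum(axis=1)

# Latent class approach: fit a latent class model to K (e.g. with the EM
# sketch shown after the attrition abstract) and classify each respondent
# into the class with the highest posterior membership probability.
```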
IRT models allow individual responses to be aggregated into competence scores. To enhance the estimation of individual ability parameters with latent relationships between competence scores and individual characteristics, background variables capturing these characteristics are explicitly incorporated into the corresponding IRT models. The resulting individual ability estimates are often provided as plausible values, which summarize the estimated individual ability parameters in a form that is easily accessible for secondary analysis. Despite tremendous efforts in fieldwork, missing values can occur in the background variables that enhance, for instance, the plausible values with possible latent dependencies. Adequate estimation routines are therefore needed to reflect the uncertainty stemming from missing values in the background variables within the estimation of plausible values. To achieve this, we propose an estimation strategy based on Markov Chain Monte Carlo (MCMC) techniques that addresses missing values in background variables and the estimation of plausible values simultaneously. The uncertainty stemming from the missing values is incorporated into the parameter estimation using the device of data augmentation. Our approach is not restricted to parametric assumptions concerning the distribution of the missing values. Instead, the distribution of missing values is established on the basis of nonparametric sequential regression trees (CART), which results in a hybrid MCMC sampling scheme. In several simulation setups allowing for control of the mechanism causing missing values, we evaluate the validity of our approach with respect to statistical accuracy. The results show the importance of addressing missing values in background variables when considering latent relationships. An empirical application using the NEPS starting cohort of fifth graders reveals typical associations between competence measures and background variables, as suggested by theory.
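A minimal sketch of such a hybrid sampler is given below. It assumes a Rasch measurement model with known item difficulties and continuous background variables, uses scikit-learn's DecisionTreeRegressor as a stand-in for the sequential CART imputation step, and simply keeps the last few ability draws as plausible values; the function name, priors and tuning constants are illustrative and are not the authors' NEPS implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

def mcmc_plausible_values(Y, X, b, n_iter=500, n_pv=5, prop_sd=0.5):
    """Hybrid Gibbs/Metropolis sketch (Rasch measurement model assumed).
    Y: (N, J) 0/1 item responses; b: (J,) known item difficulties;
    X: (N, K) continuous background variables with np.nan for missing entries."""
    N, J = Y.shape
    theta = rng.normal(size=N)                                # current ability draws
    X_imp = np.where(np.isnan(X), np.nanmean(X, axis=0), X)   # crude starting imputation

    def loglik(t):
        # Rasch log-likelihood of the observed responses at abilities t
        p = 1.0 / (1.0 + np.exp(-(t[:, None] - b[None, :])))
        p = np.clip(p, 1e-12, 1 - 1e-12)
        return (Y * np.log(p) + (1 - Y) * np.log(1 - p)).sum(axis=1)

    pvs = []
    for it in range(n_iter):
        # 1. Latent regression: prior mean of theta given the background variables
        Z0 = np.c_[np.ones(N), X_imp]
        beta, *_ = np.linalg.lstsq(Z0, theta, rcond=None)
        mu = Z0 @ beta

        # 2. Metropolis step for each ability parameter
        prop = theta + rng.normal(scale=prop_sd, size=N)
        log_acc = (loglik(prop) - loglik(theta)
                   - 0.5 * (prop - mu) ** 2 + 0.5 * (theta - mu) ** 2)
        theta = np.where(np.log(rng.uniform(size=N)) < log_acc, prop, theta)

        # 3. CART-based data augmentation: for each variable with missing entries,
        #    grow a tree on theta and the other background variables and draw
        #    replacements from the observed donor values in the matching leaf
        for k in range(X.shape[1]):
            miss = np.isnan(X[:, k])
            if not miss.any():
                continue
            Z = np.c_[theta, np.delete(X_imp, k, axis=1)]
            tree = DecisionTreeRegressor(min_samples_leaf=20).fit(Z[~miss], X[~miss, k])
            leaves = tree.apply(Z)
            for i in np.where(miss)[0]:
                donors = X[(leaves == leaves[i]) & ~miss, k]
                X_imp[i, k] = rng.choice(donors)

        if it >= n_iter - n_pv:   # keep the last few draws as plausible values
            pvs.append(theta.copy())    # (in practice, well-separated draws would be used)
    return np.array(pvs), X_imp
```

The key point reflected in the loop is that the imputation of missing background values and the drawing of abilities condition on each other, so the uncertainty from the missing values propagates into the plausible values.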