Tuesday 16th July
Wednesday 17th July
Thursday 18th July
Friday 19th July
Effect of nonresponse on results of statistical models 2
Convenor: Professor Christof Wolf (GESIS)
Coordinator 1: Professor Dominique Joye (University Lausanne/FORS)
Research on nonresponse has made important progress in recent years by examining many aspects of the survey process linked to nonresponse and ways of estimating potential bias in selected indicators. However, research on the effects of nonresponse has mostly focused on possible bias in point estimates and how to correct for it. Very little research has addressed the consequences nonresponse can have for estimates of covariance structures and multivariate models, which are much more common in substantive social science research. The aim of this session is to explore how nonresponse can affect estimates of covariances and effect sizes, and ways to counteract these effects. All papers (theoretical contributions, empirical analyses, results from simulations or experimental studies) are welcome.
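The concern about multivariate estimates can be illustrated with a small simulation. The sketch below is not taken from any of the session papers; the sample size, the true slope of 0.5 and the nonresponse rule (units with a high value of the outcome never respond) are all assumptions chosen purely for illustration.

```python
import random


def ols_slope(xs, ys):
    """Slope of the least-squares regression of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def simulate(n=20000, true_slope=0.5, cutoff=0.5, seed=1):
    """Return (slope in the gross sample, slope among respondents)."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [true_slope * x + rng.gauss(0.0, 1.0) for x in xs]
    # Hypothetical nonresponse mechanism: units with y at or above
    # the cutoff never respond, so response depends on the outcome.
    respondents = [(x, y) for x, y in zip(xs, ys) if y < cutoff]
    rx = [x for x, _ in respondents]
    ry = [y for _, y in respondents]
    return ols_slope(xs, ys), ols_slope(rx, ry)


slope_full, slope_resp = simulate()
print(f"gross sample slope: {slope_full:.3f}")
print(f"respondent slope:   {slope_resp:.3f}")
```

Under these assumptions the slope estimated from respondents should be clearly smaller than the slope in the gross sample, which shows that selection on the outcome can distort a relationship and not only a mean.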
Item nonresponse occurs frequently in social surveys. Typically, multivariate data are completed with plausible values using joint modeling imputation, with the aim of obtaining correct statistical inference for the analyses commonly carried out in social science research. One way to compensate for the missing data is to use imputation methods that rely on the assumption of jointly normally distributed data. However, poverty indicators and income variables rarely follow a normal distribution. We therefore study how skewed data can be handled by applying different transformations before imputation.
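Why skewness matters for normal-model imputation can be seen in a toy example. The sketch below is not the transformation studied in this paper: it is univariate, it uses a plain log transformation, and the log-normal "income" variable and the 25% missingness rate are assumptions made for illustration only.

```python
import math
import random
import statistics

rng = random.Random(7)

# Hypothetical skewed "income" variable: log-normal, 2000 units.
income = [math.exp(rng.gauss(10.0, 1.0)) for _ in range(2000)]
# Hypothetical item nonresponse: about 25% missing completely at random.
observed = [v for v in income if rng.random() > 0.25]
n_missing = len(income) - len(observed)


def impute_normal(values, n_draws, rng):
    """Draw imputations from a normal distribution fitted to the values."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    return [rng.gauss(mu, sd) for _ in range(n_draws)]


# (a) Normal model fitted on the raw scale.
raw_imputed = impute_normal(observed, n_missing, rng)
# (b) Normal model fitted on the log scale, then back-transformed.
log_observed = [math.log(v) for v in observed]
log_imputed = [math.exp(z) for z in impute_normal(log_observed, n_missing, rng)]

negatives_raw = sum(v < 0 for v in raw_imputed)
negatives_log = sum(v < 0 for v in log_imputed)
print("negative imputations, raw scale:", negatives_raw)
print("negative imputations, log scale:", negatives_log)
```

On the raw scale, the fitted normal distribution puts noticeable mass below zero, so some imputed incomes come out negative. On the log scale this cannot happen, because the back-transformed values are always positive.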
The transformation we present requires completely observed data. However, missing data are the origin of the problem, and transforming only the available data can introduce bias into subsequent analyses. It is therefore necessary to impute the data before transformation, which creates a circular dependency: the imputation needs the transformation, and the transformation needs the imputed data. This mutual dependence suggests an iterative approach that improves both the imputation and the transformation until an appropriate termination criterion is met.
This paper presents an extension of the Expectation-Maximization transformation algorithm of Huergo (2010). In a comparative evaluation, we analyze the properties of the procedure using estimates common in the social sciences, such as regression coefficients and totals. In contrast to standard transformations, the algorithm exploits the multivariate structure of a data set, so that interactions and relations between variables are preserved and the results are plausible and proper.
The 2011 Swiss Election Study was the first to use the new harmonized register of individuals. Compared to the telephone register, this frame offers better coverage and basic socio-demographic information on sample members, which can be used to study nonresponse in more detail. In telephone surveys, undercoverage is an issue in addition to nonresponse, because not all individuals can be matched to a telephone number.
We estimate predicted probabilities from multivariate logistic models that use variables from the sampling frame, fitted on different samples defined by matching effort and response: starting with the "true" values (gross sample), we estimate coefficients under different coverage scenarios and finally from respondents only. We find that the selected frame variables differ in their sensitivity to coverage and nonresponse. For instance, while telephone numbers are easier to match for both married and older people, married people are more likely to respond and older people less likely. The results show that nonresponse bias goes in the same direction as coverage bias when modelling being married, whereas the picture is mixed when modelling being elderly.
In a second step, we study how different telephone number matching efforts affect the relationships between respondents' substantive variables, such as voter turnout or political interest. Given the known biases in these variables (higher values due to selection effects), we analyze which relationships can be improved by expanding the sample to include additionally matched sample members.
Raking weighting adjustment with auxiliary variables is currently used to reduce unit nonresponse bias. However, raking has some difficulties: the algorithm may not converge if some of the cell counts are zero (Lohr, 2010; Rao & Wu, 2009). Hidiroglou and Patak (2006) also indicated that a sufficient number of observations (at least half) must be available for raking weight adjustment at the stratum level. This study proposes using learning vector quantization (LVQ) networks to correct the sampling weights when one of the cell counts is zero. A numerical simulation study examines the accuracy of confirmatory factor analysis (CFA) parameter estimates under LVQ and raking weighting adjustment. In study 1, all cell counts are positive; in study 2, one of the cell counts is zero. Further experimental factors, including the proportion of missing data, sample size, and heterogeneity of groups, are varied to examine the performance of both methods. Results of study 1 showed that LVQ and raking weighting adjustment both yield high accuracy. Results of study 2 showed that, when one of the cell counts is zero, the CFA parameter estimates under LVQ are considerably more accurate and stable than those under raking. In that situation, LVQ weighting adjustment can reduce nonresponse bias in CFA parameter estimates, whereas raking weighting adjustment tends to produce substantial bias.
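The convergence problem with empty cells can be reproduced with a minimal raking routine (iterative proportional fitting) on a 2x2 table. This is only a sketch of the raking side, not of the LVQ method proposed here; the function name `rake`, the cell counts and the target margins are all made up for illustration.

```python
def rake(table, row_targets, col_targets, max_iter=1000, tol=1e-8):
    """Iterative proportional fitting of a 2-D table to target margins.

    Returns (fitted_table, converged).
    """
    fitted = [row[:] for row in table]
    n_rows, n_cols = len(fitted), len(fitted[0])
    for _ in range(max_iter):
        # Scale each row to its target total.
        for i in range(n_rows):
            total = sum(fitted[i])
            if total > 0:
                factor = row_targets[i] / total
                fitted[i] = [v * factor for v in fitted[i]]
        # Scale each column to its target total.
        for j in range(n_cols):
            total = sum(fitted[i][j] for i in range(n_rows))
            if total > 0:
                factor = col_targets[j] / total
                for i in range(n_rows):
                    fitted[i][j] *= factor
        # Columns now match their targets; check whether rows do too.
        row_error = max(
            abs(sum(fitted[i]) - row_targets[i]) for i in range(n_rows)
        )
        if row_error < tol:
            return fitted, True
    return fitted, False


# Hypothetical population margins (both sum to 100).
row_targets = [60.0, 40.0]
col_targets = [50.0, 50.0]

# Case 1: all cells positive. Case 2: the top-left cell is empty.
_, ok_positive = rake([[10.0, 20.0], [30.0, 40.0]], row_targets, col_targets)
_, ok_zero = rake([[0.0, 20.0], [30.0, 50.0]], row_targets, col_targets)
print("all cells positive ->", ok_positive)
print("one empty cell     ->", ok_zero)
```

With these made-up numbers the first call should converge and the second should not. The empty cell forces the whole first row (target 60) into the second column, whose target is only 50, so no set of weights can satisfy both margins. An empty cell does not always cause this; it does so whenever it makes the target margins unattainable.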
The sociological explanation for the rise in nonresponse rates attributes it mainly to changes in people's cultural attitudes towards surveys. Such changes differ between social groups, so nonresponse distorts the social structure of the sample. These distortions are usually estimated by comparing the sample with the target population on a small set of indicators. This method therefore cannot guarantee that the sample is free of error on latent indicators of the social structure, which can have major consequences for dependent variables.
In our paper, we use a simplified version of the structural equation approach to estimate the impact of nonresponse on sample representativeness. For example, for a study of shadow wages based on a representative population survey, we set up a theoretical equation with the following variables: six independent variables, one dependent variable (which has an analogue in published official statistics) and one latent variable, "the volume of shadow wages in the region". Four of the independent variables are answers to survey questions, two of which are sensitive questions with high item nonresponse rates. The dependent variable is computed by multiplying the matrices of two-dimensional distributions and then subtracting the values of two independent variables. We argue that the deviation of the calculated value of the dependent variable from the analogous value in official statistics provides an external validity estimate of the representativeness of the distributions found in the survey.