Estimation and Imputation Under Informative Sampling and Nonresponse 1

Convenor: Professor Danny Pfeffermann (University of Southampton)
Survey data are frequently used for analytic inference on statistical models, which are assumed to hold for the population from which the sample is taken. Familiar examples include the analysis of labour market dynamics from labour force surveys, comparisons of pupils' achievements from educational surveys and the search for causal relationships between risk factors and disease prevalence from health surveys. The sample selection probabilities in at least some stages of the sample selection are often unequal; when these probabilities are related to the model outcome variable, the sampling process becomes informative and the model holding for the sample is then different from the target population model. A related problem is unit nonresponse, which may likewise distort the population model if the response propensity is associated with the outcome of interest, a situation known as not missing at random (NMAR) nonresponse.
Accounting for informative sampling is relatively simple, because the sample selection probabilities are usually known, and several approaches have been proposed in the literature to deal with this problem. On the other hand, accounting for NMAR nonresponse is much harder because the response probabilities are unknown, which requires assuming some structure for the response mechanism or, in the case of longitudinal surveys with attrition, modelling the response probabilities as functions of previously measured values.
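The bias mechanism described above can be illustrated with a small synthetic simulation: when inclusion probabilities increase with the outcome, the unweighted sample mean overshoots the population mean, while an inverse-probability (Hajek-type) weighted estimator recovers it. All numbers below are illustrative only; this is a minimal sketch, not any of the methods presented in the session.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population; the outcome distribution and probability range
# are arbitrary choices for illustration.
N = 100_000
y = rng.normal(50.0, 10.0, size=N)

# Inclusion probabilities that increase with y: the sampling is informative.
z = (y - y.min()) / (y.max() - y.min())
p = 0.01 + 0.04 * z

# Poisson sampling: each unit is included independently with probability p.
in_sample = rng.random(N) < p
y_s, p_s = y[in_sample], p[in_sample]

naive = y_s.mean()                              # ignores the design: biased upward
hajek = np.sum(y_s / p_s) / np.sum(1.0 / p_s)   # inverse-probability weighted

print(f"population {y.mean():.2f}  naive {naive:.2f}  weighted {hajek:.2f}")
```

Because high-y units are over-represented in the sample, the naive mean exceeds the population mean, whereas weighting each unit by the inverse of its known inclusion probability removes the selection effect.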
The aim of this session is to present different practical scenarios giving rise to informative sampling and/or NMAR nonresponse and to discuss alternative approaches for dealing with these problems. The focus of the presentations will be on imputation of missing sample data under NMAR nonresponse and on estimation of model parameters and/or finite population means under both informative sampling and NMAR nonresponse.
The German job vacancy survey is a quarterly business survey, stratified by sector and size class. Calibration is used to deal with high nonresponse rates; the most important calibration variable is the number of registered job vacancies (business units may inform the Federal Employment Agency of their vacancies, and these are called registered vacancies). It turns out that design weighting (ignoring nonresponse) yields biased estimates of the total number of registered vacancies. Nonresponse is therefore correlated with the number of registered vacancies, so calibration seems a promising weighting strategy. However, there is reason to believe that nonresponse is in fact correlated with the number of vacancies of any kind (registered or not), which means that nonresponse in our survey is non-ignorable.
In this paper, we compare, by means of a simulation study, two estimation strategies for dealing with non-ignorable nonresponse in our setting in terms of bias and variance: the generalized calibration estimator proposed by Deville (2002) and Chang and Kott (2008), and a two-stage GREG estimator.
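For readers unfamiliar with the GREG (generalized regression) estimator that the two-stage procedure builds on, the basic single-stage idea can be sketched as follows: the Horvitz-Thompson estimate of a total is adjusted by a regression term that pulls it toward a known auxiliary total. The population, sample design and single ratio-type covariate below are illustrative assumptions, not the paper's actual setting.

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative population with one auxiliary variable x whose total is known.
N = 50_000
x = rng.gamma(2.0, 2.0, size=N)
y = 3.0 * x + rng.normal(0.0, 2.0, size=N)
X_total = x.sum()

# Simple random sample without replacement; constant design weight N/n.
n = 1_000
idx = rng.choice(N, size=n, replace=False)
xs, ys = x[idx], y[idx]
d = N / n

# GREG estimator of the total of y: the Horvitz-Thompson estimate plus a
# regression adjustment toward the known auxiliary total (slope through the
# origin, i.e. a ratio-type working model).
B = np.sum(d * xs * ys) / np.sum(d * xs * xs)
t_ht = d * ys.sum()
t_greg = t_ht + B * (X_total - d * xs.sum())

print(f"true total {y.sum():.0f}  HT {t_ht:.0f}  GREG {t_greg:.0f}")
```

Because y is strongly related to x, the regression adjustment absorbs most of the sampling variability; the same calibration logic underlies the nonresponse-adjusted estimators compared in the paper.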
Recently, various indicators have been proposed as indirect measures of nonresponse error in surveys. The indicators employ available auxiliary variables in order to detect nonrepresentative or unbalanced response, and they may be used as quality objective functions in responsive and adaptive survey designs. In such designs, different population subgroups receive different treatments; the subgroups are formed using frame/registry data and paradata observations, and the designs optimize the allocation of resources under constraints on the precision of key statistics. The rationale behind the use of the indicators as quality objective functions is the conjecture that a stronger deviation from MCAR(X) implies a stronger deviation from MAR(Y|X). Hence, the indicators are viewed as process quality indicators.
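One widely used indicator of this kind is the R-indicator of Schouten, Cobben and Bethlehem, computed from the spread of estimated response propensities: R = 1 - 2 * sd(propensities), with R = 1 indicating response that is fully representative with respect to the auxiliary variables. The sketch below uses a single binary auxiliary variable and made-up propensities purely for illustration; in practice the propensities come from a model on several frame and paradata variables.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative data: a binary auxiliary variable x from the frame and a
# response indicator r whose true propensity depends on x.
n = 5_000
x = rng.random(n) < 0.4
rho_true = np.where(x, 0.8, 0.5)
r = rng.random(n) < rho_true

# Estimate response propensities within the subgroups defined by x.
rho_hat = np.where(x, r[x].mean(), r[~x].mean())

# R-indicator: 1 minus twice the standard deviation of the propensities.
R = 1.0 - 2.0 * rho_hat.std()
print(f"R-indicator: {R:.3f}")
```

The larger the gap in response propensities between subgroups, the larger the standard deviation and the lower R, which is why R can serve as an objective function to be kept high during adaptive data collection.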
The natural question is whether the decrease in nonresponse bias achieved by adaptive and responsive designs could also be obtained by nonresponse adjustment methods that employ the same auxiliary variables. In other words, does balancing survey response during data collection lead to smaller nonresponse bias, even after nonresponse adjustment?
In this paper, we discuss this important question. We provide theoretical and empirical considerations on the role of both the survey design and nonresponse adjustment methods in making response representative or balanced. The empirical considerations are supported by a wide range of household and business surveys from the Netherlands, Sweden and the USA.
Survey data are often collected through complex surveys with stratification, clustering, and unequal selection probabilities. Weighting is used to account for these design complexities, and it also incorporates additional adjustments for nonresponse and for calibration to known population information. Researchers working with complex survey data may be tempted to ignore these sampling and weighting aspects when conducting statistical analyses and fitting models, but traditional methods are often not appropriate unless these aspects are accounted for.
We describe a likelihood-based testing approach to detect informativeness that is simple to implement and has good statistical properties. We obtain the asymptotic distribution of the test, and use both the asymptotic distribution and a more convenient bootstrap distribution in implementing the procedure. Because the approach relies on a comparison of the weighted and unweighted estimators, it is applicable for secondary analysts who have access to previously created weights. Following testing, we discuss several possible approaches for incorporating the informativeness in the analytic inference.
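The abstract's core device, comparing weighted and unweighted estimators and referring their difference to a bootstrap distribution, can be sketched in a simplified form. The sketch below is a generic illustration of that comparison for a regression slope, not the authors' likelihood-based test; the data, weights and sample sizes are all invented, and the weights here are deliberately unrelated to the outcome, so no informativeness should be detected.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative sample with survey weights w (all values are placeholders);
# the weights are independent of y, i.e. the design is non-informative here.
n = 2_000
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
w = rng.uniform(1.0, 5.0, size=n)

def slope(x, y, w=None):
    """(Weighted) least-squares slope of y on x."""
    if w is None:
        w = np.ones_like(x)
    xm = np.average(x, weights=w)
    ym = np.average(y, weights=w)
    return np.sum(w * (x - xm) * (y - ym)) / np.sum(w * (x - xm) ** 2)

# Observed difference between the weighted and unweighted estimators.
d_obs = slope(x, y, w) - slope(x, y)

# Bootstrap the sampling distribution of that difference.
B = 500
d_boot = np.empty(B)
for b in range(B):
    i = rng.integers(0, n, size=n)
    d_boot[b] = slope(x[i], y[i], w[i]) - slope(x[i], y[i])

# Two-sided bootstrap p-value: an observed difference that is large relative
# to the centred bootstrap spread suggests informative weights.
p_value = np.mean(np.abs(d_boot - d_boot.mean()) >= np.abs(d_obs))
print(f"difference {d_obs:.4f}, bootstrap p-value {p_value:.3f}")
```

When the weights are informative, the weighted and unweighted slopes diverge systematically and the p-value becomes small, which is the signal the testing approach is designed to pick up.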
When sample data are the result of informative sampling or not missing at random (NMAR) nonresponse, the model holding for the observed data can be very different from the target model holding in the population from which the sample is taken, because of selection bias. Accounting for informative sampling is relatively simple, since the sample selection probabilities are usually known, and several approaches have been proposed in the literature to deal with this problem. Accounting for NMAR nonresponse is much more difficult since the response probabilities are generally unknown, which requires assuming some structure for the response mechanism.
In this presentation I shall discuss a new approach for modelling complex survey data, which accounts simultaneously for informative sampling and NMAR nonresponse. The approach uses the empirical likelihood, which is defined with respect to the distribution of the data observed for the responding units. Estimation of the unknown model parameters and response probabilities is enhanced by a set of calibration constraints that are related to the model outcome variable and the covariates included in the model for the response probabilities. Application of this approach allows estimating the population model parameters and imputing the missing data under the model holding for them, which is different from the population model. Simulation results illustrate good performance of the proposed approach in terms of parameter estimation and imputation.