Tuesday 16th July 2013, 11:00 - 12:30, Room: No. 18

Estimation and Imputation Under Informative Sampling and Nonresponse 1

Convenor Professor Danny Pfeffermann (University of Southampton)

Session Details

Survey data are frequently used for analytic inference on statistical models, which are assumed to hold for the population from which the sample is taken. Familiar examples include the analysis of labour market dynamics from labour force surveys, comparisons of pupils' achievements from educational surveys and the search for causal relationships between risk factors and disease prevalence from health surveys. The sample selection probabilities in at least some stages of the sample selection are often unequal; when these probabilities are related to the model outcome variable, the sampling process becomes informative and the model holding for the sample is then different from the target population model. A related problem is unit nonresponse, which again may distort the population model if the response propensity is associated with the outcome of interest, a situation known as not missing at random (NMAR) nonresponse.

Accounting for informative sampling is relatively simple, because the sample selection probabilities are usually known, and several approaches have been proposed in the literature to deal with this problem. On the other hand, accounting for NMAR nonresponse is much harder because the response probabilities are unknown, so that one must either assume some structure for the response mechanism or, in the case of longitudinal surveys with attrition, model the response probabilities as functions of previously measured values.
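To illustrate why known selection probabilities make informative sampling tractable, here is a minimal simulation sketch. It is not taken from any of the session papers: the population, the selection rule (probability proportional to the outcome) and the function name `simulate` are all assumptions chosen for illustration. It compares the unweighted sample mean with a Hájek-type mean weighted by the inverse selection probabilities.

```python
import random


def simulate(n_pop=20000, seed=1):
    """Compare naive and inverse-probability weighted means (illustrative only)."""
    rng = random.Random(seed)
    # Hypothetical population outcome with mean about 10.
    y = [rng.gauss(10.0, 2.0) for _ in range(n_pop)]
    # Assumed informative design: selection probability increases with y.
    pi = [max(0.001, min(1.0, 0.01 * v)) for v in y]
    # Poisson sampling: each unit is selected independently with its own pi.
    sample = [(v, p) for v, p in zip(y, pi) if rng.random() < p]

    pop_mean = sum(y) / n_pop
    naive = sum(v for v, _ in sample) / len(sample)
    # Hajek estimator: weight each sampled unit by 1 / pi.
    hajek = sum(v / p for v, p in sample) / sum(1.0 / p for _, p in sample)
    return pop_mean, naive, hajek


if __name__ == "__main__":
    pop_mean, naive, hajek = simulate()
    print(f"population mean {pop_mean:.3f}, naive {naive:.3f}, weighted {hajek:.3f}")
```

In this setup the unweighted mean overstates the population mean because large values of the outcome are over-represented, while the weighted mean does not. No comparably simple correction exists for NMAR nonresponse, since the response probabilities are not known.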

The aim of this session is to present different practical scenarios giving rise to informative sampling and/or NMAR nonresponse and to discuss alternative approaches to deal with these problems. The focus of the different presentations will be on imputations of missing sample data under NMAR nonresponse and estimation of model parameters and/or finite population means under both informative sampling and NMAR nonresponse.


Paper Details

1. Dealing with non-ignorable nonresponse in the German job vacancy survey

Dr Hans Kiesl (Regensburg University of Applied Sciences)

The German job vacancy survey is a quarterly business survey (stratified by sector and size class). Calibration is used to deal with high nonresponse rates; the most important calibration variable is the number of registered job vacancies (business units may inform the Federal Employment Agency of their vacancies; these are called registered vacancies). It turns out that design weighting (ignoring nonresponse) results in biased estimates of the total number of registered vacancies. Thus, nonresponse is correlated with the number of registered vacancies, so that calibration seems to be a promising weighting strategy. However, there is reason to believe that nonresponse is actually correlated with the total number of vacancies (registered or not), which means that nonresponse in our survey is non-ignorable.

In this paper, we compare two estimation strategies to deal with non-ignorable nonresponse in our setting in terms of bias and variance by means of a simulation study: the generalized calibration estimator proposed by Deville (2002) and Chang and Kott (2008), and a two-stage GREG estimator.
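As a rough illustration of the calibration idea, the sketch below computes linear calibration weights of the form w_i = d_i (1 + z_i'λ), with λ chosen so that the weighted totals of the calibration variables x match known population totals. With z = x this is ordinary linear calibration; allowing z to differ from x (for example, total vacancies in place of registered vacancies) is the basic device of generalized calibration. The function `calibrate`, the two-variable restriction and the toy numbers are assumptions for illustration, not the estimators evaluated in the paper.

```python
def calibrate(d, x, totals, z=None):
    """Linear calibration weights for two calibration variables.

    d      : design weights
    x      : calibration variables per unit, as pairs (x1, x2)
    totals : known population totals of (x1, x2)
    z      : optional instrument variables per unit; defaults to x
    """
    z = x if z is None else z
    # a = sum_i d_i * x_i z_i^T (2x2 matrix); b = totals - sum_i d_i x_i
    a = [[sum(di * xi[r] * zi[c] for di, xi, zi in zip(d, x, z))
          for c in range(2)] for r in range(2)]
    b = [totals[r] - sum(di * xi[r] for di, xi in zip(d, x)) for r in range(2)]
    # Solve a * lam = b by Cramer's rule.
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    lam = [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
           (a[0][0] * b[1] - a[1][0] * b[0]) / det]
    return [di * (1.0 + zi[0] * lam[0] + zi[1] * lam[1]) for di, zi in zip(d, z)]


# Toy data (hypothetical): five responding businesses.
design_weights = [10.0] * 5
registered = [0, 1, 2, 0, 3]        # registered vacancies, known for everyone
all_vacancies = [1, 2, 4, 0, 5]     # total vacancies, observed for respondents only
x = [(1, r) for r in registered]    # calibrate on unit count and registered vacancies
z = [(1, v) for v in all_vacancies]
known_totals = (60.0, 80.0)         # assumed population size and registered total

w_linear = calibrate(design_weights, x, known_totals)
w_general = calibrate(design_weights, x, known_totals, z)
```

Both weight vectors reproduce the known totals exactly; they differ in which variable drives the adjustment, which matters when nonresponse depends on a variable observed only for respondents.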


2. Does balancing survey response reduce nonresponse bias?

Dr Barry Schouten (Statistics Netherlands, Methodology Department)
Fannie Cobben (Statistics Netherlands, Methodology Department)
Peter Lundquist (Statistics Sweden, Methodology Department)

Recently, various indicators have been proposed as indirect measures of nonresponse error in surveys. The indicators employ available auxiliary variables in order to detect nonrepresentative or unbalanced response. The indicators may be used as quality objective functions in responsive and adaptive survey designs. In such designs different population subgroups receive different treatments. The population subgroups are formed using frame/registry data and paradata observations. The designs optimize allocation of resources under constraints on the precision of key statistics. The rationale behind the use of the indicators as quality objective functions is the conjecture that a stronger deviation from MCAR(X) implies a stronger deviation from MAR(Y|X). Hence, the indicators are viewed as process quality indicators.
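One indicator of this kind is the R-indicator, R = 1 - 2 S(ρ), where S(ρ) is the standard deviation of the response propensities; the abstract does not say which indicators are used, so its appearance here is an assumption. The sketch below estimates propensities by subgroup response rates, a deliberately simple choice; the function name `r_indicator` and the input layout are hypothetical.

```python
from math import sqrt


def r_indicator(groups):
    """R-indicator from subgroup counts.

    groups: list of (n_sampled, n_responded) pairs, one per subgroup
            formed from auxiliary variables.
    """
    n = sum(sampled for sampled, _ in groups)
    overall_rate = sum(responded for _, responded in groups) / n
    # Each unit's estimated propensity is its subgroup's response rate.
    variance = sum(
        sampled * (responded / sampled - overall_rate) ** 2
        for sampled, responded in groups
    ) / (n - 1)
    return 1.0 - 2.0 * sqrt(variance)


balanced = r_indicator([(100, 60), (200, 120)])    # equal rates in both groups
unbalanced = r_indicator([(100, 80), (100, 40)])   # rates of 0.8 and 0.4
```

A value of 1 means the estimated propensities do not vary across the subgroups; lower values indicate less balanced response with respect to the auxiliary variables used.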

The natural question is whether the decrease in nonresponse bias caused by adaptive and responsive designs could also be achieved by nonresponse adjustment methods that employ the same auxiliary variables. In other words, does balancing survey response during data collection lead to smaller nonresponse bias, even after nonresponse adjustment?
In this paper, we discuss this important question. We provide theoretical and empirical considerations on the role of both the survey design and nonresponse adjustment methods to make response representative or balanced. The empirical considerations are supported by a wide range of household and business surveys from The Netherlands, Sweden and the USA.


3. Testing and adjusting for informativeness in analytic inference

Professor Jean Opsomer (Colorado State University)
Professor Jay Breidt (Colorado State University)
Mr Wade Herndeon (Colorado State University)

Survey data are often collected through complex surveys with stratification, clustering, and unequal probabilities. Weighting is used to account for these design complexities, and also incorporates additional adjustments for nonresponse and for calibration to known population information. Researchers working with complex survey data may be tempted to ignore these sampling and weighting aspects when conducting statistical analyses and fitting models, but traditional methods often are not appropriate unless these aspects are accounted for.

We describe a likelihood-based testing approach to detect informativeness that is simple to implement and has good statistical properties. We obtain the asymptotic distribution of the test, and use both the asymptotic distribution and a more convenient bootstrap distribution in implementing the procedure. Because the approach relies on a comparison of the weighted and unweighted estimators, it is applicable for secondary analysts who have access to previously created weights. Following testing, we discuss several possible approaches for incorporating the informativeness in the analytic inference.
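The idea of comparing weighted and unweighted estimators can be illustrated with a much simpler statistic than the likelihood-based test described above. The sketch below, which is not the authors' procedure, takes the difference between the weighted and unweighted sample means and standardizes it by a naive bootstrap standard error. The data generator `make_sample`, the selection rules and all parameter values are assumptions for illustration.

```python
import random
from math import sqrt


def make_sample(informative, n=500, seed=1):
    """Draw (y, weight) pairs under an assumed Poisson-type selection rule."""
    rng = random.Random(seed)
    sample = []
    while len(sample) < n:
        y = rng.gauss(10.0, 2.0)
        if informative:
            pi = max(0.01, min(1.0, 0.05 * y))   # selection depends on y
        else:
            pi = rng.uniform(0.2, 0.8)           # selection unrelated to y
        if rng.random() < pi:
            sample.append((y, 1.0 / pi))
    return sample


def diff(sample):
    """Weighted mean minus unweighted mean."""
    unweighted = sum(y for y, _ in sample) / len(sample)
    weighted = sum(w * y for y, w in sample) / sum(w for _, w in sample)
    return weighted - unweighted


def bootstrap_z(sample, n_boot=200, seed=0):
    """Standardize the difference by a naive with-replacement bootstrap SE."""
    rng = random.Random(seed)
    reps = [diff([rng.choice(sample) for _ in sample]) for _ in range(n_boot)]
    mean_rep = sum(reps) / n_boot
    se = sqrt(sum((r - mean_rep) ** 2 for r in reps) / (n_boot - 1))
    return diff(sample) / se


z_informative = bootstrap_z(make_sample(informative=True))
z_noninformative = bootstrap_z(make_sample(informative=False))
```

A large absolute value of the standardized difference suggests that the weights carry information about the outcome. The naive bootstrap used here ignores stratification and clustering, so it should be read as a toy version of the comparison only.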



4. Accounting for Informative Sampling and NMAR Nonresponse when Fitting Models to Survey Data

Professor Danny Pfeffermann (Southampton Statistical Sciences Research Institute)
Moshe Feder

When sample data are the result of informative sampling or not missing at random (NMAR) nonresponse, the model holding for the observed data can be very different from the target model holding in the population from which the sample is taken, because of selection bias. Accounting for informative sampling is relatively simple, since the sample selection probabilities are usually known, and several approaches have been proposed in the literature to deal with this problem. Accounting for NMAR nonresponse is much more difficult since the response probabilities are generally unknown, so that some structure must be assumed for the response mechanism.

In this presentation I shall discuss a new approach for modelling complex survey data, which accounts simultaneously for informative sampling and NMAR nonresponse. The approach uses the empirical likelihood, which is defined with respect to the distribution of the data observed for the responding units. Estimation of the unknown model parameters and response probabilities is enhanced by a set of calibration constraints that are related to the model outcome variable and the covariates included in the model for the response probabilities. Application of this approach allows estimating the population model parameters and imputing the missing data using the model holding for them, which is different from the population model. Simulation results illustrate good performance of the proposed approach in terms of parameter estimation and imputation.
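The statement that the model holding for the missing data differs from the population model follows from Bayes' rule. The relations below are a sketch in notation that does not appear in the abstract, so every symbol is an assumption: f_p is the population model, I indicates sample selection, R indicates response, and f_s, f_r and f_m denote the models holding for sampled units, respondents and nonrespondents respectively.

```latex
\begin{align*}
f_s(y \mid x) &= \frac{\Pr(I = 1 \mid y, x)\, f_p(y \mid x)}{\Pr(I = 1 \mid x)}, \\
f_r(y \mid x) &= \frac{\Pr(R = 1 \mid y, x, I = 1)\, f_s(y \mid x)}{\Pr(R = 1 \mid x, I = 1)}, \\
f_m(y \mid x) &= \frac{\{1 - \Pr(R = 1 \mid y, x, I = 1)\}\, f_s(y \mid x)}{1 - \Pr(R = 1 \mid x, I = 1)}.
\end{align*}
```

When the selection and response probabilities do not depend on y given x, the ratios equal one and all three models coincide with f_p; otherwise imputation should draw from f_m rather than from f_p.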