ESRA logo

Tuesday 16th July 2013, 16:00 - 17:30, Room: No. 22

Estimation and Imputation Under Informative Sampling and Nonresponse 2

Convenor Professor Danny Pfeffermann (University of Southampton)

Session Details

Survey data are frequently used for analytic inference on statistical models that are assumed to hold for the population from which the sample is taken. Familiar examples include the analysis of labour market dynamics from labour force surveys, comparisons of pupils' achievements from educational surveys and the search for causal relationships between risk factors and disease prevalence from health surveys. The sample selection probabilities in at least some stages of the sample selection are often unequal; when these probabilities are related to the model outcome variable, the sampling process becomes informative and the model holding for the sample then differs from the target population model. A related problem is unit nonresponse, which may likewise distort the population model if the response propensity is associated with the outcome of interest; this is known as not missing at random (NMAR) nonresponse.

Accounting for informative sampling is relatively simple because the sample selection probabilities are usually known, and several approaches have been proposed in the literature to deal with this problem. Accounting for NMAR nonresponse, on the other hand, is much harder because the response probabilities are unknown. This requires assuming some structure for the response mechanism or, in the case of longitudinal surveys with attrition, modelling the response probabilities as functions of previously measured values.

The aim of this session is to present different practical scenarios giving rise to informative sampling and/or NMAR nonresponse and to discuss alternative approaches to deal with these problems. The focus of the different presentations will be on imputations of missing sample data under NMAR nonresponse and estimation of model parameters and/or finite population means under both informative sampling and NMAR nonresponse.
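
The effect of informative sampling with known selection probabilities can be illustrated with a small simulation. The following is a minimal sketch and not a method taken from any of the session papers: it assumes Poisson sampling in which the selection probability grows with the outcome, and it uses the standard Hajek (weighted mean) estimator. The function names, the normal outcome distribution and all constants are illustrative assumptions.

```python
import random


def hajek_mean(values, inclusion_probs):
    """Weighted mean with weights equal to the inverse selection probabilities."""
    weights = [1.0 / p for p in inclusion_probs]
    return sum(w * y for w, y in zip(weights, values)) / sum(weights)


def simulate_informative_sample(seed=0, population_size=50_000):
    """Return (population mean, unweighted sample mean, Hajek mean)."""
    rng = random.Random(seed)
    # Hypothetical population outcome with mean 50 and standard deviation 10.
    population = [rng.gauss(50.0, 10.0) for _ in range(population_size)]
    # Informative design: the selection probability increases with the outcome.
    probs = [min(1.0, max(0.01, 0.002 * y)) for y in population]
    # Poisson sampling: each unit is selected independently with its own probability.
    sample = [(y, p) for y, p in zip(population, probs) if rng.random() < p]
    ys = [y for y, _ in sample]
    ps = [p for _, p in sample]
    population_mean = sum(population) / population_size
    unweighted_mean = sum(ys) / len(ys)
    return population_mean, unweighted_mean, hajek_mean(ys, ps)


if __name__ == "__main__":
    pop_mean, naive, weighted = simulate_informative_sample()
    print(f"population mean: {pop_mean:.2f}")
    print(f"unweighted mean: {naive:.2f}")
    print(f"Hajek mean:      {weighted:.2f}")
```

In this setup the unweighted sample mean tends to overstate the population mean because units with high outcomes are over-represented, while the weighted mean stays close to it. No comparably simple correction is available for NMAR nonresponse, since the response probabilities would have to be modelled rather than read off the design.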


Paper Details

1. Multiple imputation for a large complex household survey data - the German Panel on Household Finances (PHF)

Mr Junyi Zhu (Deutsche Bundesbank)
Mr Martin Eisele (Deutsche Bundesbank)

In this paper, we present a case study of the challenges and solutions in imputing a large, complex household survey dataset: the first wave of the German Panel on Household Finances (PHF). The detailed discussion covers two main aspects: I. model specification when real-data anomalies (e.g. the spike of zeros in value variables) and complex missing-data patterns are present; II. various analyses of the item nonresponse rate combined with the logical tree of the questionnaire.



2. Variance estimation methods for EU-SILC under imputation

Mr Christian Bruch (University of Trier)
Professor Ralf Muennich (University of Trier)

EU-SILC is a well-known household survey in the area of social exclusion and poverty. The survey involves a variety of country-specific sampling designs, which poses a major challenge for measuring the accuracy of certain statistics. These accuracy measures are frequently based on variance estimates, whose complexity increases considerably with the complexity of the sampling design.

The problem becomes even more complicated when item nonresponse has to be considered at the same time. Single imputation methods are often used so that the statistic of interest can be computed on the basis of a sufficient sample size. In this context, a simple application of standard variance estimation methods may lead to serious underestimation of the true variance, because the variation arising from the imputation process is not properly taken into account.

In the social sciences, variance estimation is frequently carried out with resampling methods. There are proposals in the literature for applying these methods in the presence of nonresponse and imputation (e.g. Shao and Sitter, 1996 or Rao and Shao, 1992). However, it is important to establish whether their application is appropriate for the designs of EU-SILC.

The aim of this paper is to investigate different variance estimation methods under imputation with respect to the requirements imposed by nonresponse and the sampling designs. The advantages and disadvantages of the methods are pointed out, and best practice recommendations are given for various conditions. To this end, a Monte Carlo simulation study is carried out using a dataset based on EU-SILC, with regard to different designs as well as response patterns and imputation methods.
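
The underestimation mentioned in this abstract can be seen in the simplest possible setting. The sketch below is an illustration only and does not reproduce the methods compared in the paper: it assumes simple random sampling, values missing completely at random and mean imputation, none of which reflect the EU-SILC designs. The function names and the toy data are hypothetical.

```python
import statistics


def mean_impute(values):
    """Replace each missing value (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]


def naive_variance_of_mean(completed):
    """Standard s^2 / n formula, applied as if imputed values were real data."""
    return statistics.variance(completed) / len(completed)


def respondent_variance_of_mean(values):
    """s^2 / r computed from the r observed values only."""
    observed = [v for v in values if v is not None]
    return statistics.variance(observed) / len(observed)


if __name__ == "__main__":
    data = [2.0, 4.0, None, 6.0, None, 8.0]  # toy data with two missing items
    completed = mean_impute(data)
    print("naive variance estimate:     ", naive_variance_of_mean(completed))
    print("respondent variance estimate:", respondent_variance_of_mean(data))
```

With mean imputation the point estimate equals the respondent mean, so its variability is governed by the respondents alone. The naive formula nevertheless returns about 0.67 for the toy data against about 1.67 from the respondents, because the imputed values add no spread and inflate the apparent sample size.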


3. Estimating the proportion of in-hospital infections under biased sampling and censoring

Dr Micha Mandel (Hebrew University of Jerusalem)

Cross-sectional designs are often used to monitor the proportion of infections and other post-surgical complications acquired in hospitals. However, conventional methods for estimating incidence proportions when applied to cross-sectional data may provide estimators which are highly biased, as cross-sectional designs tend to include a high proportion of patients with prolonged hospitalization. One common solution is to use sampling weights in the analysis that adjust for the sampling bias inherent in a cross-sectional design. However, to apply this approach, the sampling weights must be estimated first using auxiliary data. The current paper describes in detail a method to build weights for a national survey of post-surgical complications conducted in Israel. The weights are used to estimate the probability of surgical site infections following colon resection; the results of the weighted analysis are validated by comparing them to those obtained from a parallel study with a historically prospective design. Finally, we present a sensitivity analysis that studies the effect of replacing the actual sampling weights with estimates.
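
The sampling bias described in this abstract can be illustrated with a small simulation. This is a minimal sketch under strong simplifying assumptions and is not the weighting method developed in the paper: it assumes that each patient's complete length of stay is known (no censoring) and that the probability of being included in the cross-sectional sample is exactly proportional to the length of stay. The function names, the distributions and the 10% infection rate are hypothetical.

```python
import random


def weighted_proportion(indicators, lengths_of_stay):
    """Proportion estimate with weights inversely proportional to length of stay."""
    weights = [1.0 / t for t in lengths_of_stay]
    return sum(w * d for w, d in zip(weights, indicators)) / sum(weights)


def simulate_cross_section(seed=1, n_patients=100_000, true_rate=0.10):
    """Return (naive proportion, weighted proportion) from one simulated survey."""
    rng = random.Random(seed)
    patients = []
    for _ in range(n_patients):
        infected = 1 if rng.random() < true_rate else 0
        # Hypothetical stays in days: infected patients stay longer on average.
        mean_extra_days = 14.0 if infected else 4.0
        stay = 1.0 + rng.expovariate(1.0 / mean_extra_days)
        patients.append((infected, stay))
    longest = max(stay for _, stay in patients)
    # Cross-sectional design: the chance of being in hospital on the survey day
    # is taken to be proportional to the length of stay.
    sample = [(d, t) for d, t in patients if rng.random() < t / longest]
    ds = [d for d, _ in sample]
    ts = [t for _, t in sample]
    return sum(ds) / len(ds), weighted_proportion(ds, ts)


if __name__ == "__main__":
    naive, weighted = simulate_cross_section()
    print(f"naive proportion:    {naive:.3f}")
    print(f"weighted proportion: {weighted:.3f}")
```

Because infected patients stay longer in this setup, they are over-represented on the survey day and the naive proportion lies well above the assumed 10% rate, whereas the weighted proportion stays close to it. In practice the weights are not known exactly and have to be estimated, which is the situation the paper addresses.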


4. Two-phase sampling experiment for propensity score estimation in voluntary samples

Dr Jae-kwang Kim (Iowa State University)
Dr Sixia Chen

Voluntary sampling is a non-probability sampling design with unknown sample inclusion probabilities. When the sample inclusion probability depends on the study variables, propensity score adjustment using auxiliary information may lead to biased estimation. In this paper, we propose a novel application of two-phase sampling to estimate the parameters in the propensity model. With this two-phase sampling experiment, we can estimate the parameters in a propensity score model consistently. Then the propensity score adjustment can be applied to the original voluntary sample to estimate the population parameters. The proposed method is applied to the 2012 Iowa Caucus surveys.