Sampling for cross-national surveys

Convenor: Dr Matthias Ganninger (GESIS)
Cross-national surveys, such as the European Social Survey (ESS), are increasingly used for substantive analyses.
To ensure that the estimates obtained in these analyses are of the highest possible quality, sample designs in the participating countries must be defined carefully. An essential challenge at the planning stage lies in the comparative nature of most multi-national surveys: achieving samples that yield estimates of comparable precision and low bias at low cost is in most cases difficult.
In this session, recent advances in survey sampling for cross-national surveys and their application in real-world social surveys such as the ESS, SHARE and PIAAC will be presented and discussed. The session aims to cover both new sampling methods and estimation techniques, bringing together basic research in sampling and estimation as well as success stories of their application in comparative sample survey projects.
The social value of data collections is dramatically enhanced by the broad dissemination of research files and the resulting increase in scientific productivity. Currently, most studies are designed with a focus on collecting information that is analytically useful and accurate, with little forethought as to how it will be shared. Both the literature and practice also presume that disclosure analysis will take place after data collection. But to produce public-use data of the highest analytical utility for the largest user group, disclosure risk must be considered at the beginning of the research process. Drawing upon economic and statistical decision-theoretic frameworks and survey methodology research, this study seeks to enhance the scientific productivity of shared research data by describing how disclosure risk can be addressed in the earliest stages of research through the formulation of "safe designs" and "disclosure simulations". An applied statistical approach is taken in: (1) developing and validating models that predict the composition of survey data under different sampling designs; (2) selecting and/or developing measures and methods for assessing disclosure risk, analytical utility, and survey costs that are best suited to evaluating sampling and database designs; and (3) conducting simulations to gather estimates of risk, utility, and cost for studies with a wide range of sampling and database design characteristics.
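The kind of disclosure simulation mentioned in point (3) can be illustrated with a toy sketch. Everything below is a hypothetical example, not material from the study: a synthetic population described by two quasi-identifying key variables, with the share of sample-unique records used as a crude disclosure-risk proxy that varies with the sampling fraction.

```python
import random
from collections import Counter

# Hypothetical "disclosure simulation" sketch (all numbers are illustrative
# assumptions): build a synthetic population with two categorical key
# variables, then estimate how the proportion of sample-unique records --
# a simple disclosure-risk proxy -- changes with the sampling fraction.

random.seed(1)
population = [(random.randint(1, 8), random.randint(1, 5)) for _ in range(10_000)]

def sample_unique_rate(frac):
    """Share of sampled records whose key-variable combination is unique
    within the sample, for a given sampling fraction."""
    sample = random.sample(population, int(frac * len(population)))
    counts = Counter(sample)
    return sum(1 for rec in sample if counts[rec] == 1) / len(sample)

for frac in (0.01, 0.05, 0.20):
    print(f"sampling fraction {frac:.2f}: risk proxy {sample_unique_rate(frac):.3f}")
```

Smaller sampling fractions leave more records unique in the sample, so the risk proxy falls as the sampling fraction grows; a real simulation would replace this proxy with the validated risk, utility and cost measures developed in steps (1) and (2).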
Nationwide opinion surveys have been conducted in the former Soviet Union for more than 20 years. Their results are increasingly used by both the ruling regimes and their opposition to press arguments and justify political actions.
This paper examines Russian and Ukrainian national surveys from a reliability perspective, focusing on two specific issues: sample coverage and non-response. The author, an expert pollster from GfK with nearly 20 years of experience conducting survey work in the FSU, discusses these key reliability issues on the basis of empirical data collected in multiple surveys. In particular, non-coverage patterns of the preferred sampling methods (area sampling, random route) and of respondent selection (randomized selection vs. quota) are analyzed. Specific reasons for non-response are discussed, including those related to demographic and political factors.
Based on his own polling experience in Russia and Ukraine, the author suggests methodological approaches to improve sample coverage and minimize non-response in nationwide surveys in the FSU.
The French longitudinal survey on the integration of first-time arrivals (ELIPA) has two main objectives: understanding integration paths during the three years following the first residence permit in France, and assessing the welcome arrangement for new migrants (CAI). The first wave was conducted in 2010 with a representative sample of 6,107 migrants aged 18 or older who were nationals of countries outside the European Economic Area and Switzerland. These migrants were reinterviewed in 2011, and the final wave will take place in 2013.
The sampling frame was built according to several targets. Former studies had shown the relevance of individual characteristics such as reason for migration (employment, family, humanitarian), country of origin, skills in French before migration, and region of arrival (Paris area or elsewhere). Another individual characteristic is quite important: the length of the period between arriving in France and obtaining the legal right to settle.
As the population to be surveyed is not very large (about 100,000 people) and the sampling rate therefore quite low, a multi-targeted sampling frame cannot be implemented with exact accuracy.
First, the initial targets will be elaborated. Then the iterative process used to reach them will be presented; it involves technical rules for computing non-response corrections and calibration. In July 2013, the last data will be collected, allowing comparisons between the targets and what was achieved. In conclusion, the practical results and the main guidelines for using the survey data will be presented.
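The calibration step of such a process can be sketched in miniature. The categories and margins below are illustrative assumptions, not ELIPA's actual calibration variables: design weights are adjusted by raking (iterative proportional fitting) until the weighted margins of two categorical variables match known population totals.

```python
# Minimal raking sketch (hypothetical margins, not ELIPA's): design weights
# are repeatedly scaled so that weighted category totals match known
# population margins for each calibration variable in turn.

def rake(weights, cats_a, cats_b, targets_a, targets_b, iters=50):
    """Calibrate weights to two sets of categorical margins by raking."""
    w = list(weights)
    for _ in range(iters):
        for cats, targets in ((cats_a, targets_a), (cats_b, targets_b)):
            totals = {}
            for wi, c in zip(w, cats):
                totals[c] = totals.get(c, 0.0) + wi
            # Scale each weight so this variable's margins hit their targets.
            w = [wi * targets[c] / totals[c] for wi, c in zip(w, cats)]
    return w

# Four respondents with design weight 1, plus invented sex and age margins:
sex = ["m", "m", "f", "f"]
age = ["young", "old", "young", "old"]
w = rake([1.0] * 4, sex, age, {"m": 60, "f": 40}, {"young": 30, "old": 70})
# The calibrated weights reproduce both margins: 60/40 for sex, 30/70 for age.
```

In practice the same idea is applied with non-response-adjusted design weights as the starting point and with margins taken from external population sources.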
The picture of generating a pseudo-population U* as a set-valued estimator of the original population U is an illustrative way to improve users' understanding of the basic concepts of statistical sampling theory, such as the Horvitz-Thompson estimator for the total of a study variable y. In this estimator, where the y-values in the sample are multiplied by the reciprocals of the first-order sample inclusion probabilities of the sample units, a pseudo-population is built by replicating each y-value according to these reciprocals. The estimate of the sum of the y-values in the population is then nothing other than the sum of the y-values in this pseudo-population. Furthermore, the accuracy of the estimator under different sampling methods can be explained by the quality of U* as an estimator of U with respect to the parameter of interest. The same picture applies to other estimators such as the Hansen-Hurwitz, the ratio or the regression estimator.
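This pseudo-population reading of the Horvitz-Thompson estimator can be made concrete with a small sketch (the numbers are invented for illustration). Under simple random sampling every unit has inclusion probability pi = n/N, so each sampled y-value is replicated 1/pi times, and the sum over the resulting pseudo-population U* equals the usual HT estimate, the sum of y_i/pi_i over the sample.

```python
# Sketch of the pseudo-population view of the Horvitz-Thompson estimator.
# Equal inclusion probabilities pi_i = n/N (simple random sampling) are used
# so that each replication factor 1/pi_i is a whole number; the data are
# invented for illustration.

def ht_total_via_pseudo_population(y_sample, inclusion_probs):
    """HT estimate of the population total of y, computed by expanding the
    sample into a pseudo-population U* (each y_i repeated 1/pi_i times)."""
    pseudo_population = []
    for y, pi in zip(y_sample, inclusion_probs):
        pseudo_population.extend([y] * round(1 / pi))
    return sum(pseudo_population)

N, n = 100, 20                 # population and sample sizes
pi = n / N                     # first-order inclusion probability, 0.2
sample = [3, 7, 2, 5, 4] * 4   # a hypothetical sample of n = 20 y-values
estimate = ht_total_via_pseudo_population(sample, [pi] * n)
# Each y-value is replicated 1/pi = 5 times, so the sum over U* equals
# 5 * sum(sample) = 420, matching the HT formula sum(y_i / pi_i).
```

With unequal inclusion probabilities the replication factors 1/pi_i are generally not integers, which is where the more careful constructions discussed in the cited papers come in.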
The concept of generating pseudo-populations can also be usefully applied to randomized response techniques, where it helps to consider more complex sampling schemes than SIR sampling (cf. Quatember 2012, Statistical Methods & Applications). Furthermore, it can be applied to missing data and the SCI family of methods for statistical disclosure control (Quatember and Hausner 2012, Journal of Applied Statistics). Finite population bootstrap is another example, where this basic tool for the analysis of statistical surveys has already been applied (cf. Sitter 1992, The Canadian Journal of Statistics).
In some sample surveys the cost of observing population characteristics differs from unit to unit. As a result, the total cost of a random sample depends on its composition and may be highly variable. The total sample cost may also become variable through certain cluster or multistage sampling schemes, or through the use of varying data collection modes. When excesses of this variable cost over the planned survey budget are unwelcome, it may be reasonable to employ survey techniques designed to control the cost strictly while maximizing the precision of estimates. If the sampling frame contains information that can be used to establish bounds on the maximum sample cost, such knowledge may lead to more efficient use of budget funds. In this presentation a sample allocation problem for stratified sampling is formulated in such a way that budget excesses are ruled out when per-unit sampling costs are non-homogeneous. Possibilities for solving the stated problem with the well-known branch-and-bound approach are explored.
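A tiny version of such an allocation problem can be written down directly. All numbers below are invented, and for clarity the toy search uses plain enumeration of integer allocations rather than branch-and-bound (which would add pruning to the same search tree): minimize the variance of the stratified mean subject to a hard budget constraint, so that cost excesses are ruled out by construction.

```python
from itertools import product

# Toy stratified allocation under a strict budget (illustrative numbers):
# choose integer stratum sample sizes n_h minimizing the variance of the
# stratified mean, V = sum_h W_h^2 * S_h^2 / n_h, subject to the hard
# constraint sum_h c_h * n_h <= C with per-unit costs c_h differing by stratum.

W = [0.5, 0.3, 0.2]    # stratum weights N_h / N
S2 = [4.0, 9.0, 1.0]   # stratum variances S_h^2
c = [1.0, 2.0, 0.5]    # per-unit observation costs (non-homogeneous)
C = 30.0               # survey budget; excesses are ruled out by construction

def variance(n):
    """Variance of the stratified mean for allocation n = (n_1, n_2, n_3)."""
    return sum(w * w * s2 / nh for w, s2, nh in zip(W, S2, n))

best = None
for n in product(range(1, 31), repeat=3):          # candidate allocations
    if sum(ch * nh for ch, nh in zip(c, n)) <= C:  # strict budget constraint
        if best is None or variance(n) < variance(best):
            best = n
```

Branch-and-bound replaces the full enumeration with a tree search in which partial allocations are discarded whenever a lower bound on their achievable variance (e.g. from the continuous Neyman-type relaxation) already exceeds the best feasible solution found so far.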