Weighting: approach and sources 2
Convenor: Mrs Kim De Cuyper (GfK EU3C)
Coordinator 1: Mrs Griet Gys (Significant GfK)
Coordinator 2: Mrs Christine Tresignie (GfK EU3C)
There are numerous reasons why a given sample may not be representative. The main ones are uncontrollable deviations from randomness, which may arise from many sources: systematic non-response, deficient address material, interviewer bias, heterogeneous contact probabilities, etc. Moreover, the raw data will rarely match the population 100%, especially since data validation and data cleaning also affect this. To eliminate bias as far as possible and to bring the sample in line with the population, weighting is applied. Weighting is a complex process that goes through various phases, in which both the approach to weighting and the sources are key and determine the validity of the weighting.
In terms of approach a distinction can be made between:
- types of weight: probability or design, post-stratification or non-response, national, population, etc.
- single-stage or multi-stage weighting
- variables to weight on
- method: iterative proportional fitting (IPF, a.k.a. Rim Weighting or Raking; see the sketch after this list), linear weighting, etc.
- software package: SPSS, Quantum, etc.
- trimming
- imputation of missing data
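To make the IPF/raking option above concrete, the following is a minimal sketch of raking on two categorical margins. The variables (sex, age group) and the target shares are purely illustrative and are not taken from any particular survey or target population.

```python
# Minimal raking (iterative proportional fitting) sketch on two margins.
import numpy as np

def rake(cats_a, cats_b, targets_a, targets_b, n_iter=50, tol=1e-8):
    """Iteratively adjust weights so weighted shares match both target margins."""
    w = np.ones(len(cats_a), dtype=float)
    for _ in range(n_iter):
        max_shift = 0.0
        for cats, targets in ((cats_a, targets_a), (cats_b, targets_b)):
            total = w.sum()
            for level, target_share in targets.items():
                mask = cats == level
                factor = target_share * total / w[mask].sum()
                w[mask] *= factor
                max_shift = max(max_shift, abs(factor - 1.0))
        if max_shift < tol:
            break
    return w * len(w) / w.sum()   # normalise so weights average to 1

# Toy example: six respondents, two weighting variables (sex, age group).
sex = np.array(["m", "m", "m", "m", "f", "f"])
age = np.array(["<40", "<40", "40+", "40+", "<40", "40+"])
weights = rake(sex, age, {"m": 0.49, "f": 0.51}, {"<40": 0.60, "40+": 0.40})
print(np.round(weights, 3))
```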
The sources for weighting are of utmost importance and can be divided into two main groups:
- public vs private data
- type of data
  - individuals vs households in consumer surveys
  - workers vs establishments (in- or excluding group structures) in business surveys
A well-designed weighting procedure results in a high weighting efficiency and a high effective sample percentage.
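As an illustration of these two measures, the snippet below computes Kish's effective sample size, n_eff = (sum of weights)^2 / (sum of squared weights), and the effective sample percentage n_eff / n; the weight vectors are illustrative only.

```python
# Kish's weighting efficiency: effective sample size relative to actual size.
import numpy as np

def weighting_efficiency(weights):
    w = np.asarray(weights, dtype=float)
    n_eff = w.sum() ** 2 / (w ** 2).sum()   # Kish's effective sample size
    return n_eff / len(w)                   # effective sample percentage

print(weighting_efficiency([1.0, 1.0, 1.0, 1.0]))   # 1.0: uniform weights, no loss
print(weighting_efficiency([0.5, 0.5, 1.5, 1.5]))   # 0.8: weight dispersion lowers efficiency
```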
Therefore, this session invites papers looking into one or more of the weighting angles outlined above. Authors are invited to submit an abstract of no longer than 1000 words.
Collecting data on wages is central to socio-economic research. However, besides high rates of people not answering wage-related questions, measurement issues are also relevant. Most data from official statistics are too aggregated to allow for detailed individual-level analyses, which are crucial for encouraging innovative political-economic ideas in the long run.
In this context, web surveys seem to offer various advantages, such as worldwide coverage, cost benefits and a fast data collection process. Especially for sensitive questions, like income, they might provide more reliable results because social desirability effects can be eliminated. While web surveys could represent a good supplement to official statistics data, they pose many methodological challenges. A core problem concerns the representativeness of the data as the sub-population with Internet access might be quite specific.
Accordingly, the main aim of this paper is to compare existing calibration methods and develop new ones to enhance the representativeness of different types of web surveys. The characteristics of a probability-based (LISS panel) and a non-probability-based (WageIndicator) web survey on labour market-related topics for the Netherlands are compared with population data from Statistics Netherlands, providing a detailed bias description of core variables related to wages for both samples. For a selection of core variables, different adjustment models are applied, such as simple weighting, propensity score adjustment and the Maxent approach. Finally, the properties and theoretical advantages and disadvantages of the methods are discussed, exploring the potentials and constraints of the different adjustment methods for probability-based and non-probability-based web surveys.
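As a rough illustration of one of the adjustment models mentioned above (propensity score adjustment), the sketch below stacks a reference sample and a volunteer web sample, models the propensity of belonging to the volunteer sample, and derives inverse-propensity adjustment weights. The covariates and sample sizes are invented for illustration and do not reflect the LISS or WageIndicator data or the paper's actual specification.

```python
# Propensity-score adjustment sketch for a volunteer web sample.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_ref, n_vol = 500, 500
# Synthetic covariates (say, age and an education score) for the reference
# (probability) sample and the volunteer (non-probability) sample.
X_ref = rng.normal(loc=[45.0, 3.0], scale=[15.0, 1.0], size=(n_ref, 2))
X_vol = rng.normal(loc=[38.0, 3.5], scale=[12.0, 1.0], size=(n_vol, 2))

X = np.vstack([X_ref, X_vol])
z = np.concatenate([np.zeros(n_ref), np.ones(n_vol)])  # 1 = volunteer case

# Model the propensity of appearing in the volunteer sample given the covariates.
p = LogisticRegression().fit(X, z).predict_proba(X)[:, 1]

# Weight volunteer cases by (1 - p) / p so their covariate mix resembles the
# reference sample; normalise the weights to a mean of 1 for readability.
w_vol = (1.0 - p[n_ref:]) / p[n_ref:]
w_vol *= len(w_vol) / w_vol.sum()
print(round(w_vol.mean(), 3), round(w_vol.max(), 2))
```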
Web panels in the USA routinely draw samples based on demographic parameters released by the US Census Bureau in an attempt to improve sample representativeness of the general population. Two common approaches to building census-balanced samples are: (a) "outbound" balancing, whereby email invitations are fielded to a sample of panelists whose profile demographics match census targets, and the completed sample is then further adjusted with post-stratification weights; and (b) "inbound" balancing, whereby quota screeners are applied when panelists start the survey, so that the completed sample closely matches census targets and no post-hoc weighting is needed. We evaluate two samples drawn from the same web panel using "outbound" vs. "inbound" census balancing. First, we compare weighted and unweighted sample estimates to census benchmarks for home and vehicle ownership, household size, relocation, and health insurance status; to benchmarks from the US FDIC for choice of financial institutions and bank accounts; and to sample estimates from Pew surveys on mobile and the digital divide. Next, we test the concurrent validity of attitudes and behaviors relating to adoption of technology and choice of financial institutions. Finally, we explore sample differences on selected profile variables (with imputations) and examine whether expanded weighting algorithms incorporating profile variables would reduce or augment sample differences. Practical considerations such as cost differentials and length of field period will also be reviewed. We conclude with which sample performed better on which dimensions, possible reasons for those differences, and recommendations for future research.
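For readers unfamiliar with the post-stratification step used in the "outbound" approach, a minimal sketch is given below: each respondent receives the ratio of the census share to the completed-sample share of their demographic cell. The cell labels and target shares are illustrative, not actual census figures.

```python
# Post-stratification sketch: weight = census share / completed-sample share per cell.
from collections import Counter

respondent_cells = ["m_18-34", "m_35+", "m_35+", "f_18-34",
                    "f_35+", "f_35+", "f_35+", "m_18-34"]
census_shares = {"m_18-34": 0.15, "m_35+": 0.34, "f_18-34": 0.14, "f_35+": 0.37}

n = len(respondent_cells)
sample_shares = {cell: count / n for cell, count in Counter(respondent_cells).items()}
weights = [census_shares[cell] / sample_shares[cell] for cell in respondent_cells]
print([round(w, 2) for w in weights])   # weights sum to n by construction
```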
Survey-based indicators are a widespread method for forecasting macro-level economic parameters such as GDP growth rates, since they contain early information in contrast to official data. However, surveys are commonly affected by non-responding units, which can cause biased results. For this reason, we analyse and impute the missing observations in the Ifo Business Survey, a large business survey in Germany. To reflect the underlying latent data-generating process, we compare different imputation approaches for longitudinal data. The microdata are then aggregated and the results are compared with the original indicators to evaluate their implications at the macro level.
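As a hedged illustration of what simple longitudinal imputation baselines might look like (the paper's own approaches are not specified here), the sketch below applies last-observation-carried-forward and within-firm mean imputation to a toy panel; the variable names are invented and do not come from the Ifo Business Survey.

```python
# Two simple longitudinal imputation baselines on a toy firm-by-month panel.
import numpy as np
import pandas as pd

panel = pd.DataFrame({
    "firm":  [1, 1, 1, 2, 2, 2],
    "month": [1, 2, 3, 1, 2, 3],
    "business_expectation": [1.0, np.nan, -1.0, 0.0, 1.0, np.nan],
})

# Last observation carried forward within each firm.
panel["locf"] = panel.groupby("firm")["business_expectation"].ffill()

# Within-firm mean imputation as a second baseline.
panel["mean_imp"] = panel["business_expectation"].fillna(
    panel.groupby("firm")["business_expectation"].transform("mean"))

print(panel)
```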