Representing the population: Improving European sampling practices 1 |
|
Chair | Dr Annette Scherpenzeel (SHARE – Survey of Health, Ageing and Retirement in Europe) |
Coordinator 1 | Mrs Johanna Bristle (SHARE – Survey of Health, Ageing and Retirement in Europe) |
Coordinator 2 | Dr Stefan Zins (ESS-GESIS) |
Demographers, statisticians and researchers in general are becoming increasingly interested in the use of national population registers as data sources and sampling frames. As data sources, the advantage of population registers lies in the range of information available on each record; as sampling frames, their strong point is their coverage of the population, which should make it possible to draw representative random samples and so reduce bias. These advantages have long been acknowledged, but only the advances in digitalization have made the use of registers more feasible.
Despite variation in national laws, some supranational principles of data protection, the Fair Information Principles, have been established, at least at the European level. The aims of these guidelines are manifold: on the one hand, to safeguard data subjects; on the other, to foster the creation of reliable and up-to-date registers. Yet the situation seems to vary considerably between countries.
In this work, we aim to review the availability, accessibility and quality of population registers in European countries, based on the existing literature. After defining what population registers are, we provide a picture of the situation of population registers in Europe. We find and discuss some inconsistencies between different papers; in general, most European countries appear to have a population register, although the degree of centralization varies. When it comes to access, differences between countries persist. Legal and language barriers, for instance, still limit access to registers. We also address the quality of registers: different indicators can be considered, and here too there are differences between countries. Overall, the Nordic countries appear to be examples of best practice, with a longer tradition of registers, a well-functioning update system and the possibility of matching information from different registers.
This review is meant as a preliminary step towards investigating how survey research programs are able, in practice, to use population registers as sampling frames and sources of auxiliary data. These possibilities have not been extensively investigated in the literature so far, yet they represent one of the key challenges for the future of survey research programs.
Obtaining good probability samples that represent the population is a key challenge for European cross-national studies. The availability of population registers that can be used as sampling frames varies considerably across countries, as do the regulations about who may access the registers and what information can be obtained from them. We will present a comprehensive overview of the sampling frames used in the four cross-European surveys cooperating in SERISS: the European Social Survey (ESS), the European Values Study (EVS), the Generations and Gender Programme (GGP), and the Survey of Health, Ageing and Retirement in Europe (SHARE). The overview also covers the availability of auxiliary variables in these sampling frames that might be useful for nonresponse analyses.
The overview will show in which European countries all four studies use one and the same population register for their samples, opening up possibilities to jointly build and share sampling frames. Furthermore, the overview enables the survey teams to profit from each other’s experience, because it shows that in some countries certain studies do not use an existing population register even though that register appears to be accessible to other studies in the same country. Finally, it lists countries that have a population register which none of the four studies uses as a sampling frame. A joint effort by the sampling experts of all studies could possibly improve the accessibility of the registers in these countries.
The result of this SERISS project is a valuable knowledge database of national sampling procedures and accessible administrative data across Europe, which also offers a way to improve the harmonisation of sampling frames and sample data across European surveys.
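The three-way comparison described in this abstract (countries where all four studies draw on the population register, countries where only some do, and countries where the register exists but goes unused) can be sketched as a simple grouping. This is only an illustrative sketch: the country labels, frame labels, the `has_register` set and the `classify` function are hypothetical placeholders, not data or code from the SERISS overview.

```python
from collections import defaultdict

STUDIES = ("ESS", "EVS", "GGP", "SHARE")

# Hypothetical records: (country, study, sampling frame used). Placeholder values only.
frames = [
    ("Country A", "ESS", "population register"),
    ("Country A", "EVS", "population register"),
    ("Country A", "GGP", "population register"),
    ("Country A", "SHARE", "population register"),
    ("Country B", "ESS", "population register"),
    ("Country B", "EVS", "address list"),
    ("Country B", "GGP", "population register"),
    ("Country B", "SHARE", "address list"),
    ("Country C", "ESS", "address list"),
    ("Country C", "EVS", "random route"),
    ("Country C", "GGP", "address list"),
    ("Country C", "SHARE", "address list"),
]

# Hypothetical: countries assumed to have a population register at all.
has_register = {"Country A", "Country B", "Country C"}


def classify(frames, has_register, register_label="population register"):
    """Group countries by how many of the four studies sample from the register."""
    by_country = defaultdict(dict)
    for country, study, frame in frames:
        by_country[country][study] = frame

    groups = {"shared": [], "mixed": [], "unused": []}
    for country in sorted(by_country):
        users = {s for s, f in by_country[country].items() if f == register_label}
        if users == set(STUDIES):
            groups["shared"].append(country)   # all four studies use the register
        elif users:
            groups["mixed"].append(country)    # accessible to some studies only
        elif country in has_register:
            groups["unused"].append(country)   # register exists, no study uses it
    return groups


groups = classify(frames, has_register)
print(groups)
```

In this toy input, the "mixed" group corresponds to the case where teams could learn from each other's access arrangements, and the "unused" group to the case where a joint effort might be needed.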
Data linkage can provide a good empirical basis for assessing some elements of the total survey error (TSE) framework (Groves and Lyberg 2010). Our paper is based on data from the 2015 Swiss post-electoral study (Selects) merged with validated register income data from the Swiss social security system. The initial random sample of 12,300 individuals with the right to vote in Switzerland was provided by the Swiss Federal Statistical Office (FSO). For all sampled citizens, different kinds of information were available, notably socio-demographic information from the FSO sampling register (gender, year of birth, household size, marital status, country of birth, canton, community number). This information was then enriched with income data from the Swiss social security system, which includes several elements: AVS (old age and survivors pension scheme), AI (invalidity insurance), APG (allowance for loss of earnings), and AC (unemployment insurance). Detailed income information on these four elements was provided for all family members of the sampled persons. Overall, 5,337 individuals responded during a field period of five weeks after the 2015 Swiss federal elections. The survey was conducted in a sequential mixed mode, starting with web and adding a telephone component after two weeks. In total, 82% of the interviews were conducted online and 18% by telephone. This unique dataset allows us to study several components of the TSE framework and to address several questions: 1) Do non-respondents to the post-electoral survey (unit non-response) have a different income distribution from respondents, and what is the influence of income on total non-response compared to other known parameters? 2) Do non-respondents to the survey income question (item non-response) have a specific distribution and composition of income (unemployment, invalidity, etc.)?
Other questions address measurement error components: 3) Which parameters, including social, demographic or political indicators as well as mode effects, have an impact on the difference between declared and register values of household income? 4) How does this departure from validated data affect the study of some political indicators? In this paper we will present some preliminary answers to the above-mentioned questions in the Swiss context, using multivariate analysis to disentangle the impact of different families of explanatory factors.
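Because register income is known for every sampled person, the basic quantities behind questions 1 to 3 can be computed directly. The sketch below is a simplified descriptive illustration, not the multivariate analysis the paper refers to; the records, field names and function names are hypothetical and do not come from the Selects data.

```python
from statistics import mean

# Hypothetical linked records: register income is known for the whole sample,
# declared income only for respondents who answered the income question.
sample = [
    {"register_income": 40000, "responded": True, "declared_income": 42000},
    {"register_income": 55000, "responded": True, "declared_income": 50000},
    {"register_income": 85000, "responded": True, "declared_income": None},
    {"register_income": 30000, "responded": False, "declared_income": None},
    {"register_income": 25000, "responded": False, "declared_income": None},
]


def unit_nonresponse_bias(sample):
    """Mean register income of respondents minus that of the full sample."""
    full = mean(r["register_income"] for r in sample)
    resp = mean(r["register_income"] for r in sample if r["responded"])
    return resp - full


def item_nonresponse_gap(sample):
    """Among respondents: mean register income of those who skipped the
    income question minus that of those who answered it."""
    respondents = [r for r in sample if r["responded"]]
    skipped = [r["register_income"] for r in respondents if r["declared_income"] is None]
    answered = [r["register_income"] for r in respondents if r["declared_income"] is not None]
    return mean(skipped) - mean(answered)


def measurement_errors(sample):
    """Declared minus register income for respondents who reported an income."""
    return [
        r["declared_income"] - r["register_income"]
        for r in sample
        if r["responded"] and r["declared_income"] is not None
    ]


print(unit_nonresponse_bias(sample))
print(item_nonresponse_gap(sample))
print(measurement_errors(sample))
```

A fuller analysis would model response propensity and the size of the reporting error as functions of the register covariates and survey mode, which is where the multivariate methods mentioned above come in.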