



Thursday 20th July, 11:00 - 12:30 Room: F2 102


Administrative Records for Survey Methodology 4

Chair: Dr Asaph Young Chun (US Census Bureau)
Coordinator 1: Professor Mike Larsen (George Washington University)
Coordinator 2: Dr Ingegerd Jansson (Statistics Sweden)
Coordinator 3: Dr Manfred Antoni (Institute for Employment Research)
Coordinator 4: Dr Daniel Fuss (Leibniz Institute for Educational Trajectories)
Coordinator 5: Dr Corinna Kleinert (Leibniz Institute for Educational Trajectories)

Session Details

Incorporation of administrative records has long been regarded as a way of improving the quality and interpretability of surveys and censuses and of controlling the rising cost of surveys (Chun and Scheuren, 2011). The increasing number of linked datasets, such as the Health and Retirement Study in the U.S., the National Educational Panel Study in Germany, and Understanding Society in the UK, is accompanied by growing empirical evidence about the selectivity of linked observations. The extent and pace of using administrative data vary from continent to continent and from country to country. This is partly due to differing concerns about privacy, confidentiality, and legal constraints, as well as variability in the acceptance and implementation of advances in statistical techniques that address such concerns.

The primary goal is to control data quality and reduce total survey error. This session will feature papers that implement "total administrative records error" and "total linked data error" methods and provide case studies and best practices of using administrative data tied to the survey life cycle (Chun and Larsen, a forthcoming Wiley book). The session invites papers that discuss fundamental challenges and recent advancements in the collection and analysis of administrative records and in their integration with surveys, censuses, and auxiliary data. We also encourage submission of papers discussing institutional collaboration on linked data, sustainable data access, and the provision of auxiliary tools and user support. Papers in this session include, but are not limited to, the following topics:


1. Innovative use of administrative data in household surveys and censuses to improve the survey frame, reduce nonresponse follow-up, and assess coverage error.

2. Quality evaluations of administrative data and quality metrics for linked data.

3. Recent advancements in processing and linking administrative data with survey data (one-to-one) and with multiple sources of data (one-to-many).

4. Recent methods of disclosure limitation and confidentiality protection in linked data, including linkages with geographical information.

5. Bayesian approaches to using administrative data in surveys, censuses, small area estimation, and nonresponse control.

6. Implementation of new tools that facilitate the use of linked data by simplifying complex data structures or handling inconsistent information in life-course data.

7. Strategies for developing and maintaining a user-friendly infrastructure for the analysis and dissemination of linked data and solutions for collaboration.

8. Applications that transform administrative data into information that is useful and relevant to policymaking in public health, economics, science and education.

Paper Details

1. Level of education – measuring the quality of questions in survey interviews by administrative records on education. Experiences from the Norwegian European Social Survey 2004 - 2014.
Mr Øyvin Kleven (Statistics Norway)
Professor Kristen Ringdal (Department of Sociology and Political Science, Norwegian University of Science and Technology)

Non-sampling errors in household surveys have received considerable attention in the past decades, as these errors have clearly become more and more critical to the accuracy of survey-based statistics. For many surveys, measurement errors are the most damaging source of error. It is well documented in the textbooks on survey methodology that there are many pitfalls in obtaining an accurate response to a survey question. Administrative register data provide an opportunity to study the impact of measurement errors on survey estimates. In the European Social Survey (an academically driven survey that runs every second year) we ask the respondents to report their level of education. Level of education is an important sociodemographic variable in social statistics and in the social sciences. It is well known from previous literature that estimates of level of education based on surveys can be affected by measurement errors. Measurement errors are errors that occur during data collection and cause the recorded values of variables to differ from the true ones. Their causes are commonly categorized as follows. Survey instrument: the form, questionnaire or measuring device used for data collection may lead to the recording of wrong values. Respondent: respondents may, consciously or unconsciously, give erroneous data. Interviewer: interviewers may influence the answers given by respondents. Measurement errors may be difficult to detect unless they lead to illogical or inconsistent responses. In Norway we have a register containing the level of education of most inhabitants. Due to data regulations, however, we are not allowed to merge this register into the European Social Survey and use it directly as a variable. We are allowed, for methodological purposes, to merge level of education from administrative records for each respondent in the survey and compare the two responses. Hence we have two different sources for each respondent measuring level of education.
For each round of the European Social Survey from 2004 to 2014 we can use this information to study response quality. This paper is a follow-up to a paper presented in 2006. We now have much richer data material and more substantial insight into the topic.
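
As an illustration of how two sources per respondent can be compared, the following minimal Python sketch computes simple agreement measures between a survey-reported and a register-recorded education level. It is not the authors' method: the function name, the integer coding of education levels, and the choice of Cohen's kappa as a chance-corrected summary are all assumptions made for this example.

```python
from collections import Counter


def agreement_summary(pairs):
    """Summarise agreement between two codings of the same variable.

    pairs: iterable of (survey_level, register_level) tuples, one per
    linked respondent. Levels are assumed (hypothetically) to be coded
    as ordered integers, e.g. 1 = primary, 2 = secondary, 3 = tertiary.
    """
    pairs = list(pairs)
    n = len(pairs)
    if n == 0:
        raise ValueError("no linked respondents")

    agree = sum(1 for s, r in pairs if s == r)
    survey_higher = sum(1 for s, r in pairs if s > r)
    survey_lower = sum(1 for s, r in pairs if s < r)

    # Cohen's kappa: observed agreement corrected for the agreement
    # expected by chance from the two marginal distributions.
    p_observed = agree / n
    survey_counts = Counter(s for s, _ in pairs)
    register_counts = Counter(r for _, r in pairs)
    p_expected = sum(
        survey_counts[level] * register_counts[level] for level in survey_counts
    ) / (n * n)
    kappa = (
        (p_observed - p_expected) / (1 - p_expected)
        if p_expected < 1
        else float("nan")
    )

    return {
        "n": n,
        "agreement": p_observed,
        "survey_higher": survey_higher / n,
        "survey_lower": survey_lower / n,
        "kappa": kappa,
    }
```

Running the function separately for each survey round would give one way to track response quality over time; whether the register or the survey is closer to the truth is a separate question that such a comparison cannot settle on its own.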


2. Using Administrative Records to Evaluate Absolute and Relative Reporting Accuracy in Surveys
Ms Joanne Pascale (US Census Bureau)

Before and after implementation of historic health reform in the U.S. in 2014, the research and policy communities studied ways to measure whether and how individuals obtain coverage. Conventional sources (employers, government programs and plans purchased directly from insurance companies) remained intact after reform, and the new marketplace (aka Obamacare) introduced a new means of obtaining private coverage, sometimes with subsidies. While reform reduced the number without coverage substantially, there are still non-trivial numbers of uninsured. Due to the patchwork of public and private sources of coverage, surveys represent the only means of deriving a measure of those without coverage. Thus survey misreporting of all coverage types needs to be considered collectively when assessing the accuracy of the uninsured estimate. Accurate measurement of coverage type is also important in and of itself, as the focus of measurement error in health insurance coverage may well shift from uninsured to source of coverage (public versus private). Decades of research on measurement error pre-dating health reform indicate that reports of specific coverage type are problematic. Examples include respondents who report the same coverage twice but as two different types of coverage; report the wrong type of public coverage due to the similarity of program names (e.g., Medicare and Medicaid); or report public coverage as private and vice versa. Rigorous validation studies – where survey reports are matched with some kind of outside "truth" source on coverage type – are relatively rare. One type of public program (Medicaid, primarily for low-income individuals) has been studied extensively (in part due to a centralized, national database of enrollees), but other coverage types lack this kind of database. Thus, little is known about how misreporting of one type of coverage affects other types of coverage.

To address this gap, a reverse record-check study was conducted using administrative records of individuals enrolled in five different types of coverage – both public and private (including the marketplace) – supplied by a private health insurer. The sample was randomly assigned to one of two questionnaires widely used in government surveys to monitor the U.S. healthcare system. Person-level matching was then conducted between the survey data and the enrollment records, and the agreement between the records and the survey report was established for multiple indicators – insured/uninsured, type of coverage, and time period of coverage. This study design allowed us to expand the scope of analysis beyond under-reporting (aka sensitivity) – that is, the number with a known coverage type for whom that coverage type was not reported – to include a measure of over-reporting (aka predictive power) – the number who report a coverage type that cannot be validated in the records. We also compared true "population" prevalence according to the records to the survey estimate. Thus we were ultimately able to assess under-reporting, over-reporting and prevalence accuracy in two ways – both absolute accuracy (records versus survey) and relative accuracy (absolute accuracy across the two surveys).
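
The three indicators described above can be sketched in Python for a single coverage type. This is an illustrative reading of the definitions in the abstract, not the study's actual computation: the function name, the dictionary-of-sets data layout, and the choice of denominators (enrollees in the records for under-reporting, survey reporters for over-reporting) are assumptions.

```python
def reporting_accuracy(records, reports, coverage_type):
    """Compare survey reports with enrollment records for one coverage type.

    records, reports: dicts mapping a person identifier to the set of
    coverage types held according to the records / reported in the survey.
    Only persons present in both sources (i.e. matched) are analysed.
    """
    matched = records.keys() & reports.keys()
    in_records = {p for p in matched if coverage_type in records[p]}
    in_survey = {p for p in matched if coverage_type in reports[p]}

    # Under-reporting: enrolled according to the records, not reported.
    under = in_records - in_survey
    # Over-reporting: reported in the survey, not validated in the records.
    over = in_survey - in_records

    n = len(matched)
    return {
        "under_reporting_rate": len(under) / len(in_records) if in_records else None,
        "over_reporting_rate": len(over) / len(in_survey) if in_survey else None,
        "record_prevalence": len(in_records) / n if n else None,
        "survey_prevalence": len(in_survey) / n if n else None,
    }
```

Computing these figures separately for each questionnaire arm and then comparing them would correspond to the "relative accuracy" comparison mentioned in the abstract.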


3. Record linkage and anonymization. What impact in data quality?
Ms Filipa Ribeiro (INE)
Mr Pedro Cunha (INE)
Mr António Portugal (INE)

In the paradigm shift from a traditional census model to one using administrative data sources, linkage processes between registries and the anonymization of information play a key role in the construction of a Resident Population Database. Constructing a single database containing some of the census variables, based on information dispersed across different administrative data sources, depends on the ability to integrate records from various sources covering different periods and on the capacity to determine data quality.

One of the starting points is the linkage between the 2011 Censuses and each of the administrative data sources, to allow a connection between the data from different entities of the Public Administration and the files from statistical surveys. The integration of different data structures, encodings and formats from various sources, for different years (2011 to 2015), is a complex process, and it is necessary to make the information compatible through a set of standardization rules for the variables. This process is essential to ensure good results in record-linkage processes and to increase the level of information quality.

One of the key difficulties found in constructing a population database based on administrative information, and in linking the data, is the absence of an identifier common to the different administrative sources. Additionally, in some cases the identifiers have low fill rates; even where they allow some connections, it is therefore necessary to use matching processes based on keys made of combined attributes related to the characteristics of the statistical units under analysis.
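
A minimal Python sketch of this two-step idea follows: link on the identifier where it is filled in, and otherwise fall back to a key built from standardized attributes. It is only an illustration under stated assumptions. The field names (`national_id`, `name`, `birth_date`, `sex`, `census_id`, `admin_id`), the choice of attributes in the key, and the rule of accepting only unambiguous key matches are hypothetical and are not taken from the paper.

```python
import unicodedata


def normalize(text):
    """Standardize a name: strip accents, upper-case, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.upper().split())


def composite_key(record):
    """Build a matching key from combined attributes (hypothetical choice)."""
    return (normalize(record["name"]), record["birth_date"], record["sex"].upper())


def link(census, admin):
    """Return (census_id, admin_id, method) triples for linked records."""
    by_id = {}
    by_key = {}
    for rec in census:
        if rec.get("national_id"):
            by_id[rec["national_id"]] = rec["census_id"]
        by_key.setdefault(composite_key(rec), []).append(rec["census_id"])

    links = []
    for rec in admin:
        identifier = rec.get("national_id")
        if identifier and identifier in by_id:
            # Step 1: exact match on the identifier when it is available.
            links.append((by_id[identifier], rec["admin_id"], "identifier"))
            continue
        # Step 2: fall back to the composite key; accept only when exactly
        # one census record shares the key, to avoid ambiguous links.
        candidates = by_key.get(composite_key(rec), [])
        if len(candidates) == 1:
            links.append((candidates[0], rec["admin_id"], "composite_key"))
    return links
```

Recording which method produced each link makes it possible to report linkage rates separately for identifier-based and key-based matches, which is one input to a data quality assessment.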

In order to determine the viability of this process of linking the administrative data sources, a pilot field test is carried out, connecting the records of some individuals in the Resident Population Database with their georeferenced dwellings, to assess data quality and to study the possibility of applying this process in the 2021 Censuses.

Throughout the process, the confidence of the administrative authorities in the capacity of National Statistics Institutes to deal with the anonymization of data is a key factor. The anonymization process implemented is non-reversible and common to all the entities involved, and it is applied even before the data leaves the responsible authority.
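
The abstract does not say how the anonymization is implemented. One common way to obtain a transformation that is non-reversible yet identical across entities is a keyed hash applied at the source, sketched below in Python with HMAC-SHA256. This is an assumption for illustration only, not the procedure used by INE; the function name and the idea of a secret key shared by all participating entities are hypothetical.

```python
import hashlib
import hmac


def anonymize_identifier(identifier: str, shared_key: bytes) -> str:
    """Replace an identifier with a keyed, one-way hash.

    Every entity applies the same function with the same secret key
    before its data leaves, so equal identifiers map to equal codes and
    records remain linkable without the original identifier.
    """
    # Canonicalize first so that formatting differences between sources
    # do not produce different codes for the same person.
    canonical = identifier.strip().upper().encode("utf-8")
    return hmac.new(shared_key, canonical, hashlib.sha256).hexdigest()
```

Because the same input and key always give the same output, records from different authorities can still be linked on the resulting code. Strictly speaking, a scheme like this is usually described as pseudonymization, and its protection depends on keeping the key secret.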