Wednesday 19th July, 16:00 - 17:30 Room: F2 102


Administrative Records for Survey Methodology 3

Chair: Dr Asaph Young Chun (US Census Bureau)
Coordinator 1: Professor Mike Larsen (George Washington University)
Coordinator 2: Dr Ingegerd Jansson (Statistics Sweden)
Coordinator 3: Dr Manfred Antoni (Institute for Employment Research)
Coordinator 4: Dr Daniel Fuss (Leibniz Institute for Educational Trajectories)
Coordinator 5: Dr Corinna Kleinert (Leibniz Institute for Educational Trajectories)

Session Details

The incorporation of administrative records has long been regarded as a way of improving the quality and interpretability of surveys and censuses and of controlling the rising cost of surveys (Chun and Scheuren, 2011). The increasing number of linked datasets, such as the Health and Retirement Study in the U.S., the National Educational Panel Study in Germany, and Understanding Society in the UK, is accompanied by growing empirical evidence about the selectivity of linked observations. The extent and pace of using administrative data vary from continent to continent and from country to country. This is partly due to differential concerns about privacy, confidentiality, and legal constraints, as well as variability in the acceptance and implementation of advances in statistical techniques to address such concerns.

The primary goal is to control data quality and reduce total survey error. This session will feature papers that implement "total administrative records error" and "total linked data error" methods and provide case studies and best practices of using administrative data throughout the survey life cycle (Chun and Larsen, a forthcoming Wiley book). The session invites papers that discuss fundamental challenges and recent advances in the collection and analysis of administrative records and their integration with surveys, censuses, and auxiliary data. We also encourage submission of papers discussing institutional collaboration on linked data, sustainable data access, and the provision of auxiliary tools and user support. Papers in this session include, but are not limited to, the following topics:


1. Innovative use of administrative data in household surveys and censuses to improve the survey frame, reduce nonresponse follow-up, and assess coverage error.

2. Quality evaluations of administrative data and quality metrics for linked data.

3. Recent advances in processing and linking administrative data with survey data (one-to-one) and with multiple sources of data (one-to-many).

4. Recent methods of disclosure limitation and confidentiality protection in linked data, including linkages with geographical information.

5. Bayesian approaches to using administrative data in surveys, censuses, small area estimation, and nonresponse control.

6. Implementation of new tools that facilitate the use of linked data by simplifying complex data structures or handling inconsistent information in life-course data.

7. Strategies for developing and maintaining a user-friendly infrastructure for the analysis and dissemination of linked data, and solutions for collaboration.

8. Applications that transform administrative data into information that is useful and relevant to policymaking in public health, economics, science, and education.

Paper Details

1. When Education Survey Data Come From Multiple Sources
Mr Peter Siegel (RTI International)
Mr Darryl Creel (RTI International)
Dr James Chromy (RTI International, retired)

As the use of administrative data in education surveys increases, we need to address the methodological issues that come with it. A challenging situation arises when education survey data come from students and administrative data are used to supplement the sample survey data. A useable case rule must be developed to identify the key information items, or combinations of items, needed to qualify as a unit respondent. A unit respondent in a survey is typically an interview respondent, but we will define a unit respondent based on data, regardless of the source. The useable case rule can sometimes be satisfied with data from only a subset of the sources and can exclude interview data. This presentation uses data from the 2007-08 National Postsecondary Student Aid Study (NPSAS:08) to examine tradeoffs between unit respondent sample size and data record completeness when defining unit respondents.
There are at least two approaches to defining a unit respondent when data from administrative sources exist in addition to survey responses. One approach is to define a unit respondent as a sample member completing the survey interview under some rule for determining what constitutes ‘complete’. This approach is most common, and unit response rates computed this way follow the American Association for Public Opinion Research (AAPOR) definition and the United States National Center for Education Statistics (NCES) Statistical Standards. Nonresponse weight adjustments are used to compensate for the nonrespondents and to reduce the potential for unit nonresponse bias. The data collected from other sources may then be used to fill in any missing item values for these respondents. Another approach is to define a unit respondent as a sample member with sufficient data from any source to be judged complete. Filling in data from other sources, when available, and applying logical and statistical imputations compensate for missing data and reduce the potential for item nonresponse bias. When an interview-based respondent definition is used and the response rate is low, there is the potential for a large amount of unit nonresponse bias due to interview nonresponse. When a useable case respondent definition is used, there is the potential for a large amount of item nonresponse bias due to missing items, especially when a subset of items is available from only one source, such as the interview.
In this presentation, we will compare these two approaches. Of particular interest is their differing use of weight adjustment and imputation to compensate for nonrespondents and reduce potential nonresponse bias. While weighting and imputation have been compared in the past, we will examine this comparison in the context of education data when administrative data are available, allowing more imputation and less weight adjustment. We will discuss the advantages and disadvantages of both approaches, as well as the potential concerns with adopting a unit respondent definition that does not require an interview.
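
As a minimal illustration of the useable case rule idea described above, the following Python sketch checks whether enough key items are present from any source to count a sample member as a unit respondent. The field names, sources, and the two-of-three threshold are invented for illustration and are not the NPSAS:08 rules.

```python
# Hypothetical "useable case" rule sketch; field names and threshold are assumptions.
from dataclasses import dataclass
from typing import Optional

@dataclass
class SampleMember:
    # Each item may come from the interview, institutional records, or
    # administrative aid files; None means the item is missing everywhere.
    enrollment_status: Optional[str] = None     # interview or student records
    tuition_amount: Optional[float] = None      # student records
    federal_aid_amount: Optional[float] = None  # administrative aid files
    interview_complete: bool = False

def is_unit_respondent(m: SampleMember) -> bool:
    """A member counts as a unit respondent if enough key items are present
    from ANY source, even when the interview itself is missing."""
    key_items = [m.enrollment_status, m.tuition_amount, m.federal_aid_amount]
    n_present = sum(item is not None for item in key_items)
    # Illustrative rule: at least two of the three key items are available.
    return n_present >= 2

# Example: no interview, but records supply two key items -> counted as a respondent.
member = SampleMember(enrollment_status="full-time", federal_aid_amount=5500.0)
print(is_unit_respondent(member))  # True
```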


2. Using ‘black box’ commercial databases to reduce costs in high quality UK sample surveys
Mr Joel Williams (Kantar Public)

The UK has several commercial database companies that synthesise multiple sources of data (surveys, administrative databases, plus other sources from the burgeoning ‘big data’ ecosystem) to impute address-level and household-level characteristics. But how accurate are they, and can they be used to improve the efficiency of survey sample designs?
In this paper, we describe a project in which a large, very high-quality random sample survey dataset is used to verify commercial imputations of household type, size, and age profile. After providing key descriptive statistics on the imputations’ sensitivity and specificity, we then use this information to simulate optimal sample designs for hypothetical but realistic objectives such as (i) a sample of households with children aged 0-4; (ii) a sample of households renting in the private sector; and (iii) a sample of people aged 75+. These designs take into account real costs, likely response rates as a function of data collection method, and the statistical consequences of varying sampling fractions between strata. We finish with a set of general conclusions about the usefulness of commercial databases in the design of high-quality UK sample surveys.
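
The accuracy check described above reduces, at its simplest, to comparing the commercial database’s imputed flag against the survey’s observed value for each address and summarizing the agreement as sensitivity and specificity. A small sketch of that calculation, using an invented household characteristic and toy data, might look like this:

```python
# Sensitivity/specificity of a commercial imputation versus a survey "gold standard".
# The characteristic (child aged 0-4 present) and data below are invented for illustration.

def sensitivity_specificity(survey_truth, commercial_flag):
    """survey_truth, commercial_flag: lists of booleans, one pair per address."""
    tp = sum(t and c for t, c in zip(survey_truth, commercial_flag))
    fn = sum(t and not c for t, c in zip(survey_truth, commercial_flag))
    tn = sum((not t) and (not c) for t, c in zip(survey_truth, commercial_flag))
    fp = sum((not t) and c for t, c in zip(survey_truth, commercial_flag))
    sensitivity = tp / (tp + fn)  # share of qualifying households the database finds
    specificity = tn / (tn + fp)  # share of non-qualifying households it correctly excludes
    return sensitivity, specificity

# Toy example: 8 addresses where the survey observed whether a child aged 0-4 is present.
truth = [True, True, False, False, True, False, False, True]
flag  = [True, False, False, False, True, True, False, True]
print(sensitivity_specificity(truth, flag))  # (0.75, 0.75)
```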


3. Using Administrative Records and Parametric Models in 2014 SIPP Imputations
Dr Joanna Motro (US Census Bureau)
Dr Jason Fields (US Census Bureau)
Dr Gary Benedetto (US Census Bureau)
Dr Veronica Roth (US Census Bureau)

The Survey of Income and Program Participation (SIPP) was redesigned for the 2014 panel. With this redesign came the opportunity to use new modeling methods along with administrative records to improve imputations in the SIPP. As an initial step toward this transition, this methodology was applied to select high-level branching variables that we call ‘topic flags.’ Topic flags indicate whether a certain section of questions (e.g. about Social Security receipt) was relevant for a respondent. Topic flags summarize screener questions and monthly-level data into an annual indicator of employment, social insurance programs, means-tested programs, health insurance, and more. Missing topic flags are imputed using a parametric method called Sequential Regression Multivariate Imputation (SRMI). Unlike hot-deck imputation, which can control for only a limited number of characteristics, SRMI can condition on many more variables. The variables used in the model-based imputation can also come from household, spouse, or parent characteristics. Moreover, our models include data from administrative records, which helps mitigate the problem of survey data that are not “missing at random.” This paper describes our modeling process and its advantages over more traditional imputation methods such as hot-deck imputation, and demonstrates the usefulness of linking administrative data into the models.
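
SRMI belongs to the family of chained-equations imputation methods: each variable with missing values is regressed in turn on the other variables, cycling through the variables until the imputed values stabilize. The sketch below uses scikit-learn’s IterativeImputer as a conceptual stand-in for the Census Bureau’s production SRMI models; the column names, the toy data, and the use of a continuous regression for a binary topic flag are simplifying assumptions for illustration only.

```python
# Conceptual chained-equations (SRMI-style) imputation sketch, not the SIPP production code.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Columns: [age, household_size, admin_ssa_benefit_flag, topic_flag_social_security]
# The last column is the survey topic flag with missing values (np.nan); the
# administrative indicator helps predict it even when the survey item is missing.
X = np.array([
    [67, 2, 1, 1.0],
    [34, 4, 0, 0.0],
    [71, 1, 1, np.nan],   # missing topic flag; admin record shows benefits received
    [29, 3, 0, np.nan],   # missing topic flag; no benefit in admin data
    [62, 2, 1, 1.0],
    [45, 5, 0, 0.0],
])

# Each variable with missing values is regressed on the others in sequence,
# cycling until the imputations stabilize (the chained-equations idea behind SRMI).
imputer = IterativeImputer(max_iter=10, random_state=0)
X_imputed = imputer.fit_transform(X)
print(np.round(X_imputed[:, -1], 2))  # imputed topic flags
```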