Administrative Records for Survey Methodology 5 |
|
Chair | Dr Asaph Young Chun (US Census Bureau) |
Coordinator 1 | Professor Mike Larsen (George Washington University) |
Coordinator 2 | Dr Ingegerd Jansson (Statistics Sweden) |
Coordinator 3 | Dr Manfred Antoni (Institute for Employment Research) |
Coordinator 4 | Dr Daniel Fuss (Leibniz Institute for Educational Trajectories) |
Coordinator 5 | Dr Corinna Kleinert (Leibniz Institute for Educational Trajectories) |
Recent projects have linked administrative microdata to surveys that provide key information to policy makers, such as the Current Population Survey and the American Community Survey. This chapter reviews how combining administrative and survey data can improve the information on which policy makers base their decisions, both indirectly through improvements in survey accuracy and directly by using linked data to examine policy-relevant questions. We first provide an overview of how linked survey and administrative datasets can be used to improve surveys by reducing survey error. Linked data can help to assess and potentially correct errors and biases arising from coverage error, unit and item nonresponse, imputation, and measurement error. We review the evidence on each error source in turn, drawing examples from our work measuring program receipt and the income distribution. We discuss weighting, imputation, and direct substitution as possible solutions to the identified problems. The second part of this chapter discusses how linked data can help to examine policy-relevant questions directly. We review the role linked data can play in studying the effects of social insurance and government transfer programs. We focus on the effects of program features on program participation, the income distribution, and labor supply.
As the principal health statistics agency for the U.S., the National Center for Health Statistics (NCHS) is responsible for collecting accurate, relevant, and timely data related to health. The mission of NCHS is to provide statistical information that can be used to guide actions and policies to improve the health of the American people. In addition to collecting and disseminating the Nation’s official vital statistics, NCHS conducts several population-based surveys, including the National Health Interview Survey and the National Health and Nutrition Examination Survey, and establishment surveys of health-care facilities, including the National Hospital Care Survey. The data collected through these surveys allow NCHS to publish widely-used, reliable statistics regarding the health status of the U.S. population and selected subgroups.
The data also provide the opportunity to identify disparities in health status and use of health care services by demographic characteristics, socioeconomic status, and other population characteristics; describe experiences with the health care system; monitor trends in health status and health care delivery; and evaluate the impact of health policies and programs.
There are many questions that health surveys cannot answer on their own, in part because they often represent only a snapshot in time. In addition, most population-based health surveys rely on respondent reports and are thus limited by respondent recall. However, when these data are linked with vital statistics or administrative data, analysts can gain insight into outcomes such as mortality or health care utilization, and into methodological issues such as the accuracy of respondent reporting. Thus, data linkages enhance the analytic capabilities and scientific value of health surveys. Over the years, NCHS has developed a data linkage program to link its health survey data with vital statistics data sources, including the National Death Index, and administrative data sources, including federal and state benefit programs. Although administrative data are not created for research purposes (they are created primarily for program administration), the NCHS Data Linkage Program has worked extensively with partner agencies to develop data files that can be used for research.
This talk will describe the NCHS Data Linkage Program and how the linked data have helped to inform policy research.
In Germany, administrative data are linked to the Survey of Health, Ageing and Retirement in Europe (SHARE) in order to enrich the survey with selected information from pension insurance records. This opportunity is used increasingly often. However, the SHARE survey data also enable researchers to evaluate the quality of the record data. These evaluations show that both sources have particular strengths and weaknesses. Data on gross income are far more reliable when they come from the pension insurance registers: missing values are less common in the registers, and data quality is higher. While respondents tend to round up their income in surveys, the records give the exact figure as calculated and used in the official process. Survey data, by contrast, are superior in quality concerning the level of school and professional education. Respondents are evidently better informed about their education level than their employers, from whom the record information stems. The paper presents results from the project SHARE-RV, which links data from three waves of the SHARE survey with anonymised data from the German Pension Insurance records.
This study seeks to shed light on the possible reasons for inconsistent findings on predictors of linkage consent, as documented in the literature. To this end, we compare two very similarly structured datasets from the same country. In the two datasets, both of which were collected in surveys conducted by the same polling institute, workers in different establishments were asked questions about work-related aspects relevant for social science research. We first use the same set of controls for both datasets, thereby confirming that the two studies are broadly comparable. Secondly, we add further variables to the datasets to see whether varying the set of controls gives rise to inconsistent results. These additional variables are not necessarily available in both datasets and include psychological attributes as well as job and firm characteristics. In a subsequent step, we make use of the matched employer-employee structure of the available data.
Here, we want to answer the question of whether one would have obtained similar results if an analysis had not been restricted to the sample of respondents who provided linkage consent. Suppose our target population consists of all survey participants regardless of their linkage consent decision. Our sample consists of respondents from whom linkage consent was obtained. Applied researchers are mostly concerned with whether one can use the sample at hand to derive consistent estimators for statistics of the target population without strong assumptions. This is the case if the association between the linkage consent decision and the outcome of interest depends only on observable characteristics. In this case, it is possible to derive consistent population statistics from the sample by adding these observable characteristics to the regression or by weighting. In contrast, stronger assumptions are necessary if unobserved heterogeneity is correlated with both linkage consent and the outcome.
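The weighting idea can be illustrated with a minimal sketch, assuming consent depends only on a single observable characteristic. The field names (`group`, `consent`, `wage`), the toy records, and the cell-based consent rates are hypothetical choices for illustration, not the procedure used in the study.

```python
from collections import defaultdict


def consent_weights(records, group_key="group", consent_key="consent"):
    """Inverse-probability weights based on cell-level consent rates.

    Consenting records get 1 / (consent rate in their cell);
    non-consenting records get weight 0.
    """
    totals = defaultdict(int)
    consents = defaultdict(int)
    for r in records:
        totals[r[group_key]] += 1
        if r[consent_key]:
            consents[r[group_key]] += 1
    rates = {g: consents[g] / totals[g] for g in totals}
    return [1.0 / rates[r[group_key]] if r[consent_key] else 0.0
            for r in records]


def weighted_mean(values, weights):
    """Weighted average of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical toy data: consent differs by group, but within each group
# consenters and non-consenters have the same average outcome.
records = (
    [{"group": "A", "consent": c, "wage": w}
     for c, w in [(True, 10), (True, 12), (False, 10), (False, 12)]]
    + [{"group": "B", "consent": c, "wage": w}
       for c, w in [(True, 20), (True, 22), (True, 20), (True, 22),
                    (False, 20), (False, 22)]]
)

wages = [r["wage"] for r in records]
full_mean = sum(wages) / len(wages)  # target: all survey participants

consenters = [r["wage"] for r in records if r["consent"]]
naive_mean = sum(consenters) / len(consenters)  # consenters only, unweighted

weights = consent_weights(records)
ipw_mean = weighted_mean(wages, weights)  # consenters only, reweighted
```

In this constructed example the unweighted consenter mean differs from the full-sample mean, while the reweighted mean matches it. This holds only because consent is unrelated to the outcome within each group; if unobserved factors drove both consent and the outcome, reweighting on observables would not remove the bias.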
We investigate this question by testing whether results for two economic models would have been different if information on individuals who refused to provide linkage consent had in fact been available. In the first model, we estimate an augmented Mincer regression, a "cornerstone of empirical economics", to see whether different samples give different findings on wage returns to human capital investments. The second model is a replication of our own earlier research on participation in job-related training. In the original study, we excluded information on survey participants who did not provide linkage consent, because we made use of linked data only. In this study, we replicate our original results using the survey data only. We investigate whether we would have come to different conclusions if we had included information relating to the sample of survey participants who did not provide linkage consent.
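For orientation, a Mincer wage equation is commonly written in the following form. The symbols and the generic vector of additional controls are illustrative assumptions and do not reproduce the specification estimated in this study.

```latex
\ln w_i = \beta_0 + \beta_1 S_i + \beta_2 X_i + \beta_3 X_i^2 + \gamma' Z_i + \varepsilon_i
```

Here $w_i$ denotes the wage of individual $i$, $S_i$ years of schooling, $X_i$ labor market experience, and $Z_i$ a vector of further controls that makes the equation "augmented". Comparing estimates of $\beta_1$ between the consenting sample and the full sample is one way to see whether non-consent affects the estimated returns to schooling.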
In general, our results on the role of (denied) linkage consent for applied research are rather promising. Non-consent does not seem to translate into a large bias in economic models in our two applications. Including the non-consent sample yields virtually the same results.