Administrative Records for Survey Methodology 4 |
|
Chair | Dr Asaph Young Chun (US Census Bureau) |
Coordinator 1 | Professor Mike Larsen (George Washington University) |
Coordinator 2 | Dr Ingegerd Jansson (Statistics Sweden) |
Coordinator 3 | Dr Manfred Antoni (Institute for Employment Research) |
Coordinator 4 | Dr Daniel Fuss (Leibniz Institute for Educational Trajectories) |
Coordinator 5 | Dr Corinna Kleinert (Leibniz Institute for Educational Trajectories) |
Non-sampling errors in household surveys have received considerable attention in past decades, as these errors have clearly become more and more critical to the accuracy of survey-based statistics. For many surveys, measurement errors are the most damaging source of error. It is well documented in textbooks on survey methodology that there are many pitfalls in obtaining an accurate response to a survey question. Administrative register data provide an opportunity to study the impact of measurement errors on survey estimates.

In the European Social Survey (an academically driven survey that runs every second year) we ask the respondents to report their level of education. Level of education is an important sociodemographic variable in social statistics and in the social sciences. It is well known from previous literature that survey-based estimates of level of education can be affected by measurement errors. Measurement errors are errors that occur during data collection and cause the recorded values of variables to differ from the true ones. Their causes are commonly categorized as follows. Survey instrument: the form, questionnaire or measuring device used for data collection may lead to the recording of wrong values. Respondent: respondents may, consciously or unconsciously, give erroneous data. Interviewer: interviewers may influence the answers given by respondents. Measurement errors may be difficult to detect unless they lead to illogical or inconsistent responses.

In Norway we have a register containing most inhabitants' level of education, but due to data regulations we are not allowed to merge and use this register directly as a variable in the European Social Survey. However, for methodological purposes we are allowed to merge level of education from administrative records for each respondent in the survey and compare the two responses. Hence we have two different sources measuring level of education for each respondent.
For each round of the European Social Survey from 2004–2014 we can use this information to study response quality. This paper is a follow-up of a paper delivered in 2006; we now have much richer data material and more substantial insight into the topic.
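The comparison described above can be sketched in a few lines of code. This is a hypothetical illustration, not the actual ESS or register coding: the education categories and the respondent data are invented for the example, and a real analysis would use the full classification and linked microdata.

```python
# Hypothetical sketch: comparing survey-reported education level with the
# register value for the same respondents. Categories and data are invented.
from collections import Counter

# (survey_report, register_value) pairs for six hypothetical respondents
pairs = [
    ("upper_secondary", "upper_secondary"),
    ("tertiary", "upper_secondary"),   # survey reports higher than register
    ("tertiary", "tertiary"),
    ("lower_secondary", "lower_secondary"),
    ("upper_secondary", "tertiary"),   # survey reports lower than register
    ("tertiary", "tertiary"),
]

# Gross agreement rate: share of respondents where the two sources coincide
agreement = sum(s == r for s, r in pairs) / len(pairs)

# Cross-tabulation of the two sources, the basis for studying response quality
confusion = Counter(pairs)

print(f"gross agreement rate: {agreement:.2f}")
for (survey, register), n in sorted(confusion.items()):
    print(f"survey={survey:15s} register={register:15s} n={n}")
```

With two measurements per respondent, the off-diagonal cells of this cross-tabulation are exactly where the measurement error analysis takes place.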
Before and after implementation of historic health reform in the U.S. in 2014, the research and policy communities studied ways to measure whether and how individuals obtain coverage. Conventional sources (employers, government programs and plans purchased directly from insurance companies) remained intact after reform, and the new marketplace (a.k.a. Obamacare) introduced a new means of obtaining private coverage, sometimes with subsidies. While reform reduced the number without coverage substantially, there are still non-trivial numbers of uninsured. Due to the patchwork of public and private sources of coverage, surveys represent the only means of deriving a measure of those without coverage. Thus survey misreporting of all coverage types needs to be considered collectively when assessing the accuracy of the uninsured estimate. Accurate measurement of coverage type is also important in and of itself, as the focus of measurement error in health insurance coverage may well shift from the uninsured to the source of coverage (public versus private). Decades of research on measurement error pre-dating health reform indicate that reports of specific coverage type are problematic. Examples include respondents who report the same coverage twice but as two different types of coverage; report the wrong type of public coverage due to the similarity of program names (e.g., Medicare and Medicaid); or report public coverage as private and vice versa. Rigorous validation studies – where survey reports are matched with some kind of outside "truth" source on coverage type – are relatively rare. One type of public program (Medicaid, primarily for low-income individuals) has been studied extensively (in part due to a centralized, national database of enrollees), but other coverage types lack this kind of database. Thus, little is known about how misreporting of one type of coverage affects other types of coverage.
To address this gap, a reverse record-check study was conducted using administrative records of individuals enrolled in five different types of coverage – both public and private (including the marketplace) – supplied by a private health insurer. The sample was randomly assigned to one of two questionnaires widely used in government surveys to monitor the U.S. healthcare system. Person-level matching was then conducted between the survey data and the enrollment records, and the agreement between the records and the survey report was established for multiple indicators – insured/uninsured, type of coverage, and time period of coverage. This study design allowed us to expand the scope of analysis beyond under-reporting (a.k.a. sensitivity) – that is, the number with a known coverage type for whom that coverage type was not reported – to include a measure of over-reporting (a.k.a. predictive power) – the number who report a coverage type that cannot be validated in the records. We also compared true "population" prevalence according to the records to the survey estimate. Thus we were ultimately able to assess under-reporting, over-reporting and prevalence accuracy in two ways – both absolute accuracy (records versus survey) and relative accuracy (absolute accuracy across the two surveys).
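The two accuracy measures named above can be sketched as follows. This is a minimal illustration with invented matched data (coverage type per person according to records and survey); the coverage labels and sample are hypothetical, not the study's actual data or categories.

```python
# Hypothetical matched data: coverage type per person according to the
# enrollment records vs. the survey report. Labels and values are invented.
records = ["medicaid", "medicaid", "medicare", "private", "private", "private"]
reports = ["medicaid", "medicare", "medicare", "private", "private", "medicaid"]

def sensitivity(cov: str) -> float:
    """Among people whose records show coverage type `cov`, the share who
    reported it; its complement measures under-reporting."""
    with_cov = [rep for rec, rep in zip(records, reports) if rec == cov]
    return sum(rep == cov for rep in with_cov) / len(with_cov)

def predictive_power(cov: str) -> float:
    """Among people who *report* coverage type `cov`, the share whose records
    validate it; its complement measures over-reporting."""
    reported = [rec for rec, rep in zip(records, reports) if rep == cov]
    return sum(rec == cov for rec in reported) / len(reported)

for cov in ("medicaid", "medicare", "private"):
    print(cov, sensitivity(cov), predictive_power(cov))
```

Comparing record-based prevalence (counts in `records`) with survey-based prevalence (counts in `reports`) then gives the third measure, prevalence accuracy.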
The paradigm shift from a traditional census model to one using administrative data sources, with linkage processes between registries and anonymization of information, plays a key role in the construction of a Resident Population Database. The construction of a single database containing some of the census variables, based on information dispersed across different administrative data sources, depends on the ability to integrate records from various sources covering different periods and on the capacity to determine data quality.
One of the starting points is the linkage between the 2011 Censuses and each of the administrative data sources, allowing a connection between data from different entities of the Public Administration and the files from statistical surveys. The integration of different data structures, encodings and formats from various sources, for different years (2011 to 2015), is a complex process, and it is necessary to make the information compatible through a set of standardization rules for the variables. This process is essential to ensure good results in record-linkage processes and to increase the level of information quality.
One of the key issues found in constructing a population database from administrative information, and in linking the data, is the absence of an identifier common to the different administrative sources. Additionally, in some cases the identifiers are only sparsely filled in; even where they allow some connections, it is therefore necessary to use matching processes based on keys built from combined attributes related to the characteristics of the statistical units under analysis.
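A minimal sketch of such a matching key, under assumed field names and normalisation rules (the attributes and rules here are hypothetical, not the actual standardization rules used): in the absence of a common identifier, combined attributes such as name, date of birth and parish code are standardized and concatenated into a key on which records from two sources can be linked.

```python
# Sketch: building a matching key from combined attributes when no common
# identifier exists across sources. Field names and rules are hypothetical.
import unicodedata

def normalise(text: str) -> str:
    """Strip accents, case and extra whitespace so values compare consistently
    across sources with different encodings and conventions."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(c for c in text if not unicodedata.combining(c))
    return " ".join(text.lower().split())

def match_key(record: dict) -> tuple:
    """Combine standardized name, date of birth and parish code into a key."""
    return (normalise(record["name"]), record["birth_date"], record["parish"])

# Two records for the same person, from a census file and an admin source
census = {"name": "  João  SILVA ", "birth_date": "1980-05-01", "parish": "110633"}
admin = {"name": "Joao Silva", "birth_date": "1980-05-01", "parish": "110633"}

linked = match_key(census) == match_key(admin)
print(linked)
```

In practice such keys serve both for blocking and for deterministic matching; the standardization step is what makes records with different encodings and formats comparable at all.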
In order to determine the viability of this process of linking the administrative data sources, a pilot field test is carried out, with connections between the records of some individuals in the Resident Population Database and their georeferenced dwellings, in order to assess data quality and to study the possibility of applying this process in the 2021 Censuses.
Throughout the process, the confidence of the administrative authorities in the capacity of National Statistics Institutes to handle the anonymization of data is a key factor. The anonymization process implemented is non-reversible, common to all the entities involved, and applied even before the data leave the responsible authority.