Administrative Records for Survey Methodology 2 |
|
Chair | Dr Asaph Young Chun (US Census Bureau) |
Coordinator 1 | Professor Mike Larsen (George Washington University) |
Coordinator 2 | Dr Ingegerd Jansson (Statistics Sweden) |
Coordinator 3 | Dr Manfred Antoni (Institute for Employment Research) |
Coordinator 4 | Dr Daniel Fuss (Leibniz Institute for Educational Trajectories) |
Coordinator 5 | Dr Corinna Kleinert (Leibniz Institute for Educational Trajectories) |
Numerous large-scale surveys conducted around the world supplement their primary data collections with linkages to a variety of administrative sources (e.g. social security records). However, due to the highly sensitive and confidential nature of administrative records, accessing and linking such records to surveys requires agreement from multiple parties, including the administrative data owners, key stakeholders, and in many cases, the survey respondents themselves. In fact, obtaining informed consent from respondents prior to linking their survey and administrative records is often mandated by research ethics boards and/or legal regulations. Not all respondents consent to linkage and some evidence suggests that the proportion of linkage non-consenters is growing over time. Linkage non-consent is problematic in terms of reducing statistical power and possibly introducing bias in linked-data estimation. Several studies have shown that survey and administrative variables are affected by linkage consent bias. Different methods have been used to measure linkage consent bias and different strategies have been proposed for minimizing this source of bias either at the survey design stage or post-data collection. In this presentation, I review these different methods and strategies for measuring and controlling for consent bias in linked data sources. In doing so, I note the strengths and limitations of these approaches and conclude by providing practical guidance to researchers interested in addressing this source of error in their own studies.
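One common way to quantify consent bias, for a variable observed for every respondent, is to compare the estimate from consenters only with the estimate from the full sample. The snippet below is a minimal sketch of that comparison and is not taken from the presentation; the function name `consent_bias` and the toy income figures are hypothetical.

```python
from statistics import mean


def consent_bias(values, consented):
    """Consenter-only mean minus full-sample mean for one variable.

    values    -- one observation per respondent (observed for everyone)
    consented -- parallel list of booleans, True if linkage consent was given
    """
    consenter_values = [v for v, c in zip(values, consented) if c]
    if not consenter_values:
        raise ValueError("no consenting respondents")
    return mean(consenter_values) - mean(values)


# Hypothetical toy data: income in thousands and a linkage consent flag.
income = [20, 30, 40, 50, 60]
consent = [True, True, True, False, False]

bias = consent_bias(income, consent)
print(f"Estimated consent bias in mean income: {bias}")
```

A negative value here would indicate that consenters report lower incomes, on average, than the sample as a whole. In practice the comparison would also use survey weights and a variance estimate, both of which this sketch omits.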
Linking survey data with administrative data poses several challenges. In Germany, one of the main obstacles in the data linkage process is the strict regulation of data confidentiality. The regulations require the explicit consent of the respondent, which is most often gathered in writing, with a signature. Since respondents are cautious where their data records are concerned, a substantial part of the sample usually does not consent. Compared to survey data without linkage, linked data are affected by more sources of selectivity, which could influence data quality.
The proposed paper describes different sources of selectivity and suggests ways of minimizing them. Apart from well-researched sources of bias, which are inherent to the survey process, we suggest considering three further sources of selectivity, which are more technical and are introduced by the record linkage process.
1. As mentioned above, not every respondent will consent to the use of the administrative records held in their name.
2. Once the respondents have given their consent, they have to be identified in the administrative data.
3. Once they are identified, their records have to be extracted from the administrative database, and finally
4. the survey data has to be linked to the extracted administrative records of the same person.
Each of these steps brings potential difficulties that lead to dropouts and increase the selectivity of the sample. Measures to reduce selectivity have so far concentrated on the consent step. They include the placement and wording of the consent form and attention to potential interviewer effects. The topics covered in the survey and trust in the survey agency and administrative institutions have also been found to play a role.
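The cumulative loss across the four steps can be made visible with a simple attrition table. The following is a minimal sketch under stated assumptions: the stage names mirror the list above, while the function `attrition_funnel` and all case counts are hypothetical and do not come from the study.

```python
def attrition_funnel(stage_counts):
    """Summarise retention across linkage stages.

    stage_counts -- ordered list of (stage name, number of cases remaining)
    Returns a list of tuples:
        (stage, n, share retained from previous stage, share retained from start)
    """
    rows = []
    start = stage_counts[0][1]
    previous = start
    for name, n in stage_counts:
        rows.append((name, n, n / previous, n / start))
        previous = n
    return rows


# Hypothetical counts, for illustration only.
stages = [
    ("interviewed", 1000),
    ("consented", 800),    # step 1: consent to linkage
    ("identified", 720),   # step 2: found in the administrative data
    ("extracted", 700),    # step 3: records extracted
    ("linked", 690),       # step 4: survey and administrative data joined
]

funnel = attrition_funnel(stages)
for name, n, step_rate, overall_rate in funnel:
    print(f"{name:<12} n={n:>5}  step={step_rate:.1%}  overall={overall_rate:.1%}")
```

Reporting the step-wise and the overall retention side by side shows how much of the total loss stems from the technical steps rather than from consent alone.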
However, after the respondent has given consent, there is still no guarantee that their details can be verified. They might not give crucial information needed to identify their administrative records or their handwriting might be illegible. In addition, even once the consent data have been cleaned, it is still possible that the administrative records do not correspond to the survey respondent.
Finally, data extraction might not be possible for administrative reasons.
Because these more technical reasons for dropouts and selectivity have so far been overlooked, it is important to pay more attention to them in order to reduce selectivity.
Linked survey and administrative data can be used to facilitate richer analyses by augmenting the information collected from the surveys with vital or administrative data. However, the quality of linked data is only as good as the algorithm used to produce them. Linkage methodologies must be rigorous and transparent so that analyses are valid and replicable. The National Center for Health Statistics (NCHS), the principal health statistics agency in the U.S., has a data linkage program that is designed to expand the analytic utility of the Center's population-based surveys. The NCHS Data Linkage Program links its health survey data with vital statistics and administrative data sources. However, survey participants have become increasingly reluctant to provide personally identifiable information (PII) to interviewers. Therefore, in recent years, changes to survey design have been implemented to reduce the amount of PII collected. This, in turn, has limited the information available for data linkages based on strictly deterministic matching algorithms. To address this issue, the Data Linkage Program at NCHS has altered some of its linkage methodologies to add more probabilistic approaches.
This talk will describe some of the new approaches being used for linking when limited PII is available, such as new match weights to be used in the scoring algorithms. Results will compare new and old methodologies, using actual examples from the NCHS Data Linkage Program. The results will be discussed in terms of implications for analyses and future directions of the Data Linkage Program.
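For readers unfamiliar with match weights in probabilistic linkage, the snippet below sketches the textbook Fellegi-Sunter scoring scheme. It is not the NCHS algorithm, whose details the abstract does not give. The field names, the m- and u-probabilities, and the choice to give missing fields a weight of zero are all assumptions made for illustration.

```python
import math


def field_weight(agrees, m, u):
    """Fellegi-Sunter log2 weight for one compared field.

    m -- probability that the field agrees given the pair is a true match
    u -- probability that the field agrees given the pair is a non-match
    agrees -- True, False, or None (field missing on either record)
    """
    if agrees is None:
        return 0.0  # assumption: a missing field contributes no evidence
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def match_score(agreement, m_probs, u_probs):
    """Sum the field weights over all compared fields of a record pair."""
    return sum(
        field_weight(agreement[field], m_probs[field], u_probs[field])
        for field in agreement
    )


# Hypothetical m- and u-probabilities for three identifiers.
m_probs = {"last_name": 0.95, "birth_year": 0.98, "ssn_last4": 0.99}
u_probs = {"last_name": 0.01, "birth_year": 0.02, "ssn_last4": 0.0001}

# A pair with limited PII: the partial SSN was not collected.
pair = {"last_name": True, "birth_year": True, "ssn_last4": None}
score = match_score(pair, m_probs, u_probs)
print(f"Composite match score: {score:.2f}")
```

Pairs are then classified as links, non-links, or candidates for clerical review by comparing the composite score with chosen thresholds. When a strong identifier is missing, the attainable score drops, which is why reduced PII collection calls for re-estimated weights or revised thresholds.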