Data Quality Management in Cross-National Surveys 2
Convenor | Dr Jessica Fortin-Rittberger (GESIS)
Coordinator 1 | Dr Christina Eder (GESIS)
Coordinator 2 | Dr Manuela Kulick (GESIS)
Achieving data of the highest quality is the goal of all survey programmes. While a growing literature offers guidelines on standards for cross-national research, most of the recommendations are directed at the design phase of surveys. Much less attention has so far been paid to the post-data collection stage.
Information about data processing, from data cleaning to quality checks, is rarely available in detail and is thus not always clear to researchers and analysts. Since little is known about the steps taken in data processing, we have reason to believe that practices differ between survey programmes. This represents a unique opportunity for collaboration. Contrasting different approaches to data cleaning and quality control will help shed light on each approach's particularities and, by comparing their individual strengths and drawbacks, help harness the strengths of each method. For some survey programmes, this might provide the first steps towards a more systematic approach to data cleaning and quality control.
The proposed session aims to fill this important gap by setting out in detail a series of cross-national models of data cleaning and quality control processes, clarifying which procedures are used, and highlighting commonalities and dissimilarities. By confronting the methodologies and experiences of various approaches to data management, it will be possible to engage in a discussion and, most importantly, a critical evaluation of the processes employed by each approach: highlighting strengths, finding ways to minimize weaknesses, and suggesting strategies for possible improvements. This discussion could contribute to establishing guidelines of best practice in this field.
While the session is primarily geared towards the process of post-collection data management in cross-national programmes, we also welcome papers that focus on national or subnational surveys.
In 2012 GESIS will launch two electronic resources to assist social researchers. The website DataCoH (Data Coding and Harmonization) will provide a centralized online library of data coding and harmonization for existing variables to increase transparency and variable replication. The software program Charmstats (Coding and Harmonizing Statistics) will provide a structured approach to data harmonization by allowing researchers to: 1) download harmonization examples; 2) document variable coding and harmonization processes; 3) access variables from existing datasets for harmonization; and 4) create harmonization projects for publication and citation. This paper explains DataCoH and Charmstats and demonstrates how they work.
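As a rough illustration of what documenting a variable coding and harmonization step can involve, the sketch below recodes an education variable from two surveys into one common scheme and reports any source codes the documented mapping does not cover. It is not the DataCoH or Charmstats interface: the survey names, codes, categories, and function names are all hypothetical and chosen only for this example.

```python
# Hypothetical source codings for an education variable in two surveys.
# None of these codes or labels come from DataCoH or Charmstats.
EDUCATION_MAPS = {
    "survey_a": {1: "low", 2: "low", 3: "medium", 4: "high"},
    "survey_b": {10: "low", 20: "medium", 30: "high", 40: "high"},
}


def harmonize(survey, values, maps=EDUCATION_MAPS):
    """Recode source values into the common scheme; unmapped codes become None."""
    mapping = maps[survey]
    return [mapping.get(value) for value in values]


def undocumented_codes(survey, values, maps=EDUCATION_MAPS):
    """Return the source codes that the documented mapping does not cover."""
    return sorted(set(values) - set(maps[survey]))


if __name__ == "__main__":
    raw = [1, 3, 4, 9]  # 9 is deliberately absent from the mapping
    print(harmonize("survey_a", raw))
    print(undocumented_codes("survey_a", raw))
```

Keeping the mapping as explicit data rather than burying it in recode statements is what makes the harmonization decision easy to publish, cite, and replicate.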
In cross-national surveys like the European Social Survey, national resource asymmetries at the post-collection stage represent potential challenges to comparability: the time, personnel and skills allocated to data control and data editing vary from country to country. Moreover, data collectors and data producers have different traditions of processing data. Thus, data deposited to the archive fall short of being homogeneous with respect to control and editing. This is one of the main challenges to the overarching goal of providing end-users with data that are as standardised and harmonised as possible, while reflecting the original reliability and quality of the data.
Ideally, then, in repeated cross-national surveys, processing activities should be integrated both vertically (between surveys) and horizontally (between countries within one survey). This presentation will focus on the horizontally integrated data processing of the European Social Survey, presenting the ESS Data Archive's standardised data processing procedures in detail and using the previous round of the survey, ESS round 5, as a use case.
Modern national and cross-national surveys are complex products that do not lend themselves to shallow analyses. Here are some examples, past and recent. First, analysing data from the ESS, this author found a dubious result: the outcomes of separate analyses for men and women were somewhat inconclusive. The situation became clear after checking for data collection mode: respondents had a choice between self-completion and interviewer-assisted completion, and women chose self-completion much more often than men. Thus the inconsistencies were due to a characteristic of these data that was "hidden in the documentation".
Second, some years ago researchers presented analyses of developments in East Germany right after the changes in 1989/90. They found a huge difference between the first and the second study (one conducted right after the events and the other one year later). However, the two studies were not comparable, because the first used a quota sample design while the second used a full probability design. The presenters were not aware of this because, again, these facts were documented but obviously not noticed by the researchers.
This paper emphasises two issues: a. proper documentation of surveys according to modern documentation standards, and b. increasing users' awareness and knowledge that they need to understand the complexity of surveys and manage the data for their analyses accordingly, for instance by applying appropriate weights, adjusting for mode differences, or applying proper missing data definitions.
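A minimal sketch of the kind of data management meant here, assuming a numeric survey item, a weight per respondent, and a recorded data collection mode. The missing-data codes (77, 88, 99), the mode labels, and the function names are hypothetical; the actual definitions must be taken from each survey's documentation.

```python
# Hypothetical missing-data codes (e.g. refusal, don't know, no answer).
# Real codes differ by survey and must be read from the documentation.
MISSING_CODES = {77, 88, 99}


def weighted_mean(values, weights, missing=MISSING_CODES):
    """Weighted mean that drops observations carrying a missing-data code."""
    pairs = [(v, w) for v, w in zip(values, weights) if v not in missing]
    total_weight = sum(w for _, w in pairs)
    if total_weight == 0:
        raise ValueError("no valid weighted observations")
    return sum(v * w for v, w in pairs) / total_weight


def mean_by_mode(values, weights, modes, missing=MISSING_CODES):
    """Weighted mean computed separately for each data collection mode."""
    groups = {}
    for value, weight, mode in zip(values, weights, modes):
        group_values, group_weights = groups.setdefault(mode, ([], []))
        group_values.append(value)
        group_weights.append(weight)
    return {
        mode: weighted_mean(vs, ws, missing) for mode, (vs, ws) in groups.items()
    }


if __name__ == "__main__":
    values = [1, 2, 99, 4]
    weights = [1.0, 2.0, 1.0, 1.0]
    modes = ["self", "interviewer", "self", "interviewer"]
    print(weighted_mean(values, weights))
    print(mean_by_mode(values, weights, modes))
```

Comparing the per-mode results with the pooled estimate is a simple first check for the kind of mode difference described in the ESS example; it is a diagnostic, not an adjustment for mode effects.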