Using Paradata to Improve Survey Data Quality 2
Convenor: Dr Oliver Lipps (FORS, Lausanne)
Coordinator 1: Professor Volker Stocké (University of Kassel)
Coordinator 2: Professor Annelies Blom (University of Mannheim)
"Paradata" are measures of the survey data collection process, such as data describing interviewer or respondent behaviour, or data available from the sampling frame, such as administrative records. Examples of paradata are call-record data in CATI surveys, keystroke information from CAI, timestamp files, observations of interviewer behaviour, and respondents' response latencies. These data can be used to enrich questionnaire responses or to provide information about the survey (non-)participation process. In many cases paradata are available at little additional cost. However, there is a lack of theoretically guided reasoning about how available paradata indicators can be used to assess and improve the quality of survey data. Areas which might benefit from the effective use of paradata include:
- Paradata in fieldwork monitoring and nonresponse research: Survey practitioners can, for example, monitor fieldwork progress and interviewer performance (Japec 2005, Laflamme et al. 2008). Paradata are also indispensable to responsive designs, which rely on real-time information about fieldwork and survey outcomes that affect costs and errors (Groves and Heeringa 2006). In methodological research into interviewer (Lipps 2008, Blom et al. 2011) or fieldwork (Lipps 2009) effects, and into consistent predictors of nonresponse and nonresponse bias (Blom et al. 2010), the jury is still out on the added value of paradata.
- Paradata to understand respondent behavior: Paradata might aid the assessment of the quality of survey responses, e.g. by means of response latencies (Callegaro et al. 2009, Stocké 2004) or back-tracking (Stieger and Reips 2010). Research has used paradata to identify uncertainty in the answers given by respondents, e.g. when respondents frequently alter their answers, need a lot of time, or move the cursor over several answer options.
Papers in this session consider all aspects of measuring, preparing and analyzing paradata for data quality improvement in longitudinal as well as cross-sectional surveys.
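As a concrete illustration of the respondent-behavior use of paradata, the sketch below flags suspiciously fast answers from item-level timestamp data. All field names, latencies and the threshold are invented for illustration; they are not taken from any study in this session.

```python
from statistics import median

def flag_speeders(latencies, threshold=0.5):
    """Flag responses whose latency is below `threshold` times the item's
    median latency, a simple indicator of possible speeding.

    latencies: dict mapping item -> list of (respondent_id, seconds).
    Returns a set of (respondent_id, item) pairs flagged as very fast.
    """
    flags = set()
    for item, obs in latencies.items():
        med = median(sec for _, sec in obs)
        for rid, sec in obs:
            if sec < threshold * med:
                flags.add((rid, item))
    return flags

# Toy timestamp-derived latencies (seconds per item), purely illustrative.
latencies = {
    "q1": [("r1", 4.0), ("r2", 5.0), ("r3", 1.2), ("r4", 6.0)],
    "q2": [("r1", 8.0), ("r2", 7.5), ("r3", 2.0), ("r4", 9.0)],
}
print(flag_speeders(latencies))
```

In practice such a flag would be one input among several (answer changes, cursor movements) rather than a verdict on its own, and the cut-off would be calibrated per item.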
Survey topic is becoming increasingly important as a factor influencing participation rates, as there is a growing trend in social science research to survey specific populations about specific topics. Previous research has shown that respondents with high topic interest (often referred to as salience) have a higher propensity to participate in surveys. However, the theoretical disentangling of topic interest has been largely neglected in the literature. We present an explanatory model of participation, based on subjective utility as a function of an actor's relational position in a particular social setting. To illustrate the relation between survey topic and participation behavior as a function of respondents' characteristics, we use an online survey on mating conducted among an online dating population (n = 3,457; response rate: 9%). Comprehensive information is available at the individual level for both participants and non-participants. Using these complete web-generated process data (n = 34,565), consisting of profile and interaction data, we model respondents' probability of participation. Results of multivariate analyses show that the probability of participation varies with a user's chances of success on the mating market. Users who can be described as less attractive (e.g. older people, less educated men, overweight women) show a higher probability of participation, which we explain with the proposed topic salience mechanism. We conclude with general implications regarding (1) the relationship between survey topic and respondents' survey participation and (2) the potential of web-generated process data for (online) survey research.
Sequence analysis (SA) is a technique used to visualise and classify longitudinal or sequential data. Face-to-face household surveys generate a sequence of interviewer calls for each sampled address which can be analysed in this way. Kreuter and Kohler (2009) have applied this method to such paradata, motivated by the premise that information on the whole sequence of attempts to recruit a household may be predictive of both the probability of nonresponse and the survey variables of interest. In this context SA produces summary measures describing the series of calls made to a household during recruitment efforts. These resulting variables are available for respondents and nonrespondents alike and as such may be useful for nonresponse weighting.
The analysis presented here uses call records from The Irish LongituDinal Study of Ageing (TILDA), a nationally representative face-to-face household survey of the over fifties in the Republic of Ireland. Using SA, a typology of call sequences is generated based on the time and outcome of each call. We demonstrate that more interpretable sequences are produced when the cost settings associated with SA are adjusted to reflect what is known about the fieldwork process and to compensate for variation in the number of calls at each household. We then assess the impact of nonresponse adjustment using nonresponse weights generated from the sequence typologies compared to weights based on simple summary measures such as the number of calls.
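The core of SA is a pairwise dissimilarity between sequences, typically computed by optimal matching (an edit distance with analyst-chosen substitution and insertion/deletion costs), after which the distance matrix is clustered into a typology. The sketch below shows how such cost settings enter the computation; the outcome coding, cost values and call sequences are invented for illustration and do not reflect TILDA's actual settings.

```python
def om_distance(seq_a, seq_b, sub_cost, indel=1.0):
    """Optimal-matching distance: minimum total cost of substitutions and
    insertions/deletions turning seq_a into seq_b (dynamic programming)."""
    n, m = len(seq_a), len(seq_b)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel
    for j in range(1, m + 1):
        d[0][j] = j * indel
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            a, b = seq_a[i - 1], seq_b[j - 1]
            sub = 0.0 if a == b else sub_cost.get((a, b), 2.0)
            d[i][j] = min(d[i - 1][j] + indel,        # deletion
                          d[i][j - 1] + indel,        # insertion
                          d[i - 1][j - 1] + sub)      # substitution
    return d[n][m]

# Illustrative call-outcome coding: N = noncontact, C = contact, I = interview.
# Substituting noncontact for interview is costlier than noncontact for contact.
sub_cost = {("N", "C"): 1.0, ("C", "N"): 1.0,
            ("C", "I"): 1.0, ("I", "C"): 1.0,
            ("N", "I"): 2.0, ("I", "N"): 2.0}
seqs = {"hh1": "NNCI", "hh2": "NCI", "hh3": "NNNNN"}
for a, sa in seqs.items():
    for b, sb in seqs.items():
        if a < b:
            print(a, b, om_distance(sa, sb, sub_cost))
```

Lowering the indel cost relative to the substitution costs makes length differences (households with many vs few calls) matter less, which is one way the cost adjustment described above can compensate for variation in the number of calls.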
With response rates decreasing over the past decades, the assessment of and adjustment for nonresponse bias has become ever more important. One method of studying nonresponse bias is the use of enriched sampling frame data, which allow respondents and nonrespondents of a survey to be compared (Groves 2006, Smith/Kim 2009). In Germany, private vendors such as Microm offer microgeographic data, mainly for the purpose of direct marketing. The data cover a range of topical areas, e.g. neighborhood information, mobility, purchasing power, socio-demographic and socio-economic information, attitudes, consumer preferences and media use. The lowest level at which this information is available is an aggregate of at least five households. In our presentation we investigate whether this information is useful for the study of survey participation and nonresponse bias. We match the data to the sampling frame of the German General Social Survey (ALLBUS) 2010 and analyze, separately for contact and cooperation, whether the data help to predict participation in the ALLBUS 2010. In addition, we check whether the microgeographic variables are correlated with selected survey variables of interest. Only when the auxiliary variables are predictive both of the target persons' probability of responding and of the key survey variables will they be useful for postsurvey adjustments to compensate for nonresponse (Kreuter et al. 2010).
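When auxiliary variables of this kind do prove predictive of response, one standard way to use them is a weighting-class adjustment: group the full sample by the auxiliary variable and weight each respondent by the inverse of the response rate in their class. A minimal sketch, with invented class labels and response outcomes (not ALLBUS or Microm data):

```python
from collections import Counter

def weighting_class_adjustment(cases):
    """cases: list of (class_label, responded) pairs covering the full
    sample, respondents and nonrespondents alike.
    Returns dict: class_label -> nonresponse weight (1 / response rate)."""
    sampled = Counter(c for c, _ in cases)
    responded = Counter(c for c, r in cases if r)
    return {c: sampled[c] / responded[c] for c in responded}

# Toy microgeographic classes, e.g. low vs high purchasing-power areas.
cases = [("low", True), ("low", False), ("low", False), ("low", True),
         ("high", True), ("high", True), ("high", True), ("high", False)]
print(weighting_class_adjustment(cases))
```

Because the auxiliary data are available for nonrespondents too, the response rate per class is directly observable; the adjustment removes bias only to the extent that response and the survey variables are related within the classes, which is exactly the double condition stated above.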
Sample survey studies have demonstrated that response propensity is explained by the extent to which one is socially integrated, i.e. the degree of attachment to the broader community (Abraham, Maitland and Bianchi, 2005), or socially isolated or disengaged (Groves and Couper, 1998; Chun, 2009). Extending the social integration framework to the U.S. Census, we identify paradata that may account for nonresponse in the U.S. decennial census. Applying this framework, we first isolate person-, household- and neighborhood-level paradata correlates of census nonresponse, and then model them simultaneously across levels to understand the impact on response propensity of a complete set of paradata correlates. We use the nonresponse followup universe of the 2010 Census, merged with paradata (e.g. person-level contact history) as well as household and neighborhood characteristic paradata borrowed from the 2010 Planning Database (e.g. household structure, neighborhood poverty measures). We discuss the pragmatic merits and drawbacks of a theory-driven response propensity model for making the census nonresponse followup operations cost-effective and producing high-quality data.
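A response propensity model of the kind described is, at its simplest, a logistic regression of a response indicator on paradata correlates. The sketch below fits a minimal single-level version by gradient descent; the features and data are invented, and the actual census models described above are multilevel, which this sketch does not attempt.

```python
import math

def fit_propensity(X, y, lr=0.1, epochs=2000):
    """Logistic regression by batch gradient descent.
    Returns weights (intercept first) for P(response = 1 | x)."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

def propensity(w, x):
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    return 1.0 / (1.0 + math.exp(-z))

# Toy paradata: x = (number of contact attempts, single-person household 0/1),
# y = responded. Values are illustrative, not census data.
X = [(1, 0), (2, 0), (5, 1), (6, 1), (1, 1), (4, 0)]
y = [1, 1, 0, 0, 1, 0]
w = fit_propensity(X, y)
```

The fitted propensities can then feed either nonresponse weights or fieldwork prioritisation; the multilevel extension would add household- and neighborhood-level random effects.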
This paper studies the quality of paradata from an interviewer burden perspective, using interviewer data and paradata from the most recent round (Round 5) of the European Social Survey (ESS), together with country-level information on fieldwork preparation and implementation collected by ESS national coordinators. The paper first presents an overview of interviewer variance, primarily on key items (e.g. type of dwelling and neighbourhood, the interviewer's assessment of a refuser's future cooperation, reasons for refusal), and of item nonresponse in paradata. Building on the interviewer burden conceptual framework (Japec 1998), it then focuses on the determinants of paradata quality, with analytical units measured and modelled at the interviewer level. Since interviewers work in different country contexts, comfort context scores are calculated that illustrate different aspects of administrative and survey design factors in each country. Our previous analysis shows that data quality is heavily influenced by context-specific issues, including administrative and survey design factors and interviewer performance, and that the determinants of data quality differ across items. Our analysis highlights the importance of decreasing interviewer burden and calls for greater harmonization of fieldwork practices to provide structural support to interviewers. This means that, for instance, the type of sampling frame, the type of interviewer training, interviewer remuneration status and interviewer workloads should be harmonized as much as possible across countries. Further, interviewers working in less comfortable contexts and in more difficult clusters, with high nonresponse and many refusal conversion cases, are prime candidates for additional remuneration and support. To this end, we expect to identify important inputs related to fieldwork preparation (e.g. interviewer training and management), implementation (e.g. contact strategies), interviewer working conditions (e.g. interviewer remuneration status) and interviewer performance (e.g. nonresponse and workload).