Tuesday 16th July 2013, 14:00 - 15:30, Room: No. 14

Using Paradata to Improve Survey Data Quality 2

Convenor: Dr Oliver Lipps (FORS, Lausanne)
Coordinator 1: Professor Volker Stocké (University of Kassel)
Coordinator 2: Professor Annelies Blom (University of Mannheim)

Session Details

"Paradata" are measures of the survey data collection process, such as data describing interviewer or respondent behaviour or data available from the sampling frame, such as administrative records. Examples of paradata are call-record data in CATI surveys, keystroke information from CAI, timestamp files, observations of interviewer behaviour or respondents' response latencies. These data can be used to enrich questionnaire responses or to provide information about the survey (non-)participation process. In many cases paradata are available at little additional cost. However, there is a lack of theoretically guided reasoning about how to use available paradata indicators to assess and improve the quality of survey data. Areas which might benefit from the effective use of paradata are:

- Paradata in fieldwork monitoring and nonresponse research: Survey practitioners can, for example, monitor fieldwork progress and interviewer performance (Japec 2005, Laflamme et al. 2008). Paradata are also indispensable in responsive designs, providing real-time information about fieldwork and survey outcomes that affects costs and errors (Groves and Heeringa 2006). In methodological research into interviewer effects (Lipps 2008, Blom et al. 2011), fieldwork effects (Lipps 2009), and consistent predictors of nonresponse and nonresponse bias (Blom et al. 2010), the jury is still out on the added value of paradata.

- Paradata to understand respondent behavior: Paradata might aid the assessment of the quality of survey responses, e.g. by means of response latencies (Callegaro et al. 2009, Stocké 2004) or back-tracking (Stieger and Reips 2010). Research has used paradata to identify uncertainty in the answers given by respondents, e.g. when respondents frequently alter their answers, need a lot of time, or move the cursor over several answer options (a minimal sketch of this idea follows the session description).

Papers in this session consider all aspects of measuring, preparing and analyzing paradata for data quality improvement in longitudinal as well as cross-sectional surveys.
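To make the respondent-behavior bullet above concrete, the following is a minimal Python sketch of how response-latency and answer-change paradata might be screened for uncertain answers. The field names, data and thresholds are hypothetical illustrations, not taken from any paper in this session.

```python
# Hypothetical sketch: flag potentially uncertain answers from simple
# respondent paradata (response latency and answer changes). Field names
# and thresholds are illustrative, not from any paper in this session.
import statistics

# Each record: one item administered to one respondent.
paradata = [
    {"resp_id": 1, "item": "Q1", "latency_ms": 2100, "answer_changes": 0},
    {"resp_id": 1, "item": "Q2", "latency_ms": 9800, "answer_changes": 3},
    {"resp_id": 2, "item": "Q1", "latency_ms": 1500, "answer_changes": 0},
    {"resp_id": 2, "item": "Q2", "latency_ms": 2300, "answer_changes": 1},
]

latencies = [r["latency_ms"] for r in paradata]
mean = statistics.mean(latencies)
sd = statistics.stdev(latencies)

def flag(record, z_cutoff=2.0, change_cutoff=2):
    """Flag an item response as 'uncertain' if the respondent was unusually
    slow or altered the answer repeatedly."""
    slow = (record["latency_ms"] - mean) / sd > z_cutoff
    indecisive = record["answer_changes"] >= change_cutoff
    return slow or indecisive

for r in paradata:
    if flag(r):
        print(r["resp_id"], r["item"], "flagged as uncertain")
```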


Paper Details

1. The Relation of Survey Topic and Participation Behavior: Analyzing Unit Nonresponse Using Web-generated Process Data

Ms Doreen Zillmann (University of Bamberg)
Mr Andreas Schmitz (University of Bamberg)
Professor Hans-Peter Blossfeld (European University Institute)

Survey topic is becoming an increasingly important factor influencing participation rates, given the growing trend in social science research to survey specific populations about specific topics. Previous research has shown that respondents with high topic interest (often referred to as salience) have a higher propensity to participate in surveys. However, the theoretical disentangling of topic interest has been largely neglected in the literature. We present an explanatory model of participation, based on subjective utility as a function of an actor's relational position in a particular social setting. To illustrate the relation of survey topic and participation behavior as a function of respondents' characteristics, we use an online survey on mating conducted on an online dating population (n = 3,457; response rate: 9%). Comprehensive information is available for both participants and non-participants at the individual level. Using these complete web-generated process data (n = 34,565), consisting of profile and interaction data, we model respondents' probability of participation. Results of multivariate statistics show that the probability of participation varies by a user's chances of success on the mating market. Users who can be described as less attractive (e.g. older people, less educated men, overweight women) show a higher probability of participation, which we explain with the proposed topic salience mechanism. We conclude with general implications regarding (1) the relationship between survey topic and respondents' survey participation and (2) the potential of web-generated process data for (online) survey research.


2. Sequence Analysis of Call Record Data: Are Sequence Typologies Useful for Nonresponse Adjustment?

Mr Mark Hanly (University of Bristol)
Dr Paul Clarke (University of Bristol)
Professor Fiona Steele (University of Bristol)

Sequence analysis (SA) is a technique used to visualise and classify longitudinal or sequential data. Face-to-face household surveys generate a sequence of interviewer calls for each sampled address which can be analysed in this way. Kreuter and Kohler (2009) have applied this method to such paradata, motivated by the premise that information on the whole sequence of attempts to recruit a household may be predictive of both the probability of nonresponse and the survey variables of interest. In this context, SA produces summary measures describing the series of calls made to a household during recruitment efforts. The resulting variables are available for respondents and nonrespondents alike, and as such may be useful for nonresponse weighting.

The analysis presented here uses call records from The Irish LongituDinal Study of Ageing (TILDA), a nationally representative face-to-face household survey of the over fifties in the Republic of Ireland. Using SA, a typology of call sequences is generated based on the time and outcome of each call. We demonstrate that more interpretable sequences are produced when the cost settings associated with SA are adjusted to reflect what is known about the fieldwork process and to compensate for variation in the number of calls at each household. We then assess the impact of nonresponse adjustment using nonresponse weights generated from the sequence typologies compared to weights based on simple summary measures such as the number of calls.
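The general mechanics of this approach can be sketched as follows. This is not the authors' TILDA code; the outcome codes, cost settings and example sequences are hypothetical, and the edit-distance implementation stands in for the optimal-matching routines the SA literature typically uses.

```python
# Minimal sketch of sequence analysis on call records: an edit distance
# with domain-informed costs, followed by hierarchical clustering into a
# typology. Outcome codes, costs and data are hypothetical, not TILDA's.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Call outcomes: N = no contact, C = contact (no interview),
#                I = interview, R = refusal
sequences = ["NNCI", "NI", "NNNNR", "NCR", "I", "NNNC"]

# Substitution costs: similar outcomes are cheaper to exchange.
costs = {
    ("N", "C"): 1.0, ("N", "I"): 2.0, ("N", "R"): 2.0,
    ("C", "I"): 1.0, ("C", "R"): 1.0, ("I", "R"): 2.0,
}
INDEL = 1.0  # insertion/deletion cost

def sub_cost(a, b):
    if a == b:
        return 0.0
    return costs.get((a, b)) or costs.get((b, a))

def om_distance(s, t):
    """Edit distance between two call sequences with custom costs."""
    m, n = len(s), len(t)
    d = np.zeros((m + 1, n + 1))
    d[:, 0] = np.arange(m + 1) * INDEL
    d[0, :] = np.arange(n + 1) * INDEL
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i, j] = min(d[i - 1, j] + INDEL,
                          d[i, j - 1] + INDEL,
                          d[i - 1, j - 1] + sub_cost(s[i - 1], t[j - 1]))
    # Normalize by the longer sequence to offset unequal numbers of calls.
    return d[m, n] / max(m, n)

k = len(sequences)
dist = np.zeros((k, k))
for i in range(k):
    for j in range(i + 1, k):
        dist[i, j] = dist[j, i] = om_distance(sequences[i], sequences[j])

# Cluster the pairwise distances into a call-sequence typology.
clusters = fcluster(linkage(squareform(dist), method="ward"),
                    t=3, criterion="maxclust")
print(dict(zip(sequences, clusters)))
```

Normalizing each distance by the longer sequence is one simple way to compensate for households receiving different numbers of calls; the resulting cluster memberships can then enter a nonresponse weighting model as categorical predictors.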


3. Do microgeographic data provide useful auxiliary variables to assess and adjust for nonresponse bias?

Mr Michael Blohm (GESIS - Leibniz Institute for the Social Sciences)
Mr Achim Koch (GESIS - Leibniz Institute for the Social Sciences)

With response rates decreasing over the past decades, the assessment of and adjustment for nonresponse bias has become ever more important. One method to study nonresponse bias is the use of enriched sampling frame data, which allow respondents and nonrespondents of a survey to be compared (Groves 2006, Smith/Kim 2009). In Germany, private vendors like Microm offer microgeographic data mainly for the purpose of direct marketing. The data cover a range of topical areas, e.g. neighborhood information, mobility, purchasing power, socio-demographic and socio-economic information, attitudes, consumer preferences and media use. The lowest level at which this information is available is an aggregate of at least five households. In our presentation we investigate whether this information is useful for the study of survey participation and nonresponse bias. We match the data to the sampling frame of the German General Social Survey (ALLBUS) 2010. We analyze, separately for contact and cooperation, whether the data help to predict participation in the ALLBUS 2010. In addition, we check whether the microgeographic variables are correlated with selected survey variables of interest. Only when the auxiliary variables are predictive both of the target persons' probability of responding and of the key survey variables will they be useful for postsurvey adjustments to compensate for nonresponse (Kreuter et al. 2010).
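The usefulness criterion in the last sentence can be illustrated in code. This is a hypothetical sketch of the logic, not the authors' ALLBUS analysis; the variable names and the simulated data are invented.

```python
# Sketch: auxiliary frame variables help nonresponse weighting only if they
# predict response AND correlate with key survey variables (Kreuter et al.
# 2010). All variable names and data here are hypothetical.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1000
aux = rng.normal(size=(n, 2))   # e.g. neighborhood purchasing power, mobility
X = sm.add_constant(aux)
responded = rng.binomial(1, 1 / (1 + np.exp(-(0.2 + 0.8 * aux[:, 0]))))

# Condition 1: do the auxiliary variables predict response?
propensity_model = sm.Logit(responded, X).fit(disp=0)
print(propensity_model.params)
p_hat = propensity_model.predict(X)

# Condition 2: do they correlate with a key survey variable?
# (In practice the survey variable is observed for respondents only.)
survey_var = 1.5 * aux[:, 0] + rng.normal(size=n)
mask = responded == 1
print(np.corrcoef(aux[mask, 0], survey_var[mask])[0, 1])

# If both conditions hold, inverse-propensity weights can be used to
# adjust respondent-based estimates for nonresponse.
weights = 1.0 / p_hat[mask]
print(np.average(survey_var[mask], weights=weights))
```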


4. Paradata and Its Application to Building Model-based Response Propensity in the U.S. Census

Dr Asaph Chun (US Census Bureau)

Sample survey studies have demonstrated that response propensity is explained by the extent to which one is socially integrated, namely the degree of attachment to the broader community (Abraham, Maitland and Bianchi, 2005), or socially isolated or disengaged (Groves and Couper, 1998; Chun, 2009). Extending the social integration framework to the U.S. Census, we identify paradata that may account for nonresponse in the U.S. decennial census. We attempt to isolate person-, household- and neighborhood-level paradata correlates of census nonresponse by applying a social integration framework. We then model them simultaneously across levels to understand the impact on response propensity of a complete set of paradata correlates of census nonresponse. We use the nonresponse followup universe of the 2010 Census, which is merged with paradata (e.g. person-level contact history) as well as household and neighborhood characteristic paradata borrowed from the 2010 Planning Database (e.g. household structure, neighborhood poverty measures). We discuss pragmatic merits and drawbacks of a theory-driven response propensity model for making the census nonresponse followup operations cost-effective and producing high-quality data.
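A response propensity model combining predictors from the three levels might look like the sketch below. The variable names and data are hypothetical, and a pooled logit is only a simplification of the simultaneous cross-level modelling the abstract describes (a full treatment would add random effects for households and neighborhoods).

```python
# Hedged sketch: a pooled logistic response propensity model with person-,
# household- and neighborhood-level paradata predictors. Variable names and
# data are invented; this approximates, not reproduces, the paper's model.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 2000
df = pd.DataFrame({
    "contact_attempts": rng.poisson(3, n),        # person-level contact history
    "single_person_hh": rng.binomial(1, 0.3, n),  # household structure
    "nbhd_poverty": rng.uniform(0, 0.5, n),       # neighborhood poverty rate
})
true_logit = (1.0 - 0.3 * df.contact_attempts
              - 0.5 * df.single_person_hh - 2.0 * df.nbhd_poverty)
df["responded"] = rng.binomial(1, 1 / (1 + np.exp(-true_logit)))

model = smf.logit(
    "responded ~ contact_attempts + single_person_hh + nbhd_poverty", df
).fit(disp=0)
print(model.params)

# Low predicted propensities could flag cases for targeted followup effort.
df["p_hat"] = model.predict(df)
print(df.nsmallest(5, "p_hat"))
```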


5. What Comfort Matters? The Impact of Interviewer Behavior on Paradata Quality from an Interviewer Burden Perspective

Dr Hideko Matsuo (KU Leuven)
Professor Geert Loosveldt (Department of Sociology, KU Leuven)

This paper studies the quality of paradata from an interviewer burden perspective, using interviewer data and paradata from the most recent round (Round 5) of the European Social Survey (ESS). Country information regarding fieldwork preparation and implementation, collected by ESS national coordinators, is also used. The paper first presents an overview of interviewer variance, primarily for key items (e.g. type of dwelling and neighbourhood, the interviewer's assessment of a refuser's future cooperation, reasons for refusal), and of item nonresponse in paradata. On the basis of the interviewer burden conceptual framework (Japec 1998), the paper focuses on the determinants of quality in paradata, with analytical units measured and modelled at the interviewer level. Since interviewers work in different country situations, comfort context scores are calculated that illustrate different aspects of administrative and survey design factors in each country. Our previous analysis shows that data quality is heavily influenced by context-specific issues, including administrative and survey design factors and interviewer performance, and that the determinants of data quality differ across items. Our analysis highlights the importance of decreasing interviewer burden and calls for greater harmonization of fieldwork practices to provide structural support to interviewers. This means that, for instance, the type of sampling frame, the type of interviewer training, interviewer remuneration status and interviewer workloads should be harmonized as much as possible across countries. Furthermore, interviewers working in less comfortable contexts and in more difficult clusters, with high nonresponse and many refusal conversion cases, are prime candidates for additional remuneration and support. To this end, we expect to identify important inputs related to fieldwork preparation (e.g. interviewer training and management), implementation (e.g. contact strategies), interviewer working conditions (e.g. interviewer remuneration status) and interviewer performance (e.g. nonresponse and workload).