Tuesday 18th July, 16:00 - 17:30 Room: Q4 ANF2


Innovations and Advanced Techniques for Question Testing and Evaluation

Chair Dr Ting Yan (Westat)
Coordinator 1 Dr Aaron Maitland (Westat)

Session Details

Researchers and policy-makers rely heavily on survey questionnaires to collect data. Question testing and evaluation is an important step toward reducing potential measurement error and improving the quality of the data collected. A large variety of question testing and evaluation tools is available. Some methods produce qualitative assessments of survey items, whereas others generate quantitative estimates of validity, reliability, and other indicators of item performance. Some are inexpensive and quick, but others require data collection and are comparatively more costly. Furthermore, different evaluation methods can yield different conclusions about the performance of the same survey items. As new techniques become available, this session invites presentations that explore and showcase innovative uses of new or existing techniques and methods in question testing and evaluation. We particularly invite presentations employing (1) new techniques such as eye-tracking, (2) advanced statistical methods such as split-ballot Multitrait-Multimethod designs, (3) a combination of new and existing techniques (e.g., using QUAID, eye-tracking, and cognitive interviewing), and (4) innovations or advances in existing methods (such as new analyses of cognitive interviewing data). We also invite presentations discussing question testing and evaluation in a cross-cultural context.

Paper Details

1. Choosing Question Testing Methods: A Framework for Decisions
Dr Roger Tourangeau (Westat)

There are a variety of methods available to researchers and practitioners for developing and evaluating survey questions. However, this diversity of methods can make it difficult for practitioners to decide which methods to use in specific situations. This paper begins with a brief overview of the major methods available for questionnaire development and testing, including different types of expert, laboratory, and field-based methods for evaluating survey questions. The focus of the paper is a framework for practitioners to make decisions about question evaluation methods. To achieve that goal, the paper first presents a set of factors to consider when deciding which methods to use. Many factors influence the choice of methods for a particular project. For example, the budget and time available for questionnaire development and testing are often the major factors in determining which methods can be used. Expert methods (which require no data collection) are the least expensive and require the least time; field-based methods are the most costly and most time consuming. The methods also differ in the type of expertise required for implementation: some methods require qualitative skills, whereas others are more quantitative in nature. We identify these decision factors based on best practices and on the results of recent studies comparing how the results of the different methods converge or diverge. Second, the paper presents a decision tree that practitioners can use to identify appropriate methods for their question evaluation projects.
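For illustration only (this is not taken from the paper): the kind of decision logic such a framework implies can be sketched as a simple rule mapping project constraints to candidate methods. The factor names and cut-offs below are invented; the paper's actual decision tree is not reproduced here.

```python
# Hypothetical sketch of a constraint-to-method decision rule of the kind
# the paper's framework describes. All thresholds are invented examples.

def candidate_methods(budget_usd: int, weeks_available: int,
                      quantitative_staff: bool) -> list[str]:
    methods = ["expert review"]          # cheapest, fastest: always feasible
    if budget_usd >= 10_000 and weeks_available >= 4:
        methods.append("cognitive interviewing")
    if budget_usd >= 50_000 and weeks_available >= 12:
        # Field-based methods: most costly and time consuming (per the abstract)
        methods.append("field pretest")
        if quantitative_staff:
            methods.append("split-ballot experiment")
    return methods

print(candidate_methods(budget_usd=15_000, weeks_available=6,
                        quantitative_staff=False))
# -> ['expert review', 'cognitive interviewing']
```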


2. A Comparison of Emerging Pretesting Methods for Evaluating “Modern” Surveys
Mrs Emily Geisen (RTI International)
Mr Joe Murphy (RTI International)

Due to low costs, improvements in coverage, and technological advances, many surveys are now being conducted in whole or in part via self-administered web questionnaires. Increasingly, respondents are choosing to complete web surveys on touch-screen mobile devices such as tablets and smartphones. Recent estimates show that the proportion of respondents completing a survey on a mobile device can be 30% or more for some surveys (Lugtig, Toepoel, and Amin, 2016; Saunders, 2015). Mobile apps are also being used by survey respondents who are panel members and by interviewers to administer household screening surveys. Because of these technological advances, the ways that respondents and interviewers interact with surveys are changing.

With the pace of change in survey administration, we need to consider whether traditional pretesting methodologies address the types of potential quality concerns these newer modes introduce. For example, modern web surveys support dynamic survey features such as hover-over definitions, calculate-total buttons, videos/images, error messages, dynamic look-ups, touch-screen input, swiping to navigate, GPS, and other capabilities. Each of these features changes the respondent-survey interaction, which can affect the quality of the data collected in a survey.

The purpose of this paper is to introduce emerging survey pretesting methodologies and compare them with traditional methods in light of modern data collection technologies, to consider where the standard approaches to pretesting can be improved. We begin by discussing the key limitations of traditional pretesting methods such as expert review, cognitive interviewing, and pilot testing for evaluating “modern” surveys. We then provide an overview of emerging pretesting methods including usability testing, eye tracking, and crowdsourcing. We discuss the advantages offered by these methods, particularly in terms of budget and schedule, and provide empirical examples of how these methods can improve data quality. We conclude with a theoretical model for the optimal combination of traditional and newer methods for pretesting modern surveys.


3. Comparative Analysis of the Quality Evaluation Algorithm of the Measuring Tool in English and Russian Languages Using Survey Quality Predictor
Miss Marina Vasilyeva (National Research University Higher School of Economics)
Miss Natalia Voronina (National Research University Higher School of Economics)

Sociologists constantly face decisions when developing a new measuring instrument or choosing among existing ones, and these decisions frequently affect the quality of the data collected with the selected method. Scholars therefore routinely have to deal with questions of measurement quality, especially validity and reliability. Some methods for evaluating the quality of a measuring tool can be applied before the survey is conducted; their output is a predicted quality score based on an analysis of specific features of the measuring tool (for instance, a survey question). The Survey Quality Predictor (SQP) is one such predictive tool and currently supports 19 languages. Russian is not among them, even though the language is widely spoken, so a substantial body of studies and questionnaires cannot be evaluated in SQP. We address this gap by exploring several themes. First, we examine the theoretical background of the quality of measuring tools. This is an issue because there are several theoretical approaches to understanding validity and reliability, their types, their relationship, and how they are tested. Second, we consider the Survey Quality Predictor in the context of this theoretical background to clarify which aspects of measurement quality SQP covers. Third, we check whether there are any content-related differences between the codings used for quality evaluation of survey questions in English and in Russian in SQP. Finally, we examine differences in the SQP coding coefficients for English and Russian.
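For context (not stated in the abstract): SQP builds on the multitrait-multimethod (MTMM) tradition of Saris and Gallhofer, in which the total quality of a survey question is commonly expressed as the product of its reliability and validity:

$$
q^2 = r^2 \cdot v^2
$$

where $r^2$ is the reliability coefficient (the complement of random measurement error) and $v^2$ the validity coefficient (the complement of method variance); SQP predicts these coefficients from coded characteristics of the question, which is why language-specific codings matter.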


4. Measuring Subjective Health and Life Satisfaction with U.S. Hispanics
Professor Sunghee Lee (University of Michigan)
Professor Rachel Davis (University of South Carolina)

Health and well-being are two important issues not only in research but also in policy. While accurate measurement of these attributes is critical, extant research indicates that it is difficult to achieve in survey research. The main difficulties stem from the subjective nature of the concepts of health and well-being, as well as the use of response options with vague quantifiers. These difficulties become more evident in cross-cultural studies, where the concepts of health and well-being themselves may not be comparable. Moreover, their measurement instruments may not function equivalently.

This study focuses on U.S. Hispanics and examines three variables: 1) self-rated health (SRH) measured with a single item, 2) life satisfaction (LS) measured with a single item, and 3) LS measured with five items. On SRH, Hispanics are known to report negative health more often than non-Hispanic Whites, and non-equivalent translation of the English response categories of “excellent,” “very good,” “good,” “fair,” and “poor” has been hypothesized as a potential contributor. With the 5-item LS scale, all items are stated in a positive direction and asked with a Likert-type agreement response scale. Because Hispanics have been shown to be associated with an acquiescent response style, the current 5-item LS scale may lead to an overestimation of LS for Hispanics. Motivated by these specific issues, we implemented the following experiments: 1) on SRH, translating the response category “fair” as “regular” versus “passable”; and 2) on the five-item LS scale, wording all items positively versus in a balanced mix of directions. Respondents were randomly assigned to one of the two conditions under each experiment. Using these experimental data, we will examine simple response distributions, item characteristics based on item response theory, and relationships across the three measures, taking interview language into account. The main data come from a telephone survey of 1,296 U.S. Hispanics, supplemented by a Web survey of a nonprobability sample of 1,416 Spanish speakers in the U.S. The Web survey data were provided by SurveyMonkey, but all analyses will be done by the authors.
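As a purely hypothetical illustration of the “simple response distribution” comparison the abstract mentions, one might contrast SRH response counts across the two translation conditions with a chi-square test. The counts below are invented and do not come from the study.

```python
# Hypothetical sketch: comparing the distribution of self-rated health
# responses between the two translation conditions ("fair" -> "regular"
# vs. "fair" -> "passable"). All counts are invented for illustration.
import numpy as np
from scipy.stats import chi2_contingency

# Rows: translation condition; columns: SRH categories
# (excellent, very good, good, fair, poor).
observed = np.array([
    [52, 118, 210, 164, 56],   # condition 1: "regular"
    [61, 131, 198, 141, 49],   # condition 2: "passable"
])

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, df = {dof}, p = {p:.3f}")
```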


5. Oculomotor activities as potential indicators of the quality of survey instrument design
Dr Lin Wang (U.S. Census Bureau)

Saccadic eye movements (saccades), pupillary responses, and blinks are three oculomotor activities that reflect human cognitive processes. For example, saccades direct visual attention, pupillary responses are associated with cognitive load, and blinking suggests the level of sustained attention. A respondent’s survey completion can be conceptualized as a three-phase cognitive process executed in order: information acquisition (reading questions), information processing (comprehending questions and formulating responses), and motor action (making responses). This paper explores methodologies for applying oculomotor measures to characterize respondents’ cognitive processes during survey completion.
Saccades are rapid rotations of both eyeballs in the same direction. Their main function is to direct foveal vision to where one wants to look; foveal vision enables the viewer to discern fine details, e.g., small print. When responding to a question, the respondent uses saccades to read the question, to search for response options, and to guide the hand in making a response. Three measures are indicative of the respondent’s cognitive effort: fixation duration (the time for which the eyes stay at the same spot), perceptual span (the spatial extent over which visual information can be acquired without eye movements), and scan regression (repeated saccades back over a particular text). These three measures are influenced by physiological, psychological, and environmental factors. An optimally designed survey instrument should be reflected in shorter fixation durations, a greater perceptual span, and fewer regressions.
The pupil is a circular opening at the front of the eye through which outside light reaches the retina. The size of the pupil changes continuously in response to a variety of factors, including light intensity. Cognitive load is another important contributor to pupillary responses: studies have shown that pupil diameter increases as cognitive load increases. The primary measure of the association between pupillary response and cognitive load is the change in pupil diameter. Pupillary response can thus be used to study a respondent’s cognitive load during survey completion. Because the pupil is sensitive to various physiological, psychological, and environmental factors, it is crucial to control as many confounding factors as possible during the experiment.
Blinking is a rapid closing of the eyelid. The essential function of blinking is to spread tears and lubricate the eye. In addition, blinking is associated with the level of sustained attention: studies have shown that during prolonged task performance, performance degradation is correlated with an increase in blinking. In characterizing sustained attention, blink rate is the primary measure. During reading, blinking usually occurs at the completion of a sentence or at a pause. The timing and frequency of blinking can therefore be used to characterize a respondent’s sustained attention level.
In summary, the three oculomotor measures can potentially be used to characterize respondents’ cognitive processes if a study is designed so that all possible confounding factors are controlled. A methodologically sound experimental design is therefore key.
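To make the measures concrete, here is a minimal sketch (not from the paper) of how fixation durations, pupil dilation, and blink rate might be derived from a generic eye-tracker sample stream. The sampling rate, thresholds, and data layout are all assumptions for illustration.

```python
# Minimal sketch (not from the paper): deriving fixation duration, pupil
# dilation, and blink rate from a hypothetical eye-tracker sample stream.
# Sampling rate, thresholds, and data layout are assumptions for illustration.
import numpy as np

HZ = 60                  # assumed sampling rate (samples per second)
DISPERSION_PX = 30       # assumed max x+y dispersion within one fixation
MIN_FIX_SAMPLES = 6      # assumed minimum fixation length (~100 ms)

def fixation_durations(x, y):
    """Simple dispersion-based fixation detection (I-DT style)."""
    durations, start = [], 0
    for i in range(len(x)):
        wx, wy = x[start:i + 1], y[start:i + 1]
        if (wx.max() - wx.min()) + (wy.max() - wy.min()) > DISPERSION_PX:
            if i - start >= MIN_FIX_SAMPLES:
                durations.append((i - start) / HZ)
            start = i                      # start a new candidate fixation
    if len(x) - start >= MIN_FIX_SAMPLES:  # close the final window
        durations.append((len(x) - start) / HZ)
    return durations

def pupil_dilation(pupil_mm, baseline_mm):
    """Mean change in pupil diameter relative to a task-free baseline."""
    return np.nanmean(pupil_mm) - baseline_mm

def blink_rate(valid, seconds):
    """Blinks approximated as onsets of runs of invalid pupil samples."""
    onsets = np.flatnonzero(np.diff(valid.astype(int)) == -1)
    return len(onsets) / (seconds / 60.0)  # blinks per minute

# Synthetic demo: 10 fixation clusters of 30 samples each (~0.5 s apiece)
rng = np.random.default_rng(0)
x = np.repeat(rng.uniform(0, 800, 10), 30) + rng.normal(0, 2, 300)
y = np.repeat(rng.uniform(0, 600, 10), 30) + rng.normal(0, 2, 300)
valid = rng.random(300) > 0.02            # ~2% of samples lost to blinks

print("mean fixation (s):", round(float(np.mean(fixation_durations(x, y))), 2))
print("blink rate (/min):", round(blink_rate(valid, 300 / HZ), 1))
```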