All time references are in CEST
Failures in Survey Experiments 1

Session Organisers | Dr Kristin Kelley (WZB Berlin Social Science Center); Professor Lena Hipp (WZB Berlin Social Science Center / University of Potsdam)
Time | Wednesday 19 July, 14:00 - 15:00
Room | U6-01a
What can failures and unexpected results tell us about survey experimental methodology?
In recent years, social scientists have increasingly relied on survey experiments to estimate causal effects. As with any experimental design, survey experiments can fail or yield results that researchers did not anticipate or preregister. Researchers who face null, unexpected, inconsistent, or inconclusive results must then decide whether those results reflect on the theory being tested, the experimental design, or both. The insights gained from such failures are usually not shared widely, even though they could help improve experimental design, research quality, and transparency.
We propose a session in which scholars present insights from survey experiments that failed or led to unexpected results. We believe that sharing the designs and results of failed survey experiments, carefully considering their possible flaws, and discussing unexpected findings advances both theory (e.g., by identifying scope conditions) and methods, and contributes to transparent research practices.
We invite contributions that address the following: What is the appropriate response to a “failed” experiment, and how should researchers deal with unexpected results and null findings? More specifically, the objective of the session will be to reflect on what counts as a “failed survey experiment” (e.g., null findings, unexpected findings, problems in fielding the experiment, findings that contradict field-experimental evidence and theoretical/pre-registered predictions). We will discuss why some experiments fail (e.g., weak treatments/manipulations, unreliable or invalid dependent measures, underdeveloped theory and/or hypotheses, underpowered designs), whether and how to interpret and publish results from failed survey experiments, what has been learned from failed survey experiments, and recommendations for survey-experimental methodology.
Keywords: survey experiments, null-findings, pre-registration, experimental design, sample size requirements
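One failure mode named above, underpowering, can be probed before fielding with an a priori power calculation. The sketch below is illustrative and not tied to any study in this session; the effect size, alpha, and power targets are assumptions chosen for the example.

```python
# A priori power calculation for a two-condition survey experiment.
# Illustrative only: effect size, alpha, and power are assumptions,
# not figures from any study in this session.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Respondents per condition needed to detect a small effect (d = 0.2)
# with a two-sided test at alpha = 0.05 and 80% power.
n_per_arm = analysis.solve_power(effect_size=0.2, alpha=0.05,
                                 power=0.8, alternative="two-sided")
print(f"respondents per condition: {n_per_arm:.0f}")  # ~394

# Conversely: the power a design with 100 respondents per arm
# actually has against d = 0.2.
power = analysis.solve_power(effect_size=0.2, nobs1=100, alpha=0.05,
                             alternative="two-sided")
print(f"power with n = 100 per arm: {power:.2f}")  # ~0.29
```

A design with 100 respondents per arm has under 30% power against a small effect, so a null result from such a study says little about the theory; this is one reason sample size requirements appear among the session keywords.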
Dr Kristin Kelley (WZB Berlin Social Science Center) - Presenting Author
Researchers who conduct survey experiments are often critiqued for measuring attitudes about unrealistic scenarios. If scholars focus on external validity, however, they may lose internal validity. How should researchers balance internal and external validity when conducting survey experiments? Drawing on a recent survey experiment, I discuss these issues. I tested whether providing external (vs. internal) reasons for women’s marital name choices changed the effect of name choices on perceptions of women’s commitment. In designing the study, I prioritized internal validity over external validity. It is uncommon for name-changing women to attribute their choice to external forces, and even more unusual for name-changing and name-keeping women to reference the same external forces. Yet, in the vignettes, I ensured that name-keeping and name-changing women provided the same internal (“it was personally important to her that she changes/keeps her last name”) or external (women’s “work colleagues suggested she change/keep her name to make things easier”) reason. However, the external rationale was less externally valid than the internal rationale, which may have weakened the manipulation and the test of attribution theory. First, open-ended comments indicated that respondents thought women who were influenced by their colleagues were weak and unlikable; women’s character was thus confounded with their rationale for their name choice. Second, although the rationales signaled external and internal motivations as I had intended, name choice itself also affected perceptions of whether the choice was due to internal or external reasons, so the rationale for the choice is confounded with the choice itself. Third, all vignettes emphasized that the women held professional jobs, yet attributions may not matter much among women who have professional jobs.
Mrs Ekaterina Nastina (Higher School of Economics, Moscow) - Presenting Author
Numerous studies show that prosocial behavior leads to positive emotional outcomes for the actor. In a study using event-recall (N = 127) and interventional (N = 305) designs, fielded in Russian samples in 2021, we tried to replicate this finding and to elaborate on it by addressing relationship closeness to the beneficiary. In neither design, however, did we originally observe significant differences between the control and prosocial conditions, despite using a wide range of subjective well-being indicators. In this report, we address the exploratory analyses, using additional variables and respondents’ comments, that helped us make sense of this result. Additionally, we reflect on the limitations of our own and earlier recall and intervention studies and bring into the picture a number of more recent and robust field tests of the phenomenon. We close by proposing design ideas for exploring the link between prosociality and subjective well-being.
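A null difference by itself does not distinguish “no effect” from “too noisy to tell.” One generic way to quantify support for the null, not part of the authors’ analysis, is an equivalence test such as TOST. The sketch below runs it on simulated data; the equivalence bound (d = ±0.3) and all numbers are assumptions for illustration.

```python
# Two one-sided tests (TOST) for equivalence of two conditions.
# Generic illustration on simulated data; not the authors' analysis,
# and the equivalence bound (d = +/-0.3) is an assumption.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
control = rng.normal(loc=5.0, scale=1.5, size=305)    # simulated well-being
prosocial = rng.normal(loc=5.0, scale=1.5, size=305)

bound = 0.3 * 1.5  # smallest effect of interest, on the raw scale

def tost_ind(x, y, low, upp):
    """Two-sample TOST with unpooled SE (df kept simple for a sketch):
    equivalence is claimed if the mean difference lies inside (low, upp)."""
    diff = x.mean() - y.mean()
    se = np.sqrt(x.var(ddof=1) / len(x) + y.var(ddof=1) / len(y))
    df = len(x) + len(y) - 2
    p_lower = 1 - stats.t.cdf((diff - low) / se, df)  # H0: diff <= low
    p_upper = stats.t.cdf((diff - upp) / se, df)      # H0: diff >= upp
    return max(p_lower, p_upper)

p = tost_ind(prosocial, control, -bound, bound)
print(f"TOST p = {p:.3f}; equivalence claimed if p < .05")
```

With both one-sided tests rejecting, a researcher can report that the effect, if any, is smaller than the chosen bound, which is a stronger statement than a non-significant t-test alone.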
Dr Katrin Drasch (FAU Erlangen-Nürnberg) - Presenting Author
This presentation illustrates the lessons learned from a pilot study of a factorial survey that examines judgements, collected from a snowball sample, about when and under what circumstances an older person should be placed in a retirement home. A factorial online survey (resolution IV design; D-efficiency 93.9) was constructed that varies individual characteristics (e.g., age, gender, family characteristics) and other personal circumstances such as social embeddedness. The mechanisms behind those characteristics were drawn from the theory of successful ageing (Steverink et al. 1998), a version of social production function (SPF) theory (Ormel et al. 1999) that also covers an age gradient. Using a split-half design, the situation was framed as either a high-cost or a low-cost situation (Best and Kroneberg 2012). Characteristics of the retirement home were not varied but held constant at average quality, and the vignettes stated only that the older person was indifferent about the decision.
While the results (individuals = 199; n = 1,392) show that the theoretical mechanisms generally hold, the overall explanatory power of the model is low (rho = 0.060), and no differences between the high-cost and low-cost situations are found. Part of this can be attributed to the selective snowball sampling strategy used for the pilot study (students, Facebook) and the small number of cases. Other possible sources of error are, first, that the judgement situation is too fictitious and demanding, and second, that the decision to utilize institutionalized care at a stage when it is not yet necessary (e.g., for frail elderly people or seniors with severe dementia) is influenced by social norms about care in old age that our study can capture only partially. We conclude with suggestions for an adapted factorial survey design that overcomes these shortcomings.
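For readers unfamiliar with factorial surveys, the sketch below shows the basic mechanics of building a vignette universe and assigning decks to respondents. The dimensions and levels are placeholders, not the study’s actual design, which used a D-efficient resolution IV fraction rather than the uniform random sampling shown here.

```python
# Building a vignette universe for a factorial survey and assigning
# decks to respondents. Dimensions and levels are placeholders, not
# the study's actual resolution IV design.
import itertools
import random

dimensions = {
    "age": ["70", "80", "90"],
    "gender": ["man", "woman"],
    "family": ["lives alone", "lives with partner"],
    "embeddedness": ["sees friends weekly", "rarely sees anyone"],
}

# Full factorial: every combination of levels (3*2*2*2 = 24 vignettes).
universe = [dict(zip(dimensions, combo))
            for combo in itertools.product(*dimensions.values())]

rng = random.Random(42)

def assign_respondent(k=7):
    """Draw a random deck of k vignettes plus a split-half cost framing,
    held constant within each respondent as in the abstract."""
    frame = rng.choice(["high-cost", "low-cost"])
    return frame, rng.sample(universe, k)

frame, deck = assign_respondent()
print(f"{len(universe)} vignettes in the universe; frame = {frame}")
print(deck[0])
```

With decks of roughly this size, about 199 respondents yield on the order of 1,400 judgements, which matches the scale of the pilot; in practice one would field a D-efficient fraction of the universe rather than uniform random decks.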
Dr Leah Rosenzweig (Development Innovation Lab, University of Chicago) - Presenting Author
Dr Molly Offer-Westort (Department of Political Science, University of Chicago)
During a global pandemic, how can we best prompt social media users to demonstrate discernment in sharing information online? We ran a contextual adaptive experiment on Facebook Messenger with users in Kenya and Nigeria and tested 40 combinations of interventions aimed at decreasing intentions to share misinformation while maintaining intentions to share factual posts related to COVID-19. We estimate precise null effects of showing users warning flags or suggesting related articles alongside misleading posts, tactics used by social media platforms. Instead, users share more discerningly when they are given tips for spotting misinformation or are nudged to consider information’s accuracy, reducing misinformation sharing by 7.5% and 4.5% relative to control, respectively. We find significant heterogeneity in response to these treatments across users, indicating that tips and the accuracy nudge affect outcomes through separate mechanisms. These low-cost, scalable interventions have the potential to improve the quality of information circulating online.
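An adaptive experiment assigns arms with probabilities that update as outcomes arrive, concentrating respondents on promising interventions. The sketch below illustrates that assignment logic with a plain (non-contextual) Beta-Bernoulli Thompson sampler over 40 hypothetical arms; the authors’ actual contextual design is richer, so treat this only as an illustration of the mechanic, with all rates invented.

```python
# Beta-Bernoulli Thompson sampling over 40 intervention arms.
# A non-contextual illustration of adaptive assignment; the study's
# actual contextual design is richer, and all rates here are invented.
import numpy as np

rng = np.random.default_rng(7)
n_arms = 40

# Hypothetical true success rates (e.g., "shared discerningly").
true_rates = rng.uniform(0.30, 0.45, size=n_arms)

successes = np.ones(n_arms)  # Beta(1, 1) priors
failures = np.ones(n_arms)

for _ in range(5000):  # respondents arriving sequentially
    # Sample a plausible rate for each arm, assign the best draw.
    sampled = rng.beta(successes, failures)
    arm = int(np.argmax(sampled))
    outcome = rng.random() < true_rates[arm]
    if outcome:
        successes[arm] += 1
    else:
        failures[arm] += 1

n_assigned = successes + failures - 2
best = int(np.argmax(successes / (successes + failures)))
print(f"most-assigned arm: {int(np.argmax(n_assigned))}, "
      f"posterior-best arm: {best}, true-best arm: {int(np.argmax(true_rates))}")
```

Because adaptive assignment makes arm allocation depend on earlier outcomes, naive difference-in-means estimates can be biased, and unbiased treatment-effect estimation after such designs typically requires reweighting; that is part of what makes reporting precise null effects from an adaptive design methodologically notable.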