
ESRA 2023 Glance Program


All time references are in CEST

Factorial Surveys – Methods and Applications

Session Organisers Dr Hermann Dülmer (University of Cologne)
Professor Stefanie Eifler (Catholic University of Eichstätt-Ingolstadt)
Time Tuesday 18 July, 09:00 - 10:30
Room

Since vignette-designs (e.g., factorial surveys; scenario techniques) are by now very common in the social sciences as indirect measurement techniques, many different applications can be found. Depending on theoretical and methodological objectives, the applied techniques vary widely and lead to different and sometimes inconsistent results. Due to this diversity, findings on methodological and substantive issues can have different meanings and impacts for further research. This session addresses the diverse field of factorial surveys and vignette-designs in general, and aims to shed light on the state of affairs by discussing recent developments and pooling new findings of projects that try to enrich the discussion. The focus of the session is explicitly broad, and all contributions dealing with different analytical strategies, empirical designs or substantive research that make use of factorial surveys or other vignette-designs are welcome. Papers matching one of the following aspects are cordially invited to be part of this session:
• comparison and discussion of design-related questions regarding methodological or substantive aspects,
• new developments in measuring intentions with vignettes,
• theoretical ideas for modelling the relationship between intentions and behavior for further empirical analyses,
• cross validation strategies (new approaches, replications),
• discussion of (dis-)advantages of vignette-designs, validation strategies and/or measures,
• issues of data-collection,
• substantive applications of factorial surveys

Keywords: Factorial Survey, Vignettes

Papers

Prioritizing COVID-19 Patients in Triage: Insights from a Factorial Survey Experiment

Professor Edurne Bartolomé-Peral (University of Deusto)
Dr Hermann Dülmer (University of Cologne)
Dr Pascal Siegers (GESIS Leibniz-Institute for Social Sciences) - Presenting Author
Dr Tilo Beckers (University of Düsseldorf)

In 2020, during the COVID-19 pandemic, a number of hospitals encountered situations where the demand for limited resources, such as ventilators, exceeded the supply. In this context, medical triage became the subject of public debate. Expert commissions were established to develop ethical guidelines for prioritizing medical assistance in the event of insufficient resources (triage). The principles underlying such guidelines, however, may differ from the moral intuitions of lay people. The aim of this study is to investigate, for the first time, which factors the general population considers to justify prioritizing a COVID-19 patient for treatment with a ventilator. For this purpose, a factorial survey experiment was conducted in Spain in 2022. The results of the online survey showed that one important factor in the prioritization judgements was whether patients were vaccinated and another was whether they were smokers (the principle of responsibility), which contradicts ethical guidelines and shows the need for better communication between experts and the public. Other important factors in the prioritization of COVID-19 patients were their family obligations (the principle of need), whether they were born and raised in Spain, age, and social class. Our study also shows that the impact of some of these factors depends on respondents' personal values, whether they consider vaccination to be compulsory and whether they smoke. Individuals' values and circumstances influence how they evaluate patient characteristics when making triage decisions. This article shows how social science research in medicine benefits from experimental factorial survey designs to gain a better understanding of the micro-contextual factors that influence moral judgements and medical decisions in the face of scarce resources.


Integrating Correspondence Experiments and Factorial Survey Experiments to Study Labour Market Discrimination

Professor Giovanni Busetta (University of Messina)
Professor Maria Gabriella Campolo (University of Messina) - Presenting Author
Dr Giovanni Maria Ficarra (University of Messina)
Professor Alessandra Trimarchi (University of Messina)

Hiring discrimination remains a pervasive issue in the European labour market, disproportionately affecting ethnic minorities and women, and contributing to inefficiencies in workforce allocation. This study examines two foundational methodologies, Correspondence Experiments (CEs) and Factorial Survey Experiments (FSEs), for measuring and understanding this phenomenon. While CEs send fictitious resumes to real job postings to detect discrimination, FSEs ask employers to evaluate fictitious profiles in hypothetical hiring scenarios, allowing researchers to examine decision-making processes in controlled settings.
This study highlights the strengths of both methods with a view to combining them. CEs offer high external validity by capturing real-world discrimination patterns but are limited in their ability to identify underlying biases, such as taste-based versus statistical discrimination. Conversely, FSEs provide deeper insights into the mechanisms driving discriminatory behaviour, albeit at the cost of reduced realism due to their hypothetical nature and potential social desirability bias.
To address these limitations, we propose a longitudinal mixed-method approach. First, a CE is conducted to establish baseline evidence of discrimination across real job markets. Subsequently, a vignette-based survey is administered to the same employers, designed to explore the cognitive and contextual factors influencing hiring decisions. This two-stage design bridges the information gaps inherent in isolated methodologies, enabling researchers to identify the prevalence, sources, and types of discrimination with greater precision.
Our findings underscore the value of integrating these approaches to generate a comprehensive understanding of hiring discrimination. By leveraging the complementary strengths of CEs and FSEs, this study advances methodological innovation in the field of labour market research and contributes to ongoing efforts to reduce discrimination and inequality in Europe.


AI is Held Morally Responsible for Detrimental Outcomes - But only if it isn’t Trustworthy

Dr Patrick Schenk (University of Lucerne) - Presenting Author
Ms Vanessa A. Müller (University of Lucerne)
Mr Lukas Posselt (University of Lucerne)

Imagine an artificial intelligence (AI) diagnosing a patient. Would you hold AI morally responsible for a mistake leading to a patient’s death? Although philosophers answer this question in the negative, research has found that laypeople actually do (Bonnefon et al. 2024; Abend/Posselt/Schenk forthcoming). With autonomous AI producing consequences beyond programmable control, responsibility becomes perplexing. People blame technological systems instead of developers or users (Kneer/Christen 2024). This leads to responsibility gaps. Yet, we know little about the exact conditions under which these gaps emerge.
Using a factorial survey experiment (FSE), we test two hypotheses. First, a violation of normative expectations should result in higher attribution of moral responsibility to AI (H1). People expect medical diagnoses to be correct, for instance. If a diagnosis turns out to be incorrect, people attribute more responsibility to agents producing detrimental outcomes (Knobe/Hitchcock 2009). Yet, this effect depends on an agent’s trustworthiness (Alicke et al. 2011). For trustworthy AI, people are motivated to externalize and therefore discount responsibility (H2).
FSEs are especially suited to test these hypotheses. Unlike simple survey items, vignettes provide situational context. In our vignettes, we vary agent type (AI vs. human), the task (e.g., medical diagnosis), and normative violations (e.g., a mistaken diagnosis), among others (Schenk/Müller/Keiser 2024). Respondents rated the agent’s moral responsibility and trustworthiness after each vignette. We used dual mode administration with a stratified random sample of the Swiss population (n=2703) – in contrast to online samples common to this research area.
Consistent with the hypotheses, norm violation (i.e., a mistaken diagnosis) leads to increased attribution of moral responsibility (H1) – but only if AI is not trustworthy (H2). Conversely, highly trustworthy agents are shielded from responsibility claims. These findings advance psychological and sociological theories of responsibility attribution and have timely implications for AI policy.


The Social Foundations of Political Hostility: Disentangling Political and Social Identities through a Factorial Survey Experiment

Ms Nelly Buntfuß (University of Technology Chemnitz) - Presenting Author

Increased hostility between supporters of different parties is often attributed to a rising importance of political identities, but it may be more deeply rooted in social structures. This phenomenon, referred to as social sorting, is increasingly being discussed as a driver of political polarization. This study seeks to expand our understanding of the extent to which negative affect is actually political and the extent to which this political hostility has a socio-structural underpinning. In order to discriminate between the relative effects of different political and, in reality, often correlated social characteristics, I conducted a factorial survey experiment among the German resident population (N = 1200) in which party affiliation, issue positions, social class, gender, and region were randomised. By showing respondents more or less “sorted” or stereotypical profiles, this study examines whether profiles with fewer cross-cutting attributes provoke greater negative affect and, in turn, create social distance. The findings will deepen our understanding of how political and social identities interact to shape affective polarization.


The determinants of period pain presenteeism: A factorial survey

Ms Mella Perleberg (Federal Institute for Occupational Health and Safety (BAuA), University of Cologne) - Presenting Author
Mr Martin Kroczek (Institute for Applied Economic Research (IAW))
Dr Philipp Kugler (Institute for Applied Economic Research (IAW))

Sickness presenteeism, that is, working while feeling ill, is mostly associated with negative consequences for both individuals (e.g., risk of long-term sickness) and organizations (e.g., productivity loss). Even though prevalence rates of presenteeism vary immensely, research shows that sickness presenteeism is a common practice among workers. One related but highly understudied phenomenon is period pain presenteeism, i.e., working when experiencing period pain. Previous research suggests that a majority of women experience period pain; however, little is known about how women navigate this in their work life. The missing empirical evidence might be due to the prevailing social stigma surrounding period pain, as many women feel that staying away from work when feeling unwell due to period pain is unacceptable.
To further this research and provide adequate organizational policy implications, we aim to understand the decision-making process behind period pain presenteeism by using data from a factorial survey representative of the female working population in Germany. The survey is embedded in the BIBB/BAuA Employment Survey 2024, which provides a unique opportunity to not only study relevant dimensions in the factorial survey but also draw on data provided by the main survey, covering a variety of topics such as personal characteristics of respondents. By presenting respondents with different scenarios, the factorial survey allows us to draw causal conclusions about the determinants of period pain presenteeism. Interaction analyses will also allow us to determine what other factors might strengthen or weaken the effect of a determinant (e.g., pain severity). Moreover, we will shed more light on the decision of whether to attend work on-site, call in sick or work from home. Doing so will also advance insights into virtual presenteeism (i.e., working from home when feeling unwell). The results present important policy implications for organizations and employers.


A Systematic Review of Gender and Ethnic Discrimination in Hiring: Evidence from Factorial Survey Experiments

Professor Giovanni Busetta (University of Messina) - Presenting Author
Professor Maria Gabriella Campolo (University of Messina)
Dr Giovanni Maria Ficarra (University of Messina)
Professor Alessandra Trimarchi (University of Messina)

Hiring discrimination remains a significant obstacle to achieving equitable labour markets across Europe. This issue is particularly pressing as economies face labour shortages at both ends of the skill spectrum, highlighting inefficiencies in labour utilisation and the underrepresentation of women and ethnic minorities in the workforce. Addressing these challenges requires a deeper understanding of the mechanisms underlying hiring discrimination.
This study systematically reviews factorial survey experiments (FSEs) conducted between 2010 and 2024 to analyse gender and ethnic discrimination in hiring. These experiments provide a unique methodological approach by simulating realistic hiring scenarios, enabling researchers to explore biases in a controlled yet flexible manner.
Our review employed a modified Population, Intervention, Comparison, Outcome (PICO) framework to refine research questions and establish clear inclusion and exclusion criteria. Following PRISMA guidelines, we selected 21 FSE studies, focusing on various European contexts. These studies examined key hiring outcomes such as the likelihood of being hired or invited for an interview, with emphasis on the intersection of gender and ethnicity dimensions. Additionally, two studies innovatively employed expected wages as proxies for productivity, offering further insights into discriminatory patterns.
Findings reveal consistent disparities. Ethnic minorities and women face significantly lower hiring probabilities compared to majority groups. When gender and ethnicity intersect, compounded effects create substantial barriers to employment. Methodologically, FSEs prove effective in capturing these complex interactions, reinforcing their value in labour market research.
This review not only sheds light on the compounded nature of hiring discrimination but also identifies areas for methodological improvement. It recommends expanding the respondent base to include more diverse groups and assessing discrimination throughout the entire hiring process, including interviews and wage negotiations. By emphasising intersectionality and innovative approaches, this research aims to inform evidence-based policies that foster equity and inclusivity in European labour markets.


Do Apolitical Similarities Drive Political Engagement? Insights from Multi-Country Dynamic Parallel Conjoint Survey Experiment

Dr Gaetano Scaduto (University of Milan Bicocca) - Presenting Author
Professor Fedra Negri (University of Milan Bicocca)
Dr Silvia Decadri (University of Milan Bicocca)

Political conversation scholars are divided between those suggesting people use apolitical cues, such as fashion, cars, and food preferences, to infer other people’s politics and subsequently decide whether to interact with them (Carlson & Settle, 2022; MacKuen, 1990) and those claiming that similarity in apolitical preferences sparks social interactions across political lines (Balietti et al., 2021; Minozzi et al., 2020). Are political inferences from apolitical cues relevant in the decision to engage in political conversation with others, or do people engage based on apolitical similarities, regardless of the expected political positions of others engendered by these? To investigate this question, we build upon the parallel design conjoint experiment (Acharya et al., 2018) to develop an innovative survey experimental design for the study of inferential mediation, fielded in four countries and languages (Italy, France, Czech Republic, Sweden, N=6000). We collect respondents’ sociodemographics (gender, age, geographical region, education level), psychological traits (conscientiousness, openness to new experiences), and lifestyle preferences (pets, food, means of transportation), and we dynamically generate conjoint profiles that are more likely than chance to be similar to the respondents. Participants are divided into two groups: one group sees only the apolitical traits of each profile, while the other also sees the political ideology of each profile. Participants are asked to decide which of two profiles they would rather discuss politics with. Through this design, we decompose the Average Treatment Effect of observing each attribute similarity between the respondent and profile on the willingness to discuss politics into the Average Controlled Direct Effect and the Eliminated Effect, with the latter interpretable as the portion of the effect mediated by political inferences.
Results show that apolitical similarities do not substantially lead to political expectations in all four countries and that interactions are rather motivated directly by apolitical similarities.
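
As a rough illustration of the decomposition described in this abstract, the following minimal sketch computes the three quantities as simple differences in means. It assumes that the group seeing only apolitical traits identifies the Average Treatment Effect and that the group also seeing political ideology identifies the Average Controlled Direct Effect, which is how the parallel design is commonly read, and that the Eliminated Effect is their difference. The function names, the field names "similar" and "chosen", and the toy rows are hypothetical and are not taken from the study, whose estimation strategy may differ.

```python
from statistics import mean


def diff_in_means(rows):
    """Mean choice rate for similar profiles minus that for dissimilar ones."""
    similar = [r["chosen"] for r in rows if r["similar"]]
    dissimilar = [r["chosen"] for r in rows if not r["similar"]]
    return mean(similar) - mean(dissimilar)


def decompose(natural_arm, manipulated_arm):
    """Split the ATE into ACDE and eliminated effect (ATE minus ACDE)."""
    ate = diff_in_means(natural_arm)       # ideology hidden: inference is free
    acde = diff_in_means(manipulated_arm)  # ideology shown: inference is fixed
    return {"ATE": ate, "ACDE": acde, "eliminated": ate - acde}


def toy_rows(similar_choices, dissimilar_choices):
    """Build hypothetical profile rows from lists of 0/1 choice outcomes."""
    rows = [{"similar": True, "chosen": c} for c in similar_choices]
    rows += [{"similar": False, "chosen": c} for c in dissimilar_choices]
    return rows


# Hypothetical toy data, invented purely for illustration.
natural = toy_rows([1, 1, 1, 0], [1, 0, 0, 0])
manipulated = toy_rows([1, 1, 1, 0], [1, 1, 0, 0])
effects = decompose(natural, manipulated)
print(effects)
```

An eliminated effect close to zero would correspond to the pattern the abstract reports, in which similarity matters directly rather than through inferred politics.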


The Causal Effects of Poor and Corrupt Welfare State Service Delivery on Political Solidarity and Political Trust: Experimental Evidence from a Novel Virtual-State Approach

Professor Achim Goerres (Universität Duisburg-Essen)
Dr Philipp Chapkovski (Universität Duisburg-Essen)
Mr Jakob Eicheler (Universität Duisburg-Essen)
Mr Philipp Kemper (Universität Duisburg-Essen) - Presenting Author

We examine the effects of a poorly performing welfare state on political solidarity, using an innovative “fat vignette” environment embedded in a virtual online state named Novaland. Participants are immersed in a virtual state context in which they experience different levels of state service quality (high, low and low + corrupt) through a series of vignettes. In between these vignettes, participants engage in everyday activities within the virtual state of Novaland, such as choosing between restaurants or parks and adopting a dog or cat, allowing for a holistic immersion in the virtual state, higher attention and retention. We test preregistered hypotheses of a social contract model between citizens and the welfare state. Our results show: (1) Participants increased their willingness to pay for the welfare state when confronted with bad service delivery but significantly less so when confronted with a bad delivery in which they could bribe their way to a better service. (2) Across different service experiences, participants showed lower levels of political trust for both low-quality and low-quality cum on-the-spot corruption experiences compared to high-quality experiences. This study underlines the crucial importance of effective social policies in maintaining political solidarity. In the post-COVID-19 period, many European states have struggled with a deterioration in service quality, making our findings particularly relevant. If welfare states cannot meet citizens' expectations, this will lead to lower levels of political solidarity and trust.


More Levels, More Impact? Design Effects in Multifactorial Survey Experiments

Dr Fabian Thiel (Universität Konstanz) - Presenting Author
Dr Sabine Düval (Deutsches Jugendinstitut München)
Professor Katrin Auspurg (LMU München)

Multifactorial survey experiments are increasingly employed to examine attitudes, preferences, behavioral intentions, and normative beliefs. However, methodological challenges, such as the potential “number of levels effect,” may compromise the validity of estimated causal effects. This effect arises when experimental factors with more intermediate levels (e.g., price levels) become overly salient to respondents, leading to an inflated weight in their evaluations. Addressing this concern, we present findings from an experiment exploring the impact of three design features on response quality and factor weighting: (1) the number of levels for experimental factors (i.e. vignette dimensions), (2) the order of dimensions, and (3) the overall presentation style (e.g., tabular versus text presentation).

Our study employed a between-respondent experimental design, crossing these three features, with data collected from a random population sample of approximately 1,300 survey participants. Contrary to concerns in the literature, our findings reveal no substantial effects of these design features on response quality—measured by item nonresponse and satisficing behavior—or on the weighting of experimental factors. This pattern holds true even for various combinations of design features (i.e., number of levels x dimension order x presentation style) and respondent subgroups that might be more susceptible to inattentive responses or cognitive overload.

These results suggest that the “number of levels effect” is likely of limited practical significance in multifactorial survey experiments, at least in the context of our substantive application investigating public support for road pricing. Our findings contribute to the broader methodological discussion by providing evidence that such design features do not compromise the validity of treatment effect estimates, thereby reinforcing the robustness of factorial survey designs for empirical research.


Evaluating the Generalizability of Factorial Survey Experiments: A Comparison of Convenience and Real-World Samples

Mr David Strauß (University of Applied Sciences BFI Vienna) - Presenting Author

Factorial survey experiments (FSEs) frequently rely on convenience samples consisting of various types of "lab rats" for pragmatic reasons. The indirect method employed in FSEs is renowned for achieving high internal and external validity as well as robust reliability. It is often assumed that "lab rats," compared to "real-world samples," provide more consistent responses due to their familiarity with survey scenarios. Building on this assumption, it can be hypothesized that FSE results derived from "lab rats" exhibit higher between-group R² values and, consequently, better generalizability compared to results from real-world samples.
To test this hypothesis, an FSE was conducted focusing on the degrees of freedom and participation in hybrid work environments and their impact on employee motivation. The experiment utilized both a convenience sample and a real-world sample. A 3x3x2x2x3x3 factorial design was employed, generating 324 vignettes organized into 54 decks of six vignettes each. The convenience sample, recruited via a panel provider, comprised n=1,581 participants. The real-world sample, drawn from three different companies, included n=94 respondents.
A GLS multilevel analysis revealed comparable within-group R² values between the two samples. However, the between-group R² values were negligible in the convenience sample, while substantial effects were observed in the real-world sample. These findings suggest that while "lab rats" are accustomed to survey tasks, this does not necessarily result in more consistent responses. The comparable within-group R² indicates that the vignette design of the FSE is effective, with reliability and internal validity considered high.
Nevertheless, the initial hypothesis must be rejected: The generalizability of results—specifically, the effects at the respondent level—should always be critically evaluated in FSE. Despite this, convenience samples remain a viable data source for FSEs, offering a practical foundation for exploratory research.
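
The arithmetic behind the design reported in this abstract (a 3x3x2x2x3x3 vignette universe of 324 vignettes, split into 54 decks of six) can be checked with a short sketch. The abstract does not say how vignettes were allocated to decks, so the random partition below is only an assumption; a real study might instead use a blocked or D-efficient allocation. The dimensions are not named in the abstract, so levels are represented by integer indices, and `build_decks` is a hypothetical helper.

```python
import itertools
import random

# Levels per dimension as stated in the abstract (3x3x2x2x3x3).
LEVELS_PER_DIMENSION = (3, 3, 2, 2, 3, 3)
DECK_SIZE = 6


def build_decks(levels, deck_size, seed=0):
    """Enumerate the full factorial and split it randomly into decks."""
    # Each vignette is a tuple of level indices, one per dimension.
    universe = list(itertools.product(*(range(n) for n in levels)))
    if len(universe) % deck_size != 0:
        raise ValueError("vignette universe does not divide evenly into decks")
    rng = random.Random(seed)  # fixed seed only to keep the sketch reproducible
    rng.shuffle(universe)
    # Consecutive slices of the shuffled universe form the decks.
    return [universe[i:i + deck_size]
            for i in range(0, len(universe), deck_size)]


decks = build_decks(LEVELS_PER_DIMENSION, DECK_SIZE)
print(len(decks), "decks of", DECK_SIZE, "vignettes each")
```

Because every vignette appears in exactly one deck, each respondent who receives one deck rates six distinct vignettes and the full universe is covered once per 54 respondents.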