ESRA 2025 Preliminary Program
All time references are in CEST
AI for Survey Construction: Innovations, Challenges, and Future Directions

Session Organisers
Mrs Sandra Jaworeck (Technical University of Chemnitz)
Mr Carsten Schwemmer (Ludwig-Maximilians-University Munich)

Time: Tuesday 15 July, 11:00 - 12:30
Room: Ruppert Blauw - 0.53
As technological advancements continue to revolutionize survey research, the emergence of AI tools like ChatGPT presents transformative opportunities for constructing and enhancing surveys. This session explores the integration of AI in survey research, particularly in developing, refining, and optimizing survey instruments. With AI's capabilities in natural language processing and machine learning, there is significant potential to automate and improve the survey design process, from question formulation to adaptive questionnaire pathways.
AI-powered tools offer innovative approaches to creating survey content that is contextually relevant and tailored to diverse populations. They can assist in generating high-quality, unbiased questions, providing translations, and predicting potential biases or ambiguities in wording, thus enhancing the validity and reliability of survey instruments. Additionally, AI can optimize surveys for mobile devices and different platforms, increasing accessibility and participation rates while ensuring data quality.
However, the integration of AI in survey construction is not without its challenges. Ethical considerations around data privacy, consent, and potential biases introduced by AI algorithms need to be carefully managed. Furthermore, the lack of transparency in AI-driven decision-making processes raises concerns about interpretability and trustworthiness. This session aims to delve into these critical issues, fostering a dialogue on how to balance innovation with methodological rigor and ethical responsibility.
This session will provide a comprehensive overview of the promises and pitfalls of using AI for survey construction. We invite contributions that showcase empirical research, theoretical developments, and practical applications of AI in survey design, as well as critical perspectives on the ethical, technical, and conceptual challenges involved. Together, we aim to explore the future directions of AI in survey research, promoting a robust and nuanced understanding of how these technologies can be leveraged to meet the evolving needs of the survey community.
Papers
Artificial Intelligence Meets Bibliometrics: A New Approach to Pretesting Surveys Among Researchers
Mrs Anastasiia Tcypina (German Centre for Higher Education Research and Science Studies) - Presenting Author
This study develops a novel approach to pretesting a new questionnaire, designed as part of a larger project in field-comparative research. The questionnaire captures the field-specific factors influencing data-sharing practices among researchers while ensuring the consistent collection of diverse perspectives across disciplines. Researchers working experimentally or observationally may differ in their perceptions of what constitutes data, how data are used, and the value of data sharing; nevertheless, they should interpret the questionnaire similarly. This heterogeneity requires careful pretesting, typically through cognitive interviewing, to uncover ambiguities or misinterpretations.
To enhance pretesting, this study introduces artificial intelligence (AI) to simulate cognitive interviews with researchers. The AI does not undergo a training phase; instead, it uses bibliometric metadata and text-based data, such as publication abstracts, as inputs for designing tailored prompts. By mimicking researchers' perspectives, AI simulations aim to identify ambiguities and refine the questionnaire before human testing. The pretesting phase will involve approximately 20 AI simulations, followed by cognitive interviews with the researchers whose bibliometric metadata were used.
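To make the prompting step concrete, below is a minimal sketch of how a researcher persona prompt might be assembled from bibliometric metadata; the build_persona_prompt helper, its field names, and its wording are hypothetical illustrations, not the study's actual prompt design.

# Hypothetical sketch: turning bibliometric metadata into a persona prompt
# for a simulated cognitive interview. Field names and phrasing are invented
# for illustration and do not reproduce the study's prompts.
def build_persona_prompt(metadata: dict) -> str:
    recent_abstracts = " ".join(metadata["abstracts"][:2])
    return (
        f"You are a researcher in {metadata['field']} with "
        f"{metadata['n_publications']} publications. Your recent work: "
        f"{recent_abstracts}\n"
        "Think aloud while answering the survey question below, flagging any "
        "terms you find ambiguous from your disciplinary perspective."
    )

prompt = build_persona_prompt({
    "field": "experimental particle physics",
    "n_publications": 42,
    "abstracts": ["We report measurements of ..."],
})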
In this study, real-world interviews primarily validate the AI-generated findings, assessing how closely the AI-simulated responses align with researchers' actual perspectives. Specifically, the study examines whether AI simulations effectively highlight field-specific ambiguities and address interdisciplinary complexities. The real interviews also uncover practical challenges or limitations of the AI approach that may not be evident from the simulations alone.
The findings aim to answer key questions about AI’s role in pretesting: Can AI replace or complement traditional pretest methods for surveying researchers, particularly regarding field-specific differences? How well does bibliometric metadata capture disciplinary perspectives in simulations? These insights will contribute to understanding the potential and limitations of using AI, informed by research texts, to improve survey pretesting.
Large Language Models in Social Research: A New Paradigm for Attitude Scale Construction
Ms Penelope Stamou (National Technical University of Athens, School of Electrical and Computer Engineering) - Presenting Author
The rapid advancement of technology is revolutionising survey research, with AI tools like ChatGPT and other Large Language Models (LLMs) offering transformative opportunities for data collection and the development of sophisticated research instruments. This study builds on prior research to establish a foundation for leveraging LLMs in the construction of attitude scales, a crucial element of survey design in social and psychometric research.
A previous study by Symeonaki et al. (2024) explored the use of ChatGPT as a substitute for human judges in developing Thurstone scales, addressing a key limitation of this methodology: the difficulty of assembling a panel of human experts. However, that approach revealed inconsistencies in ChatGPT's responses across iterations, posing a challenge to reliability.
To overcome this limitation, the present study examined attitudes toward individuals living with HIV, leveraging a set of pre-designed items and expanding the scope to five distinct LLMs (including ChatGPT, Claude, and Llama), each with unique characteristics, to account for the variability among human experts. Each model underwent at least 10 trials and was instruction-tuned on a pre-constructed dataset to simulate the decision-making processes of human experts. Median assessments across the models were then systematically compared with the evaluations of 73 human judges, including undergraduate, postgraduate, and PhD students from disciplines such as social policy, law, medicine, and computer engineering, alongside experienced social researchers.
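As a rough illustration of the aggregation step, the sketch below computes Thurstone-style scale values as the median of the judges' category placements, whether the judges are LLMs or humans; the ratings shown are invented for illustration and do not reproduce the study's data.

# Hedged sketch: Thurstone equal-appearing-intervals aggregation, where each
# "judge" (an LLM or a human) places an item on an 11-point favourability
# scale. The placements below are invented for illustration.
import statistics

ratings = {  # item text -> one placement (1-11) per judge
    "Item A": [3, 4, 3, 5, 4],
    "Item B": [9, 10, 9, 8, 10],
}

for item, placements in ratings.items():
    scale_value = statistics.median(placements)      # the item's scale value
    q1, _, q3 = statistics.quantiles(placements, n=4)
    ambiguity = q3 - q1                              # interquartile range (Q)
    print(f"{item}: scale value = {scale_value}, Q = {ambiguity:.1f}")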
The findings highlight that LLMs can effectively address the primary drawback of Thurstone scale construction—the reliance on human judges—while demonstrating superior performance by evaluating items holistically rather than in isolation. This research not only confirms the viability of LLMs as reliable tools for attitude scale development but also underscores their potential to transform survey design and strengthen the methodological rigor of social science research.
Silicon Interviews: Advancing Semi-Structured Interview Methodology Through AI-AI Simulations
Ms Leonie Steinbrinker (Leipzig University) - Presenting Author
Dr Stephan Poppe (Leipzig University)
Mr Nicolas Ruth (Leipzig University)
Dr Andreas Niekler (Leipzig University)
Selecting an appropriate measurement instrument in social science survey research presents researchers with a fundamental trade-off between depth and scalability. The emergence of Large Language Models (LLMs), with their diverse applications and enhanced user-friendliness, opens new possibilities for social science research methodology. While recent research has primarily focused on whether LLMs can replace human respondents through so-called silicon samples, less attention has been paid to their potential to also replace interviewers. Early findings suggest that AI interviewers can surpass humans in specific tasks, such as active listening, while also reducing biases resulting from reactivity and social desirability effects, but their capabilities remain underexplored.
Our study takes a novel approach: simulating AI-AI interactions, where LLMs act as both interviewers and respondents in semi-structured interview settings. This dual-simulation framework enables us to rigorously test and refine interviewing methodologies while addressing the dynamic nature of human interactions, which are inherently unpredictable and context-dependent. As a test case, we simulate interviews with children—a particularly challenging context that demands empathetic and adaptive interviewing techniques. By adhering to established guidelines for interviewing children, we evaluate LLMs on their ability to generate dynamic follow-up questions, maintain conversational flow, and exhibit algorithmic fidelity in both roles, modeling complex human interactions.
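A minimal sketch of such a dual-role loop is given below, assuming a generic chat-completion backend; the call_llm stub, the prompts, and the turn count are placeholders, not the study's implementation.

# Hypothetical sketch of an AI-AI interview loop in which one LLM plays the
# interviewer and another the respondent. call_llm is a stub for any
# chat-completion API; prompts and turn count are illustrative.
def call_llm(system_prompt: str, transcript: list[dict]) -> str:
    raise NotImplementedError("plug in a chat-completion API here")

INTERVIEWER = ("You are interviewing a ten-year-old child. Ask one short, "
               "empathetic, open follow-up question at a time.")
RESPONDENT = ("You are a ten-year-old child. Answer the interviewer briefly "
              "and stay in character.")

def simulate_interview(turns: int = 5) -> list[dict]:
    transcript: list[dict] = []
    for _ in range(turns):
        question = call_llm(INTERVIEWER, transcript)
        transcript.append({"role": "interviewer", "text": question})
        answer = call_llm(RESPONDENT, transcript)
        transcript.append({"role": "respondent", "text": answer})
    return transcript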
Beyond addressing common methodological challenges, such as social desirability biases and subjective inconsistencies, our framework offers a scalable solution for testing and refining interview methodologies. Generating 'silicon interviews' enables the development and refinement of downstream tasks, such as computational coding and analysis, by providing more robust large-scale datasets. Our findings underscore the potential of AI-driven simulations to advance survey research by bridging methodological gaps, reducing costs, and enhancing the scalability of interview-based research, thereby easing the trade-off between large-scale surveys and semi-structured interviews.
Natural Language Processing Insights into Personality: Semantic Similarity at the Item Level
Ms Zoe Greer (GESIS - Leibniz Institute for the Social Sciences, Victoria University of Wellington) - Presenting Author
Professor Ronald Fischer (D’Or Institute for Research and Education)
Professor Markus Luczak-Roesch (Victoria University of Wellington)
We investigate the underlying semantic structure of personality traits in several popular Big Five and Big Six inventories, using Natural Language Processing approaches to cluster semantically similar items and thereby better understand personality structure and trait content.
In Study 1 we examine which terms from classic trait-term dictionaries are most likely to appear in personality inventories. Overall, trait terms from trait taxonomies are rarely used in popular personality inventories. The best match between trait dictionaries and inventory items was observed for Agreeableness and Openness, whose trait terms appeared most frequently in the 'correct' items; Extraversion trait-dictionary terms, by contrast, were far more likely to appear in Neuroticism items than in items for any other trait.
Turning to more advanced methods, in Study 2 we provide benchmarks for interpreting semantic similarity scores from transformer models in the context of personality items and constructs, examining the semantic similarity of the NEO-PI-R (McCrae & Costa, 2008) and Goldberg's IPIP version, which is designed to mimic the NEO-PI structure and format.
In Study 3 we use these semantic similarity thresholds to explore the semantic structure of items from five modern personality inventories, using sentence-BERT-based similarity indices. We did not find the expected loose semantic families of broad traits (see Saucier & Goldberg, 1996) at the item level, except for Conscientiousness. We also uncover a small number of cross-trait item pairs with extremely high semantic similarity, which could affect correlations with third variables (see Cooper, 2024) and construct clarity. Neuroticism items show relatively high similarity with Agreeableness and Extraversion items and often emerge in joint semantic clusters, which may indicate genuine construct overlap; inventory choice is therefore key for mental health related research. Our study adds to the growing evidence that broad traits cannot be dismissed as semantic artefacts.
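For readers unfamiliar with the technique, the sketch below shows how item-level semantic similarity can be computed with a sentence-BERT model via the sentence-transformers library; the model name and example items are placeholders, not the inventories or thresholds used in the studies.

# Hedged sketch: pairwise semantic similarity between personality items using
# a sentence-BERT model. The model choice and items are illustrative, not the
# study's materials.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder model choice

items = [
    "I worry about things.",        # Neuroticism-flavoured item (invented)
    "I get irritated easily.",      # possible cross-trait overlap (invented)
    "I am the life of the party.",  # Extraversion-flavoured item (invented)
]

embeddings = model.encode(items, convert_to_tensor=True)
sims = util.cos_sim(embeddings, embeddings)  # cosine similarity matrix

for i in range(len(items)):
    for j in range(i + 1, len(items)):
        print(f"sim({items[i]!r}, {items[j]!r}) = {sims[i][j]:.2f}")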