
ESRA 2023 Glance Program


All time references are in CEST

Coding and Analyzing Open-Ended Questions in Surveys

Session Organiser Dr Alice Barth (University of Bonn)
TimeTuesday 18 July, 09:00 - 10:30
Room

Open-ended questions, where respondents answer in their own words instead of choosing from a range of categories, can provide information on respondents’ subjective perspectives, their interpretation and understanding of concepts, or their reasons for choosing a specific response. There are a number of methodological challenges (and opportunities!) in analyzing unstructured text data, from data preparation, cleaning and coding to text mining, statistical modeling, and visualizing results.
This session aims at discussing methods for processing and analyzing responses to open-ended questions. Topics of particular interest include, for example (but are not limited to):
- Using AI in coding and analyzing responses to open-ended questions
- Pre-processing unstructured text data
- Assessing the quality of responses to open-ended questions
- Statistical analysis techniques for unstructured text data (topic models, geometric data analysis, etc.)
- Complementing or contradicting results from standardized questions by using information from open-ended questions
- Using open-ended questions as a tool in survey methodology (e.g., web probing, respondent feedback on the survey process)

We look forward to contributions that highlight the methodological and/or substantive potential of open-ended questions in survey research.

Keywords: open-ended questions; natural language processing; text mining; text data; classification; geometric data analysis

Papers

What Makes a Good Citizen for You? Evidence from an Open-Ended Survey in Germany

Mr Alexander Seitz (GESIS Leibniz Institute for the Social Sciences) - Presenting Author
Professor Kathrin Ackermann (University of Siegen)

Adherence to norms of good citizenship is considered an important factor in an individual’s motivation to actively participate in a democracy and to contribute to the stability of democratic systems. However, what exactly civic norms are made up of is a debated question. The dominant understanding follows research by Russell J. Dalton, who differentiated four categories of citizenship (participation, autonomy, solidarity and social order) and described two different types of norms of good citizenship (engaged citizenship norms and duty-based citizenship norms). However, research into the distribution of such norms among the populations of modern mass democracies has almost always followed a strongly prespecified, and thus potentially biased, approach to their measurement, using closed-ended survey instruments which may not even include the most salient norms held by citizens. In this paper, we explore an alternative way of measuring citizenship norms, using open-ended questions to better capture respondents’ authentic thoughts. Applying Latent Dirichlet Allocation (LDA) to classify responses from a 2021 web survey of German residents, we find that such an approach yields results which are generally consistent with conventional closed-ended instruments. However, duty-based and engaged citizenship, while observable, do not represent the most widespread types of citizenship norms. Further, their distribution among the population does not follow Dalton’s proposed pattern of generational displacement.
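For readers unfamiliar with the approach, an LDA-based coding step of this kind can be sketched with standard tooling. The snippet below is a minimal illustration using scikit-learn, not the authors’ pipeline; the example responses, number of topics, and vectorizer settings are placeholders.

```python
# Minimal sketch of LDA-based coding of open-ended responses (illustrative only;
# example texts, topic count, and vectorizer settings are not the authors' values).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Placeholder responses standing in for answers to "What makes a good citizen?"
responses = [
    "voting in every election and staying informed about politics",
    "paying taxes and obeying the law",
    "helping other people and showing solidarity in the community",
]

# Bag-of-words representation of the responses
vectorizer = CountVectorizer(stop_words="english")
dtm = vectorizer.fit_transform(responses)

# Fit the topic model; the number of topics is an illustrative choice
lda = LatentDirichletAllocation(n_components=3, random_state=0)
doc_topics = lda.fit_transform(dtm)  # document-topic proportions

# Inspect the top words per topic to label the inferred norm categories
terms = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[::-1][:5]]
    print(f"Topic {k}: {', '.join(top)}")

# Each response can then be assigned to its dominant topic, which acts as its code
print(doc_topics.argmax(axis=1))
```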


Don’t use these non-proprietary LLMs for inductive coding of qualitative survey data (yet)

Mr Urs Alexander Fichtner (Institute of Medical Biometry and Statistics) - Presenting Author
Mr Jochen Knaus (Weizenbaum-Institut)
Dr Erika Graf (Institute of Medical Biometry and Statistics)
Dr Jörg Sahlmann (Institute of Medical Biometry and Statistics)
Mr Dominikus Stelzer (Institute of Medical Biometry and Statistics)
Professor Martin Wolkewitz (Institute of Medical Biometry and Statistics)
Professor Harald Binder (Institute of Medical Biometry and Statistics)
Dr Susanne Weber (Institute of Medical Biometry and Statistics)

Background:
Within the EXPOLS project, we surveyed 465 employees of the Medical Center / University of Freiburg in summer 2024. Using a web-based survey, we aimed to explore experiences, attitudes, needs, and perspectives towards artificial intelligence (AI) among employees who identify themselves as working in a science-related context. The survey combined standardized closed-ended questions with open-ended questions; the open-ended questions addressed the opportunities and risks of using AI in academic, teaching, and clinical work contexts. This contribution explores the applicability of non-proprietary Large Language Models (LLMs) for inductive coding of open-ended survey questions.
Methods:
The text statements were inductively coded using three different LLMs: Qwen2.5-14B-Instruct, Meta-Llama-3.1-8B-Instruct, and SauerkrautLM-Nemo-12b-Instruct. To test the consistency of the coding, we repeated the LLM coding ten times. We instructed the LLMs to freely create categories and to count how often each code appeared in the text. We compared the results with categories coded manually by a researcher.
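As a rough illustration of this kind of setup (not the authors’ actual prompt or configuration), one of the listed instruction-tuned models could be loaded with the Hugging Face transformers library and the coding prompt repeated over several runs; the prompt wording, sampling settings, and example answers below are assumptions.

```python
# Sketch of repeated inductive coding with a local instruction-tuned LLM.
# The prompt, sampling settings, and example answers are illustrative placeholders,
# not the authors' exact configuration.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-14B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

answers = [
    "AI could speed up literature screening.",
    "I worry about data protection when patient data are involved.",
    "It may help with grading, but results must be checked.",
]

prompt = (
    "You are coding open-ended survey answers about chances and risks of AI.\n"
    "Freely create inductive categories, assign each answer to a category, "
    "and count how often each category occurs.\n\n"
    + "\n".join(f"- {a}" for a in answers)
)

runs = []
for i in range(10):  # repeat the coding to check consistency across iterations
    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    out = model.generate(inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
    runs.append(tokenizer.decode(out[0, inputs.shape[-1]:], skip_special_tokens=True))

# `runs` now holds ten independent coding attempts that can be compared with
# each other and with the manual coding.
```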
Results:
All three LLMs showed weak performance for inductive qualitative data coding in several respects. First, compared to the manual coding, the LLMs coded up to 83% fewer text segments than the researcher, which indicates a large amount of information loss. Second, whether a code segment was identified as relevant appeared to depend on how frequently it occurred in the text. Third, the consistency of the coding varied strongly across iterations. Fourth, the LLMs appeared to be limited in the number of categories produced: no iteration produced more than ten categories.
Implications:
Within this use case, we cannot recommend using these non-proprietary LLMs for inductive coding of open-ended survey questions. Prompt optimization remains to be discussed.


Fine-Tuning Large Language Models for Automatic Coding of Open-Ended Questions

Dr Xinyu Zhang (UCLA) - Presenting Author
Mr Julian Aviles (UCLA)
Dr YuChing Yang (UCLA)
Mr Todd Hughes (UCLA)
Dr Ninez Ponce (UCLA)

Open-ended questions allow survey respondents to provide detailed and unstructured answers, offering insights beyond predefined response categories. However, survey estimates can be drastically affected by processing errors when text data from open-ended questions are coded into categories. A manual coding workflow is expensive, time-consuming, and vulnerable to subjectivity. Encouraged by the success of large language models in various linguistic tasks, we explore the use of generative pre-trained transformer (GPT) models to enhance efficiency, accuracy, and consistency in the coding of open-ended questions. We use data from the California Health Interview Survey (CHIS), which annually collects data on a multitude of questions with open-ended responses. We fine-tune the GPT model on training data and compare its performance to the default model. These approaches are illustrated with open-ended questions from CHIS 2023.
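A workflow along these lines could look roughly like the following sketch, here using the OpenAI fine-tuning API; the abstract does not specify which GPT variant or tooling was used, so the file name, base model, category labels, and prompt are placeholders.

```python
# Rough sketch of fine-tuning a GPT model to code open-ended answers into categories.
# File name, base model, labels, and prompt are placeholders, not the authors' setup.
import json
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

# Training data: each manually coded answer becomes one chat example whose
# assistant turn is the assigned category label.
examples = [
    {"text": "I could not get an appointment for months", "label": "access_to_care"},
    {"text": "The visit was too expensive", "label": "cost"},
]
with open("coding_train.jsonl", "w") as f:
    for ex in examples:
        record = {
            "messages": [
                {"role": "system", "content": "Assign the answer to one category."},
                {"role": "user", "content": ex["text"]},
                {"role": "assistant", "content": ex["label"]},
            ]
        }
        f.write(json.dumps(record) + "\n")

# Upload the training file and start a fine-tuning job
train_file = client.files.create(
    file=open("coding_train.jsonl", "rb"), purpose="fine-tune"
)
job = client.fine_tuning.jobs.create(
    training_file=train_file.id, model="gpt-4o-mini-2024-07-18"
)
print(job.id)  # the resulting fine-tuned model can then be compared to the default model
```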