
ESRA 2023 Glance Program


All time references are in CEST

Evaluating applications of generative artificial intelligence in questionnaire design, evaluation and testing

Session Organisers: Dr Caroline Roberts (University of Lausanne)
Professor Patrick Sturgis (London School of Economics and Political Science)
Dr Tom Robinson (London School of Economics and Political Science)
Ms Alice McGee (Verian Group (UK))
Time: Tuesday 18 July, 09:00 - 10:30
Room

It is widely agreed that Generative Artificial Intelligence (GenAI) will transform conventional practice across the spectrum of service industries in the near future. Since the launch of OpenAI’s ChatGPT in November 2022, GenAI applications have already radically impacted work practices across multiple sectors, and their potential to revolutionise survey research has quickly been acknowledged and has begun to be investigated empirically. The commercial sector, especially, has embraced the opportunities GenAI can offer market research practice, and the past two years have seen a mushrooming of new platforms and tools based on custom-trained generative models, particularly Large Language Models (LLMs). These offer a broad range of solutions, from automated sampling and recruitment; synthetic data generation and augmentation; opinion and behaviour prediction and forecasting; data cleaning and validation; and qualitative and quantitative data analysis; to questionnaire and dynamic survey design. However, researchers responsible for implementing high-quality academic and government surveys have been more sceptical and cautious about the utility and effectiveness of some of these GenAI applications, as well as about their ethical implications. Nevertheless, there is growing recognition of the urgent need for research to investigate and evaluate the opportunities and risks this transformative general-purpose technology presents, and to appropriately anticipate its likely disruptive impact on current practice. In this session, we invite presentations of research investigating diverse applications of GenAI in the different steps involved in survey questionnaire design, evaluation and testing (QDET). We encourage submissions from researchers across a broad range of sectors who are currently engaged in evaluating and validating alternative GenAI models and developing custom tools with the potential to transform current QDET practice in high-quality, general-population, probability-based sample surveys.

Keywords: GenAI, LLMs, validation, cognitive interviewing, expert review

Papers

Assessing the Effectiveness of LLMs in the Evaluation of Draft Survey Questions Using Rule-Based QDET Frameworks

Dr Caroline Roberts (University of Lausanne)
Professor Patrick Sturgis (London School of Economics and Political Science) - Presenting Author
Dr Tom Robinson (London School of Economics and Political Science)
Ms Alice McGee (Verian)

It is widely agreed that Generative Artificial Intelligence (GenAI) will transform conventional practice across the spectrum of service industries in the near future. It seems unlikely that survey research will be an exception, and understanding how to capitalise on this potential is a key priority. The present study forms part of a wider project, conducted within the UK ESRC-funded 'Survey Futures - Survey Data Collection Collaboration' research programme, designed to assess the utility of Large Language Models (LLMs) for improving the quality and cost-efficiency of questionnaire design, evaluation and testing (QDET). The research addresses the following questions: 1) How effective are LLMs at a) applying question design and evaluation frameworks, and b) generating and analysing cognitive interview data, to identify problems with draft survey questions?; 2) What are the optimal ways of fine-tuning and prompting LLMs to match or exceed human performance on QDET tasks?; and 3) Which procedures should be followed to ensure that the use of LLMs in QDET tasks optimises survey quality while complying with legal and ethical frameworks? We present preliminary findings from the first phase of this research, which focuses on how effectively LLMs apply existing frameworks and checklists to the evaluation of survey questions, together with the results of a validation exercise comparing GenAI QDET evaluations with those of human coders.
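To make the shape of such a validation exercise concrete, the following minimal sketch shows how LLM-generated problem flags for a draft question might be collected and compared against human coders' flags using simple percent agreement. It assumes the OpenAI Python chat completions client and an invented three-item checklist; the prompt wording, categories and function names are illustrative placeholders, not the frameworks or pipeline used in the study.

# Illustrative sketch only: compare hypothetical LLM question-evaluation flags
# against human coder flags. Checklist categories and prompts are placeholders.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

CHECKLIST = ["vague wording", "double-barrelled", "unbalanced response options"]

def llm_flags(question_text: str, model: str = "gpt-4o") -> set[str]:
    """Ask the model which checklist problems (if any) apply to a draft question."""
    prompt = (
        "You are reviewing a draft survey question against this checklist: "
        f"{', '.join(CHECKLIST)}. Question: \"{question_text}\". "
        "Return only the checklist labels that apply, separated by semicolons, "
        "or 'none'."
    )
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    text = reply.choices[0].message.content.strip().lower()
    return set() if text == "none" else {t.strip() for t in text.split(";")}

def percent_agreement(llm: list[set[str]], human: list[set[str]]) -> float:
    """Share of (question, checklist item) decisions on which LLM and coders agree."""
    decisions = agreements = 0
    for l, h in zip(llm, human):
        for item in CHECKLIST:
            decisions += 1
            agreements += (item in l) == (item in h)
    return agreements / decisions

In practice one would compute chance-corrected agreement statistics (e.g. kappa) per checklist category rather than a single pooled percentage, but the comparison logic is the same.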


Integrating ChatGPT into the cognitive pretesting of questionnaires

Dr Timo Lenzner (GESIS - Leibniz Institute for the Social Sciences) - Presenting Author
Dr Patricia Hadler (GESIS - Leibniz Institute for the Social Sciences)

Since the launch of ChatGPT in November 2022, large language models (LLMs) have become a major topic in academic circles. In all areas where language data play a central role, they have great potential to become part of a researcher’s methodological toolbox. One of these areas is the cognitive pretesting of questionnaires. There are several areas in which LLMs can augment current pretesting procedures and potentially render them more effective (and probably more objective): (1) identifying potential problems of draft survey questions prior to cognitive testing, (2) suggesting cognitive probes to test draft survey questions, (3) simulating or predicting respondents’ answers to these probes (i.e., generating ‘synthetic samples’), and (4) automatically coding (or analyzing) respondents’ answers. In this study, we examine how well ChatGPT performs these tasks and to what extent it can improve our current pretesting procedures.
To do so, we re-examined data collected in 10 cognitive pretests carried out at GESIS since the beginning of 2023. Of the 66 items tested in these pretests, we selected 10 items varying with regard to question type (attitudinal, factual, behavioral or knowledge question) and pretest method used to evaluate the items (cognitive interviewing or web probing). Using these 10 items, we prompted ChatGPT 4o to perform the four tasks mentioned above and analyzed similarities and differences in the outcomes of the LLM and human researchers (i.e., us). Data analysis is still ongoing, but first results indicate that with respect to tasks (1), (2) and (4), the results of ChatGPT often overlap with those of human experts. With respect to task (3), however, the answers generated by ChatGPT only marginally match the answers of actual respondents. The talk will close with a discussion of the practical implications of these findings.
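As an illustration of what this kind of prompting can look like, the short sketch below covers task (2), asking the model to suggest cognitive probes for a draft item. It assumes the OpenAI Python client and a GPT-4o model; the prompt wording and the example item are invented for illustration and are not the GESIS pretest materials or the exact prompts used in the study.

# Minimal sketch of task (2): suggesting cognitive probes for a draft survey item.
from openai import OpenAI

client = OpenAI()

def suggest_probes(item_text: str, n_probes: int = 3, model: str = "gpt-4o") -> str:
    """Ask the model for cognitive probes that could be used to pretest the item."""
    prompt = (
        f"Suggest {n_probes} cognitive interviewing probes to pretest this draft "
        f"survey question: \"{item_text}\". For each probe, state which stage of "
        "the response process it targets (comprehension, retrieval, judgement, "
        "or response mapping)."
    )
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return reply.choices[0].message.content

# Example call with an invented attitudinal item:
# print(suggest_probes("How satisfied are you with your local public services?"))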


Intelligent probing of open responses in online self-completion surveys using Generative AI

Professor Patrick Sturgis (Department of Methodology, London School of Economics) - Presenting Author
Professor Caroline Roberts (Institute of Social Sciences, University of Lausanne)
Professor Tom Robinson (Department of Methodology, London School of Economics)

A major limitation of traditional online self-completion questionnaires is their static nature. Compared with an interviewer-administered survey, self-completion does not permit follow-up probing of initial responses to open questions in a way that is tailored to the initial response. The advent of generative AI and, in particular, Large Language Models (LLMs) offers new possibilities for implementing this kind of ‘intelligent probing’ in an online self-completion context. LLMs can be trained to produce human-like follow-up probes that are tailored to the content of initial open-text responses in ways that should improve the quantity and quality of the data obtained. Intelligent probing also offers the potential to reduce the number of questions that need to be asked to arrive at the correct code, thereby minimising respondent burden. In this paper we implement AI-based intelligent probing to measure respondents' occupations. Occupation is a notoriously difficult variable to measure due to the very large number of occupations and the technical ways in which occupations are described in standard classifications. We use an LLM to probe initial responses to a standard occupation question and to code the probed responses to the SOC2000 classification. We compare the LLM-probed data, before and after coding to SOC2000, with a static occupation question coded by human coders.
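The sketch below illustrates the general shape of such an intelligent-probing flow: generate one tailored follow-up probe from the initial open answer, then ask for an occupational code for the combined answers. It assumes the OpenAI Python client; the prompts, the "gpt-4o" model choice and the simplified free-text output are assumptions for illustration, not the study's actual pipeline, and production SOC2000 coding would validate candidate codes against the official index.

# Illustrative two-step 'intelligent probing' flow for an open occupation response.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o"  # placeholder model choice

def ask(prompt: str) -> str:
    """Send a single-turn prompt and return the model's text reply."""
    reply = client.chat.completions.create(
        model=MODEL, messages=[{"role": "user", "content": prompt}]
    )
    return reply.choices[0].message.content.strip()

def probe_occupation(initial_answer: str) -> str:
    """Generate one short follow-up probe tailored to the respondent's answer."""
    return ask(
        "A survey respondent described their job as: "
        f"\"{initial_answer}\". Write one short, neutral follow-up question that "
        "would help classify this job in an official occupational classification "
        "(e.g. asking about main tasks, qualifications or supervisory duties)."
    )

def code_to_soc2000(initial_answer: str, probe: str, probe_answer: str) -> str:
    """Ask for the most plausible SOC2000 unit group given the combined answers."""
    return ask(
        "Given this occupation description and follow-up exchange, return the most "
        "plausible UK SOC2000 four-digit unit group code and title.\n"
        f"Initial answer: {initial_answer}\nProbe: {probe}\nProbe answer: {probe_answer}"
    )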