All time references are in CEST
Advancing Survey Research through AI and Machine Learning: Current Applications and Future Directions

Session Organiser: Dr Maud Reveilhac (Department of Communication and Media Research, University of Zurich, Switzerland)
Time: Tuesday 18 July, 09:00 - 10:30
Room:
In recent years, Artificial Intelligence (AI) and Machine Learning (ML) have increasingly been applied in survey research, offering innovative solutions to long-standing methodological challenges (Shah et al., 2020). AI-powered survey optimization includes adaptive surveys and dynamic questionnaire designs: AI and ML models are being employed to predict and prevent survey nonresponse and to improve respondent engagement by tailoring questions based on real-time data (Buskirk et al., 2020). These tools also help reduce respondent biases by using natural language processing (NLP) to interpret open-ended responses and analyze sentiment more accurately and efficiently (Pandey et al., 2023). Furthermore, data processing and quality control can rely on AI-based methods: ML techniques are being used to detect patterns of low-quality responding, such as speeding and straightlining behaviors (Fernández-Fontelo et al., 2020). Recent advancements also include the use of AI to automate coding and classification of open-ended questions (Zhang et al., 2023), as well as the use of ML for data imputation, enhancing traditional methods by offering more accurate predictions for missing data (Popovich, 2024). These applications highlight the potential of AI and ML to improve survey data quality and reduce labor-intensive tasks. Looking forward, the integration of AI holds significant potential, including: real-time adaptive surveys that adjust to respondent input, multimodal data collection (combining voice, image, and text inputs), the assessment of respondent emotions during survey completion, the prediction of future survey behavior (e.g., forecasting nonresponse or disengagement), AI-powered chatbots for conversational surveys, and the refinement of ethical frameworks surrounding AI's role in survey contexts.
This session will explore these topics to define the future of AI in survey research.
Keywords: artificial intelligence, machine learning, survey research, natural language processing, adaptive surveys, multimodal surveys, response quality
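The low-quality response patterns named above, straightlining and speeding, can be flagged with very simple rules. The following is a minimal illustrative sketch, not a method from any of the session papers; the thresholds, field names, and data are invented assumptions:

```python
# Hypothetical sketch: flagging two low-quality response patterns --
# straightlining (identical answers across a grid of items) and speeding
# (completion time far below the typical respondent). The 0.3 cutoff and
# all field names/values are illustrative assumptions.
from statistics import median

def is_straightlining(grid_answers):
    """True if every answer in a grid battery is identical."""
    return len(set(grid_answers)) == 1

def is_speeding(duration_seconds, all_durations, fraction=0.3):
    """True if the respondent finished in under `fraction` of the median time."""
    return duration_seconds < fraction * median(all_durations)

# Invented example data: one respondent among six completion times.
durations = [412, 388, 455, 97, 430, 401]
respondent = {"grid": [3, 3, 3, 3, 3], "duration": 97}

flags = {
    "straightlining": is_straightlining(respondent["grid"]),
    "speeding": is_speeding(respondent["duration"], durations),
}
print(flags)  # both checks flag this respondent
```

In practice, ML-based approaches such as those cited above (Fernández-Fontelo et al., 2020) combine many such indicators rather than applying fixed cutoffs.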
Dr Morgan Earp (US National Center for Health Statistics) - Presenting Author
Dr Lauren Rossen (US National Center for Health Statistics)
Ms Sarah Forrest (US National Center for Health Statistics)
Dr Trent Buskirk (Old Dominion University)
Measurement equity is critical for the accurate assessment of sex disparities in health outcomes. Estimated disparities in the prevalence of a given health outcome can be affected by whether the outcomes are measured in a health exam or self-reported in a survey based on a diagnosis from a health professional. Using data from the National Health and Nutrition Examination Survey (NHANES), which contains both measured data from a health exam and self-reported survey health data, we used machine learning models to assess potential measurement error (i.e., differences between the true health exam-measured outcome and the self-reported health outcome) inequities across several chronic health conditions and behaviors. Using data on four health outcomes (i.e., diabetes, hypertension, high cholesterol, and current smoking) from NHANES (2015 through March 2020), we assessed differences between self-reported survey and health exam measured outcomes by various sociodemographic characteristics (e.g., age, sex, race/ethnicity, education, marital status, health insurance, and poverty). We used linear regression trees via the ‘rpms’ package in R to identify demographic subgroups with larger differences between self-reported and measured health outcomes. Identifying subgroups where measurement error may be larger or smaller could help inform future work on improving the estimation of the prevalence of chronic conditions and related health disparities.
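The core quantity in this abstract, a subgroup-level gap between exam-measured and self-reported status, can be sketched as follows. This is an illustrative toy analogue, not the authors' code: the records, variable names, and subgrouping are invented, and the actual NHANES analysis uses survey weights and regression trees (the R 'rpms' package), which this sketch omits:

```python
# Illustrative sketch: mean discrepancy between exam-measured and
# self-reported binary outcomes within demographic subgroups.
# All records and field names below are invented for illustration.
from collections import defaultdict

records = [
    {"sex": "F", "exam": 1, "self_report": 1},
    {"sex": "F", "exam": 0, "self_report": 0},
    {"sex": "M", "exam": 1, "self_report": 0},
    {"sex": "M", "exam": 1, "self_report": 1},
]

def discrepancy_by_subgroup(records, key):
    """Mean (exam - self_report) difference within each level of `key`."""
    sums, counts = defaultdict(float), defaultdict(int)
    for r in records:
        sums[r[key]] += r["exam"] - r["self_report"]
        counts[r[key]] += 1
    return {level: sums[level] / counts[level] for level in sums}

gaps = discrepancy_by_subgroup(records, "sex")
print(gaps)  # a larger positive gap suggests more underreporting in that subgroup
```

A regression tree generalizes this idea by searching over combinations of sociodemographic splits rather than one variable at a time.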
Dr Silvia Schwanhäuser (Institute for Employment Research (IAB)) - Presenting Author
Professor Joe Sakshaug (University of Mannheim, University of Munich (LMU), and Institute for Employment Research (IAB))
Professor Natalja Menold (University of Dresden (TU-Dresden))
Professor Peter Winker (University of Giessen)
Interviewer-administered surveys are inherently susceptible to deviant or fraudulent behavior on the part of interviewers. Even small amounts of interviewer-fabricated data can severely bias estimation results. Consequently, identifying falsified interviews is an important part of the quality control process. In addition to established quality control methods, such as re-interviews or monitoring, statistical (i.e., data-based) detection methods can help identify potential falsifications by flagging suspicious patterns in the data. One understudied statistical detection approach in this context is the use of supervised machine learning algorithms, that is, algorithms trained on existing falsification data.
This study explores the application of these algorithms for detecting falsifications, employing both experimental data and real survey data: The experimental data were collected specifically to study falsifications and the behavior of falsifiers. The survey data come from a large nationally representative survey of refugees in Germany with known fabricated interviews. We investigate how effective different supervised algorithms, such as regression models, decision trees, support vector machines, and neural networks, are at identifying patterns caused by falsifiers. Simulating different scenarios, we evaluate the effectiveness of these algorithms 1) when training them on falsifications within the same survey, 2) when training them on falsifications induced by different falsifiers within the same survey, and 3) when training them on falsifications from a completely different survey.
Our results show that supervised algorithms detect falsifications within the same survey very precisely, especially algorithms based on decision trees. However, the performance of all algorithms decreases sharply in the between-survey scenario: no algorithm was able to precisely identify falsifications in another survey.
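The supervised setup the abstract describes, training a tree-based classifier on interviews labeled real or falsified, can be reduced to its simplest form: a one-split "decision stump". The sketch below is a hedged illustration of that idea only, not the authors' implementation; the single feature (share of mid-scale answers, which some falsification studies find elevated in fabricated data) and all values are invented:

```python
# Minimal sketch of supervised falsification detection: learn a single
# threshold on one interview-level feature from labeled training data
# (0 = real interview, 1 = known falsification). Feature and values are
# illustrative assumptions, not results from the study.

def train_stump(features, labels):
    """Pick the threshold on one feature that minimizes training errors."""
    best = (None, None)  # (threshold, error count)
    for t in sorted(set(features)):
        errors = sum((f >= t) != bool(y) for f, y in zip(features, labels))
        if best[1] is None or errors < best[1]:
            best = (t, errors)
    return best[0]

def predict(threshold, feature):
    """Classify an interview: 1 = flagged as suspicious."""
    return int(feature >= threshold)

# Share of mid-scale answers per training interview, with known labels.
train_x = [0.42, 0.47, 0.51, 0.81, 0.86, 0.90]
train_y = [0, 0, 0, 1, 1, 1]

threshold = train_stump(train_x, train_y)
print([predict(threshold, x) for x in [0.45, 0.88]])  # → [0, 1]
```

A full decision tree repeats this split search recursively over many features, which is one reason tree-based methods performed well within-survey; the between-survey drop the abstract reports suggests the learned thresholds do not transfer across surveys and falsifiers.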
Mrs Christine Tresignie (Ipsos) - Presenting Author
Qualitative methods are indispensable for understanding complex social phenomena across diverse cultural contexts. However, such approaches are highly time-consuming, particularly in multi-country settings such as the Eurozone, where researchers must navigate linguistic diversity, cultural specificity, and vast volumes of unstructured data. These challenges often limit the scalability and timeliness of qualitative research, making it difficult to deliver insights efficiently while retaining depth and nuance.
This presentation explores the transformative role of Artificial Intelligence in addressing these challenges and streamlining qualitative research workflows for multi-country surveys. We highlight AI-driven innovations that significantly reduce the time required for data processing while ensuring cultural sensitivity and analytical depth.
Through a case study, we demonstrate how AI can enhance the efficiency and comprehensiveness of qualitative research without compromising the researcher’s interpretive expertise. Ethical considerations, including managing bias in AI models and protecting respondent confidentiality, are critically examined, along with strategies to ensure responsible AI use.
Rather than replacing qualitative researchers, AI serves as a powerful enabler, augmenting their ability to deliver nuanced, cross-cultural insights at scale. The objective of the session is to give actionable strategies for integrating AI tools into qualitative research workflows, helping overcome the time and resource constraints of multi-country qualitative studies and fostering innovative approaches to understanding complex social issues at an international scale.
Dr Richard Timpone (Protopian Works, PBC)
Dr Yongwei Yang (Google DeepMind) - Presenting Author
Generative AI is transforming survey research: it is being leveraged to construct surveys, synthesize data, conduct analyses, and write summaries of results. While the promise is to create efficiencies and increase quality, the reality is not always so clear cut. Leveraging our framework of Truth, Beauty, and Justice, which we use to evaluate AI, machine learning, and computational models for effective and ethical use (Taber and Timpone 1996; Timpone and Yang 2024), we consider the potential of these AI tools to augment and replace data science and statistical researchers.
Given the range of analyses applied to survey data, both on its own and integrated with other sources, we consider the potential for augmenting analysts and democratizing analytics more broadly. We will cover statistical and machine learning analyses, focusing on supervised and unsupervised methods including predictive analytics, clustering and segmentation, and classification tasks, as well as specific methods such as conjoint analysis and synthetic data.
Large Language Models are being applied to many of these tasks, but foundation models are not the best tool for many of them directly; AI agents offer more potential. While AI agents for survey analytics can assist analysts in their tasks, we raise some warnings about push-button automation. Just as earlier eras of survey analysis created issues when the increased ease of using statistical software allowed researchers to conduct analyses they did not fully understand, the new AI tools may create similar but larger risks. We conclude by encouraging the advancement of these tools to complement researchers, but advocate for continued training in and understanding of methods to ensure that the substantive value of surveys is fully achieved by applying, interpreting, and acting upon survey results effectively and ethically.