
ESRA 2023 Glance Program


All time references are in CEST

New Data Spaces for the Social Sciences - An Interdisciplinary Program for Survey Innovation in Germany

Session Organisers: Professor Cordula Artelt (Leibniz Institute for Educational Trajectories)
Dr Anika Schenck-Fontaine (Leibniz Institute for Educational Trajectories)
Professor Corinna Kleinert (Leibniz Institute for Educational Trajectories)
Time: Tuesday 18 July, 09:00 - 10:30
Room

To expand our understanding of and have an impact on the major social challenges of the coming decades, including digitization, climate change, growing diversification, pandemics, and war-induced societal changes, the social sciences need to unlock new opportunities for collecting and analyzing data. Many countries have a set of well-established longitudinal survey programs, but surveys are plagued with fundamental challenges related to validity, cost, and sustainability. Therefore, systematic and far-sighted social science research needs to explore the potential of recent technological advances and explore new forms of data, new methods of data acquisition, and new measures of data quality.

Developing and utilizing such new data sources and data infrastructures necessitates the bundling and orchestrating of skills, knowledge, and expertise across different fields of empirical social sciences and computer sciences, which can only be managed by large-scale research programs. To achieve these goals, the German Research Foundation (DFG) has established the long-term infrastructure priority program “New Data Spaces for the Social Sciences” to open up and develop such new data spaces (https://www.new-data-spaces.de/en-us/). Within this program, a series of highly innovative research projects in four main research areas were funded: exploration and integration of different data types, respondent-driven designs, instrument validity, and multimodal data acquisition. The purpose of this session is to introduce this program, present first results of research projects funded within this program, and foster exchange with initiators and researchers who are active in similar programs in other countries.

Keywords: Survey innovation, New Data Spaces, data infrastructure, Germany

Papers

Exploring the Impact of Virtual Avatars in Survey Interviews

Mr Patrick Schrottenbacher (Goethe University Frankfurt) - Presenting Author
Dr Lydia Kleine (Leibniz Institute for Educational Trajectories)
Professor Alexander Mehler (Goethe University Frankfurt)
Professor Corinna Kleinert (Leibniz Institute for Educational Trajectories)
Professor Christian Aßmann (Leibniz Institute for Educational Trajectories)

The field of representative survey studies has evolved significantly, enabling surveys to be conducted through live video interviewing, a promising alternative to face-to-face interviewing. Recent technological advances have facilitated innovations, including the representation of both the interviewer and the interviewee as virtual avatars. A significant factor in enabling this has been the improvements made to Virtual Reality (VR), which have enhanced accessibility to features such as hand, face, and eye tracking. Representing the interviewer via a virtual avatar has both positive and negative effects on user comfort and experience, which, in turn, influence the overall quality of the interview. For example, the so-called Other Avatar Effect arises partly from the discrepancy between the perceived voice (and the associated self-imposed image of the speaker's appearance) and the actual appearance of the avatar.
To date, there has been little research into the impact of avatars representing interviewers in VR on the data quality of interviews. By means of special experiments, the FACES project will investigate the influence of avatar features. These experiments will examine the influence of situational factors, such as the interview environment, to explore the interaction space of features as broadly and efficiently as possible. This will be achieved through smaller specialized experiments that aim to isolate particular factors. Initial hypotheses are presented alongside the virtual environment setup, and the design of the interview situation for the larger study is discussed.


Evaluating ASR for Social Science Research: A Comparison of Semantic Metrics (Authors are Members of SPP 2431)

Mr Nicolas Ruth (Leipzig University)
Mr Andreas Niekler (Leipzig University)
Ms Leonie Steinbrinker (Leipzig University) - Presenting Author
Mr Stephan Poppe (Leipzig University)

Automatic Speech Recognition (ASR) is an essential technology for automating the transcription of qualitative data in social science research, particularly with large interview datasets. Recent advancements in ASR have introduced powerful new tools to the field, but their implementation requires careful and thoughtful consideration to ensure reliability and accuracy. Since outcomes vary significantly depending on the model and its (hyper-)parametrization, it is crucial to evaluate the generalization capabilities of ASR models on specific research data using a meaningful and comparable metric. Addressing these challenges will enable social scientists to effectively leverage these technologies in their research.

The most commonly used metric for this purpose is the Word Error Rate (WER). WER depends on language-specific text transformations and focuses on surface-level accuracy, making it inadequate for evaluating transcript quality in social sciences and downstream NLP tasks. To address these limitations, modern, semantics-oriented metrics have been developed in recent years. Metrics such as Embedding Error Rate (EmbER) and Semantic-WER apply penalties for different types of errors, while methods like BERTScore, SeMaScore, SemDist, and Aligned Semantic Distance (ASD) improve evaluation by utilizing contextual embeddings and advanced matching techniques to assess semantic similarity.
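To make the surface-level nature of WER concrete, the following is a minimal sketch of the standard definition: word-level edit distance divided by the number of reference words. The function name, the lowercasing, and the whitespace tokenization are assumptions chosen for illustration and are not taken from the authors' evaluation pipeline.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumed normalization: lowercasing and whitespace tokenization only.
    Real evaluations usually apply further language-specific transformations.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting i reference words yields an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


reference = "the interview was not helpful"
meaning_flipped = "the interview was helpful"      # drops the negation
near_synonym = "the interview was not useful"      # keeps the meaning

print(word_error_rate(reference, meaning_flipped))
print(word_error_rate(reference, near_synonym))
```

Both hypotheses differ from the reference by a single word edit and therefore receive the same WER, although only the first one reverses the meaning of the statement. This is the kind of case that semantics-oriented metrics aim to distinguish.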

Our research centers on comparing the usability of these semantic metrics for ASR in social sciences and developing an evaluation for ASR transcriptions using aligned window-based semantic comparison, as opposed to relying on traditional single-value metrics. The proposed talk is not only designed to improve the quality of individual research projects but also to contribute to the creation of new data spaces for the social sciences, where ASR is a fundamental technology. Robust ASR evaluation and utilization methods, particularly those addressing semantic validity crucial for qualitative datasets, are essential for unlocking the potential of large-scale qualitative data and enabling new research directions.


The Future of Survey Data Collection in the UK

Professor Peter Lynn (University of Essex) - Presenting Author

This presentation will provide an overview of the work, achievements, and vision of Survey Futures, a major initiative in the UK designed to future-proof social survey data collection (www.surveyfutures.net). Survey Futures addresses a number of challenges that are also of interest to New Data Spaces for the Social Sciences, including those related to the costs and sustainability of survey data collection, data quality and new modes and methods. Tools and guidance will be provided for survey commissioners, survey agencies and survey data users, and we also hope to influence the debate about the future role of surveys within a changing data landscape.


New Data Spaces Cross-National Synergies at the Example of Respondent-Driven Sampling

Dr Carina Cornesse (GESIS - Leibniz Institute for the Social Sciences) - Presenting Author
Dr Jean-Yves Gerlitz (University of Bremen)
Professor Olaf Groh-Samberg (University of Bremen)
Mr Curtis Jessop (The National Centre for Social Research (NatCen))
Professor Olga Maslovskaya (University of Southampton)
Professor Sabine Zinn (German Institute for Economic Research)

The German Infrastructure Priority Program "New Data Spaces for the Social Sciences," funded by the German Research Foundation, was established to facilitate collaboration among research infrastructures, leverage synergies, and drive innovation in the field of data collection. This initiative shares a similar vision with the UK’s "Survey Futures," a program funded by the Economic and Social Research Council to advance survey data collection methods. To propel the development of data collection methodologies, it is essential to expand the perspective on infrastructure innovation from a national to an international level, fostering synergies between countries and infrastructure programs.

This presentation contributes to this dialogue by focusing on two projects that explore an innovative data collection methodology—respondent-driven sampling (RDS)—within the German and UK programs. It will outline the two projects, including the distinct study designs planned for 2025 in both countries, discuss how the projects mutually benefit each other, and explore strategies for strengthening and expanding international collaboration between infrastructure programs. Additionally, the presentation will highlight key similarities and differences between the German and UK infrastructure programs, providing insights from the perspective of researchers funded under these initiatives.


The Behavioral Measurement Toolbox

Dr Julian Detemple (University of Mainz)
Professor Florian Hett (University of Mainz)
Professor Michael Kosfeld (University of Frankfurt) - Presenting Author
Dr David Poensgen (University of Frankfurt)

Understanding human preferences (e.g., related to time, risk, or social considerations) has been central to advancing research in economics and other social sciences. These preferences influence decision-making in diverse contexts, from individual choices to social interactions, and aggregate outcomes. However, existing behavioral methods to measure preferences based on incentivized choices in stylized decision situations are not standardized and are often difficult to implement in survey studies and field settings. Further, there is little research on which exact measure is the most valid.

This paper introduces the "Behavioral Measurement Toolbox" (BMT), a user-friendly platform designed to measure individual traits like time, risk, and social preferences through incentivized and controlled decision situations. With just a few clicks, researchers can implement these measures, with BMT maintaining comparability and transparency across different implementations. Details regarding past measurements are easily traceable and documented, which facilitates meta-studies and fulfilling open science requirements. Flexible usability, online and offline, enables standardized measurement across different field contexts, in surveys, as well as lab studies. BMT’s flexible technical architecture also explicitly allows integration with existing survey platforms and hence complements other methods within the quantitative social sciences. BMT is therefore ideally suited to allow researchers from all related disciplines to incorporate behavioral preference measures into their work, while it also promises methodological advancement in terms of standardization and establishing validity.


Opportunities and Challenges for the Future of Surveys

Professor Pamela Davis-Kean (University of Michigan) - Presenting Author

Surveys have long been the primary avenue for gathering data on the beliefs and behaviors of individuals across the world. However, in the past decades, the number of individuals who give their time to answer questions and provide us with information on their lives has dwindled to the point where it is becoming difficult to obtain sample sizes large enough to analyze the data, which is especially problematic for sub-groups of interest. Response rates have been declining for a while but took a particularly dramatic hit during the COVID-19 pandemic years and have rebounded only slightly over the last few years. These lowered response rates threaten to reduce the effectiveness of surveys in helping researchers make inferences about the population. This has led to a "crisis" in survey research over how to conduct future surveys that represent the populations of interest.

For decades, the “gold standard” of creating new randomized population surveys has been to randomize address information and connect with respondents by knocking on doors and asking those who live at these addresses to participate in a given survey. However, this method is no longer garnering the response rates of 20 years ago, when one could expect around 80% of the sample to participate in a study. The average response rates using this method are now considered very good at 50% but are generally lower. Web surveys have been used for decades to try to find another “entrance” into the household but have consistently low response rates, with many national panels obtaining only around 5-6% of respondents who agree to take the survey. This presentation will discuss new ideas and opportunities for increasing response rates in surveys as well as the importance of sustaining data infrastructure for societal challenges in the future.


CIRCLET: Unified Solutions for Data Processing and Visualization in Computational Social Sciences

Mr Kevin Bönisch (Text Technology Lab, Goethe-University) - Presenting Author
Mr Alexander Mehler (Text Technology Lab, Goethe-University)

Recent technological advances, particularly in natural language processing (NLP) and the computational humanities, are leading to novel applications and corresponding evaluations in the social sciences, especially in the context of large corpora. To bridge the gap between NLP and the social sciences, systems are needed that can be continuously extended by the integration of new research results and at the same time offer horizontal and vertical scalability of the underlying (mostly textual) resources. This means that systems are needed that support both the processing and visualization of multimodal data while ensuring interoperability. To address these needs, we present a synthesis of two recent developments: for scalable data processing, we use the Docker Unified UIMA Interface (DUUI) as an NLP pipeline, followed by a novel system for visualizing text-based data, the Unified Corpus Explorer (UCE). In this study, we focus on the latter, demonstrating UCE's versatility in making UIMA-annotated data searchable, visually accessible, and tangible. Our approach leverages a dynamic and customizable microservice architecture, incorporating various annotations such as named entities and semantic roles, as well as Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) techniques. Together, these methods enable the generation of embedding spaces, facilitate corpus-based interactions through chat interfaces, and provide a range of search and visualization functionalities for the underlying data. Finally, we highlight the genericity of both DUUI and UCE, demonstrating their integration across multiple domains and use cases, facilitating both primary and secondary data analysis, all without the need for re-implementation or costly model redevelopment. We exemplify UCE using two datasets relevant to the social and educational sciences: one based on Twitter and the other based on experimental data from Critical Online Reasoning (COR) experiments.