ESRA 2025 Preliminary Program
All time references are in CEST
Bridging Methodology and Computational Social Science
Session Organisers: Mr Joshua Claassen (DZHW, Leibniz University Hannover), Dr Oriol Bosch (Oxford University), Professor Jan Karem Höhne (DZHW, Leibniz University Hannover)
Time: Thursday 17 July, 09:00 - 10:30
Room: Ruppert 002
In today’s world, daily activities, work, and communication are continuously tracked via digital devices, generating highly granular data, including digital traces (e.g., app usage and browsing) and sensor data (e.g., geolocation). Researchers from various disciplines are increasingly utilizing these data sources, though often with different research objectives. Methodologists tend to focus on evaluating the quality and errors of digital data, while Computational Social Scientists (CSS) often leverage these data to answer more substantive research questions. However, there is a lack of collaboration between these two worlds, resulting in a disciplinary divide.
For example, CSS researchers have embraced data donations, yet methodologists have not provided sufficient empirical evidence on the quality of such data. Moreover, web tracking data are rapidly being adopted in CSS, but methodological guidelines on how to gather the substantive content of website visits and apps (e.g., through HTML scraping) are lacking. However, there are methodological error frameworks covering both measurement and representation. These frameworks are yet to be (fully) leveraged.
This session invites contributions that bridge the gap between methodology and CSS, fostering collaboration across disciplines. We particularly welcome CSS work that incorporates a strong methodological foundation, as well as methodological research with clear relevance to substantive CSS inquiries. Topics may include, but are not limited to:
• Substantive research showcasing best practices when using digital data
• Assessments of digital data in terms of quality and errors
• Approaches to reducing representation, sampling, and measurement errors in digital data
• Studies substituting more traditional data collections (e.g., web surveys) with digital data (e.g., measuring opinions with digital traces)
• Studies that go beyond the pure tracking (or donating) of app, search term, and URL data, including data integration and enrichment strategies
Keywords: Digital trace data, Computational social science, Survey methodology, Web tracking, Data donation
Papers
Qualitative Insights from AI Summaries of Social Media Posts
Professor Michael F. Schober (The New School) - Presenting Author
Professor Johann A. Gagnon-Bartsch (University of Michigan)
Professor Frederick G. Conrad (University of Michigan)
Ms Rebecca S. Dolgin (The New School)
Mr Mao Li (University of Michigan)
Mr Erik Zhou (University of Michigan)
Ms Peilin Chen (University of Michigan)
Dr Paul Beatty (US Census Bureau)
Social media posts have the potential to capture public opinion in new ways, including reaching members of the public who may not normally participate in focus groups or quantitative studies, but the large volume and complexity of the content create significant challenges for researchers. Can AI tools be used to efficiently glean qualitative insights from large corpora of social media posts? The study reported here compares qualitative insights about barriers to participation in the US Decennial Census generated from (a) AI summaries of samples of social media posts from a corpus of 17,497 tweets about the US Census collected before and during the administration of the 2020 Decennial Census; (b) crowdsourced workers’ judgments based on their reading of one to 16 samples of 25 social media posts from the same corpus; and (c) 42 focus groups carried out across the US in advance of the 2020 Decennial Census. We report on different methods for generating AI summaries through different prompts to the Llama 3.1 large language model and different methods of sampling subsets of tweets of varying sizes from the larger corpus. We also report on the extent to which the prompting and sampling methods we tested (including prompts designed to elicit insights following the same instructions that were given to MTurk workers) yield insights comparable to, or divergent from, those generated by focus groups and crowdsourced human judgments from the same period. Findings suggest that AI summaries show substantial promise for helping researchers deal with the content of large corpora of social media posts, but also highlight challenges for researchers to consider when embarking on such methods.
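As a rough illustration of the sample-and-summarize workflow described in this abstract (not the authors' actual pipeline), the sketch below draws random subsets of tweets from a corpus and prompts a locally hosted Llama 3.1 model for a thematic summary. The corpus file name, sample size, prompt wording, and use of the Ollama Python client are assumptions made for illustration only.

```python
# A minimal sketch (assumptions, not the authors' pipeline): sample subsets of tweets and
# ask a locally hosted Llama 3.1 model for a qualitative summary of barriers to participation.
import json
import random

from ollama import chat  # assumes a local Ollama server with the llama3.1 model pulled


def summarize_sample(tweets: list[str], sample_size: int = 25, seed: int = 0) -> str:
    """Draw one random sample of tweets and prompt the model for a thematic summary."""
    rng = random.Random(seed)
    sample = rng.sample(tweets, k=min(sample_size, len(tweets)))
    prompt = (
        "Below are public social media posts about the US Census. "
        "Summarize the main barriers to participating in the Census that they mention.\n\n"
        + "\n".join(f"- {t}" for t in sample)
    )
    response = chat(model="llama3.1", messages=[{"role": "user", "content": prompt}])
    return response["message"]["content"]


if __name__ == "__main__":
    # Hypothetical corpus file: a JSON list of tweet texts.
    with open("census_tweets.json") as f:
        corpus = json.load(f)
    # Vary the number of samples and the sample size to mimic different sampling conditions.
    for i in range(4):
        print(f"--- Summary of sample {i + 1} ---")
        print(summarize_sample(corpus, sample_size=25, seed=i))
```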
Using a Qualitative Approach to Better Understand Why People Are (Un)willing to Participate in Smartphone App Data Collection
Dr Alexander Wenz (University of Mannheim) - Presenting Author
Mr Wai Tak Tung (University of Mannheim)
While smartphones have become promising tools for collecting digital behavioral, sensor, and survey data in the social sciences, the recruitment of study participants who are willing to install a smartphone app and fully participate throughout the study period remains a challenge. Previous research has experimented with various approaches to increase study participation and adherence, but with at most moderate success. In this paper, we report the results from qualitative in-depth interviews to better understand the mechanisms underlying the decision to participate in smartphone app data collection. The interviews are supported by a semi-structured discussion guide and conducted among individuals with different sociodemographic characteristics (age, gender, educational attainment) and varying levels of smartphone skills.
The study aims to address the following questions:
• Which potential difficulties and risks do individuals perceive in smartphone-based data collection? How do individuals perceive the collection of different forms of data, in particular survey, GPS, Internet browsing, and app usage data?
• Under which conditions might individuals be more willing to participate in and adhere to smartphone-based research?
• Which strategies to increase participation and adherence might work best for whom? Which strategies might work best for underrepresented groups?
Beyond Binary Bytes: Mapping the Evolution of Gender Inclusive Language on Twitter
Professor Simon Kühne (Bielefeld University)
Mr Dorian Tsolak (Bielefeld University) - Presenting Author
Mr Stefan Knauff (Bielefeld University)
Mr Long Nguyen (DeZIM Institute)
Mr Dominik Hansen (Bielefeld University)
Languages worldwide differ significantly in how they incorporate gender into grammar and phonetics. In the German language, the generic masculine form (e.g., saying “Lehrer” [teacher, male, sing.]) is used to refer to a group of people with unknown (or non-male) sex and has been criticized for rendering women and non-binary people invisible in language, thereby reinforcing gender biases and unequal power dynamics. Gender-inclusive language (GIL) has been proposed as an alternative to the generic masculine and involves various subtypes. Our study investigates the development of GIL on Twitter between 2018 and 2023. In addition, we study individual (gender) and contextual (regional) effects on the use of GIL.
We rely on a unique dataset of over 1 billion German-language tweets. We present a pipeline to detect three types of GIL, namely binary feminization, non-gendered GIL, and non-binary inclusive language, using a combination of a fine-tuned German BERT model, regular expressions, and a corpus of German gender-inclusive words. User names are analyzed based on lists of male, female, and unisex names. By inferring the place of residence of the users behind more than 300 million tweets, we shed light on the correlations between socio-structural variables and the use of gender-inclusive language across Germany.
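As a rough illustration of the regular-expression component of such a detection pipeline (the fine-tuned German BERT model, word corpus, and name lists are not reproduced here), the sketch below flags a few common gender-inclusive spellings: gender star, colon, or underscore forms, Binnen-I, and paired forms. The patterns and example tweets are illustrative assumptions, not the authors' actual rules, and they cover only a subset of the GIL types studied.

```python
# A minimal sketch (assumptions, not the authors' pipeline): regex-based flags for
# common German gender-inclusive spellings.
import re

# Non-binary inclusive markers: gender star, colon, or underscore before "in/innen",
# e.g. "Lehrer*innen", "Kolleg:innen", "Student_innen".
NON_BINARY = re.compile(r"\b\w+[*:_](?:in|innen)\b", re.IGNORECASE)

# Binnen-I (capital I inside the word), e.g. "LehrerInnen".
BINNEN_I = re.compile(r"\b[A-ZÄÖÜ]\w+In(?:nen)?\b")

# Paired forms ("Lehrerinnen und Lehrer"), a simple heuristic for binary feminization.
PAIRED = re.compile(r"\b(\w+)innen und \1\b", re.IGNORECASE)


def classify_gil(text: str) -> set[str]:
    """Return the (possibly empty) set of GIL types detected in a tweet."""
    labels = set()
    if NON_BINARY.search(text):
        labels.add("non-binary inclusive")
    if BINNEN_I.search(text) or PAIRED.search(text):
        labels.add("binary feminization")
    return labels


if __name__ == "__main__":
    examples = [
        "Alle Lehrer*innen sind eingeladen.",
        "Die Lehrerinnen und Lehrer streiken heute.",
        "Die Lehrer treffen sich morgen.",  # generic masculine, no GIL
    ]
    for tweet in examples:
        print(tweet, "->", classify_gil(tweet) or {"no GIL detected"})
```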
We find that GIL adoption increases slightly over the five-year period studied, and we identify different adoption trends across GIL types. Furthermore, profiles with female usernames use GIL more often than those with male or unisex usernames. In addition, we find regional patterns, with more use of GIL in urban regions and in regions with a higher share of young users.
Exploring Differences in ChatGPT Adoption and Usage in Spain: Contrasting Survey and Metered Data Findings
Dr Melanie Revilla (RECSM-UPF) - Presenting Author
Miss Lucia Fernandez Melero (RECSM-UPF)
Artificial intelligence (AI) technologies have rapidly integrated into everyday life, yet our understanding of how users interact with these tools remains limited. This study focuses on one of the AI technologies that has gained significant importance in recent years: ChatGPT, an AI model developed by OpenAI and launched in November 2022. While ChatGPT offers many new opportunities, it also raises concerns about potentially exacerbating inequalities in access to and use of digital technologies, known as the “digital divide”. Studying user demographics, adoption patterns, and usage factors is crucial for addressing these disparities and promoting equitable integration of AI tools. Previous research on ChatGPT usage has identified variations based on gender, age, education, and digital skills. However, most studies rely on survey data, which are prone to measurement errors, particularly when respondents are asked to recall past behaviors. To overcome these limitations, digital trace data, specifically metered data (e.g., URLs visited), provide an alternative by capturing continuous and granular user interactions. Yet, while such data could mitigate common survey biases and provide deeper insights into technology usage, they also suffer from errors.
Thus, the main goal of this paper is to evaluate how the data collection method used (i.e., survey or metered data) impacts findings related to ChatGPT adoption, usage patterns, and their implications for the digital divide. To achieve this, we use data from the Netquest opt-in panel in Spain, comparing results from two independent samples: one responding to a conventional survey on ChatGPT adoption and usage, and the other providing metered data, which are then used to measure variables similar to those in the survey.
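As a rough illustration of how metered URL records might be turned into ChatGPT adoption and usage measures (not Netquest's or the authors' actual processing), the sketch below aggregates visit-level records per panelist. Column names, domains, and the input file are assumptions made for illustration.

```python
# A minimal sketch (assumptions, not the authors' processing): derive ChatGPT adoption
# and usage measures from metered URL-visit records.
import pandas as pd

CHATGPT_DOMAINS = ("chat.openai.com", "chatgpt.com")  # assumed domains identifying ChatGPT visits


def chatgpt_usage(visits: pd.DataFrame) -> pd.DataFrame:
    """Aggregate per-panelist adoption, visit counts, active days, and time spent."""
    is_chatgpt = visits["domain"].isin(CHATGPT_DOMAINS)
    gpt = visits[is_chatgpt].copy()
    gpt["day"] = pd.to_datetime(gpt["timestamp"]).dt.date
    per_panelist = gpt.groupby("panelist_id").agg(
        n_visits=("url", "size"),
        active_days=("day", "nunique"),
        total_seconds=("duration_seconds", "sum"),
    )
    # Keep every panelist in the output, including non-adopters with zero visits.
    all_ids = pd.Index(visits["panelist_id"].unique(), name="panelist_id")
    out = per_panelist.reindex(all_ids, fill_value=0)
    # Adoption = at least one recorded ChatGPT visit during the observation window.
    out["adopted"] = out["n_visits"] > 0
    return out.reset_index()


if __name__ == "__main__":
    # Hypothetical metered-data extract with one row per page visit;
    # columns: panelist_id, timestamp, domain, url, duration_seconds.
    visits = pd.read_csv("metered_visits.csv")
    print(chatgpt_usage(visits).head())
```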
This research advances understanding of how different data collection methods influence findings, while also offering new insights about ChatGPT’s integration into Spanish society and the digital inequities surrounding AI usage.