ESRA 2025 Preliminary Program
All time references are in CEST
Data Quality in Data Donation Studies
Session Organisers: Mr Ádám Stefkovics (HUN-REN Centre for Social Sciences), Mr Yannik Peters (GESIS), Mr Johannes Breuer (GESIS), Mr Fiona Draxler (University of Mannheim), Mrs Laura Boeschoten (Utrecht University), Ms Laura Young (University of Mannheim), Ms Bella Struminskaya (Utrecht University), Ms Jessica Daikeler (GESIS)
Time: Thursday 17 July, 15:30 - 17:00
Room: Ruppert 040
In recent years, the use of digital trace data has grown rapidly across fields and topics in the social sciences. Given the challenges and risks associated with API-based data collection, one of the most promising recent methods for accessing such data is through data donations. Data Download Packages (DDPs) offer an effective approach for collecting detailed and potentially multimodal individual-level information on the use of social media and other digital services and devices (e.g., messenger apps or fitness trackers). Linking the data from DDPs with survey data is especially valuable for enhancing the depth and accuracy of social-scientific research.
However, according to prior research, one caveat of data donations is that preserving internal and external validity can be challenging in these projects. Systematic bias originating from the data processing and measurement of online behaviour can harm data quality, and issues with sampling, coverage and nonresponse may undermine the generalizability of data donation studies. With this session, we want to address these important methodological issues related to data donations. We invite contributions that provide new theoretical or empirical insights and practical solutions to systematic biases that may harm internal and external validity in data donation studies. Contributions may cover, but are not limited to, the following topics:
· Factors influencing willingness to donate social media data
· Coverage bias and issues with sampling frames in data donation studies
· Sampling bias and random selection in data donation studies
· Nonresponse, compliance, and consent bias in data donation studies
· Data processing and measurement error in data donation studies, preserving data quality
Keywords: data donation, data quality, validity, measurement
Papers
WhatsApp Data Donations for Interpersonal Relationship Research: First Insights on Data Quality and Relationship-level Chatting Differences
Mr Julian Kohne (GESIS - Leibniz Institute for the Social Sciences) - Presenting Author
Professor Christian Montag (Ulm University)
We combined a questionnaire with data donations to explore the feasibility of WhatsApp data donations to investigate interpersonal relationships. To do so, we investigated the factors that contribute to participants not donating or restricting data donations (dropout, non-consent to data donation, non-compliance with stated donation intention, and self-censoring the data), as well as the predictive potential of anonymized chat log characteristics for relationship-specific survey responses (relationship type and interpersonal closeness). We examined differences in willingness to donate, actual donations, and self-censorship across a range of demographic, psychological and relationship-relevant characteristics. In our opt-in study (N = 357), after non-consent and dropout from the survey (N = 60), about 70% of remaining participants (N = 206) stated that they were willing to donate, with younger participants being more willing, and willing participants exhibiting fewer privacy concerns than unwilling participants. We found some evidence pointing to women being more willing to donate than men. We did observe an intention-behavior gap, with only ~68% (N = 140) of willing participants actually donating. We did not find significant differences between donors and non-donors based on individual tests (e.g., gender, age, personality, relationship status) but found some evidence for personality and sexual orientation as potentially influential factors in a logistic regression model. With respect to relationship characteristics, first descriptive analyses point to observable differences in the number of messages, number of words, number of emoji, number of URLs, and average reply times by relationship type and interpersonal closeness scores in the last 30 days prior to answering our survey. The final presentation will include LASSO or Ridge Regression model results to quantify the predictive potential of these variables.
We discuss implications for conducting WhatsApp data donation studies, limitations, and directions for future research with respect to our findings.
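The participation funnel reported in this abstract can be retraced with a few lines of arithmetic. The sketch below uses only the counts stated above; it assumes that "remaining participants" means the 357 enrolled minus the 60 lost to non-consent and dropout, and all variable names are illustrative rather than taken from the authors' analysis.

```python
# Counts as reported in the abstract; the interpretation of
# "remaining participants" (357 - 60) is an assumption.
n_enrolled = 357   # opt-in study sample
n_lost = 60        # non-consent and dropout from the survey
n_willing = 206    # stated willingness to donate
n_donated = 140    # actually donated

n_remaining = n_enrolled - n_lost


def share(part, whole):
    """Return part/whole as a percentage rounded to one decimal."""
    return round(100 * part / whole, 1)


willing_rate = share(n_willing, n_remaining)    # willingness among remaining
donation_rate = share(n_donated, n_willing)     # intention-behavior follow-through
overall_rate = share(n_donated, n_enrolled)     # donors among all enrolled

print(n_remaining, willing_rate, donation_rate, overall_rate)
```

Under these assumptions the willingness and follow-through rates match the "about 70%" and "~68%" figures in the abstract, while the share of donors among all enrolled participants is noticeably lower.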
How accurate are survey measures on Facebook activity based on donated digital data?
Dr Ádám Stefkovics (HUN-REN Centre for Social Sciences) - Presenting Author
Professor Zoltán Kmetty (HUN-REN Centre for Social Sciences)
Digital media use, and within that social media use, is increasingly being treated as a significant predictor or outcome variable for various topics and research questions. The validity of these approaches, however, depends on the validity of media use measures. This study extends the current body of literature by contrasting survey reports of Facebook use with donated digital data. Our main research question is: Do self-reports of Facebook use correlate with actual behaviour found in the digital data? Furthermore, in a pre-registered experiment, we varied the ordering of the survey items to assess which item order yields more accurate responses. Data come from a data donation study fielded in Hungary between February and June 2023. Respondents were asked to participate in the project in three steps: (1) answer a short questionnaire with eligibility questions and a consent form, (2) download and upload their social media data to the project's website, and (3) answer a 30-minute-long survey. We used a standard self-reported measure of Facebook activity and contrasted it with frequencies derived from donated Facebook data. In the experiment, one group was first asked about the frequency of their activity in general, followed by items on specific topics, whilst the other group received the same questions in a specific-general order. The preliminary results show a relatively strong correlation between survey reports and digital measures, but self-reports tend to overestimate less frequent activities and underestimate more frequent activities. For some measures, the general-to-specific question order seems to work better. This study helps to understand to what extent self-reported social media use measures provide accurate behaviour estimates. We further shed light on the role of question order and make recommendations for survey design.
Lessons Learned in Using Data Donation To Study the Impact of Constant Connectivity on Well-Being
Dr Angelica Maria Maineri (Erasmus University Rotterdam/ODISSEI) - Presenting Author
Dr Laura Boeschoten (Utrecht University)
Dr Niek de Schipper (University of Amsterdam)
Professor Claartje ter Hoeven (Utrecht University)
The spread of mobile digital technologies and communication platforms (e.g., Slack, MS Teams) fosters constant connectivity, i.e., the tendency of always staying “tuned in” to work. This may have a negative impact on employees’ well-being. To address this issue, we asked participants to donate Slack access logs using the software Port, which enables privacy-preserving data donation. Constant connectivity can hence be measured by checking the timing of Slack sessions against an individual’s working hours (collected via a survey alongside other measures of interest). In the presentation, we review the methodological issues we encountered and summarize the lessons we learned.
First, the fieldwork proved challenging, and the study achieved a low completion rate for at least two reasons. For one, the design of the study involved several steps (pre-screening, informed consent, survey, data donation), some happening a few days apart, which resulted in attrition at each step. For another, many potential participants perceived Slack logs as ‘company data’ which they did not feel comfortable sharing. Although we indicated on multiple occasions that data would not make participants or their employers/companies identifiable, and that data could be inspected before being shared with the researchers, this proved to be a significant challenge which requires rethinking best practices for recruiting as well as rewarding participants.
Second, the lack of documentation from the Slack platform made it difficult to interpret exactly the information included in the access logs and therefore to create reliable measures of the construct of interest (i.e., constant connectivity). Measurement validity will be assessed by comparing the information from the Slack logs to the self-reported information from the survey. More research is needed to understand how to reliably and validly operationalise concepts from digital trace data.
The insights from our study are used to formulate recommendations for
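The measurement idea in this abstract, checking the timing of sessions against an individual's working hours, might be sketched as follows. This is not the authors' operationalisation: the abstract notes that the Slack access log format is poorly documented, so the input here is simply assumed to be a list of session start timestamps, and the function names, the Monday-to-Friday default and the example data are all hypothetical.

```python
from datetime import datetime, time


def outside_working_hours(ts, work_start, work_end, workdays=(0, 1, 2, 3, 4)):
    """True if a timestamp falls outside the self-reported working hours.

    Assumes fixed daily hours and a Monday-Friday week (weekday 0-4);
    both are simplifying assumptions, not part of the study design.
    """
    if ts.weekday() not in workdays:
        return True
    return not (work_start <= ts.time() < work_end)


def share_outside(session_starts, work_start, work_end):
    """Share of sessions that start outside working hours (None if no sessions)."""
    if not session_starts:
        return None
    flags = [outside_working_hours(ts, work_start, work_end) for ts in session_starts]
    return sum(flags) / len(flags)


# Hypothetical session start times for one participant.
sessions = [
    datetime(2024, 3, 4, 9, 15),   # Monday morning, within working hours
    datetime(2024, 3, 4, 21, 40),  # Monday evening, outside
    datetime(2024, 3, 6, 14, 0),   # Wednesday afternoon, within
    datetime(2024, 3, 9, 11, 30),  # Saturday, outside
]

result = share_outside(sessions, time(9, 0), time(17, 0))
print(result)
```

A per-participant share like this could then be compared with self-reported connectivity from the survey, which is the kind of validity check the abstract describes.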
Studying the use of ChatGPT for political information through data donations
Dr Johannes Breuer (GESIS - Leibniz Institute for the Social Sciences) - Presenting Author
Mr Julian Kohne (GESIS - Leibniz Institute for the Social Sciences)
Dr Mareike Wieland (GESIS - Leibniz Institute for the Social Sciences)
An increasing number of people have started to use large language model (LLM) chatbots for searching all sorts of information, including political information, e.g., on elections, parties, politicians, or political systems. As several studies have shown that self-report data on internet/media use can be unreliable, especially for very specific, rare, or sensitive types of use, there is a need for more objective data sources on the use of LLM-based chatbots for political information, which the method of data donation can offer. In our study, we focus on ChatGPT because it is currently the most widely used LLM-based chatbot and offers an easy-to-use option for exporting well-formatted data download packages (DDPs). We employ a combination of DDP collection (via a module for the PORT data donation tool) and an online survey to answer both substantive questions on the use of ChatGPT for political information and methodological questions on donating ChatGPT data. For this presentation, we will focus on the latter, namely: 1) What share of ChatGPT users are willing to export and share their data for research purposes? and 2) Are there systematic differences between users regarding their willingness to donate ChatGPT data with respect to sociodemographics, political interest, privacy concerns, and attitudes towards LLMs? Data for this study come from participants of a non-probability online access panel who live in Germany and have experience with using ChatGPT. The PORT module has already been adapted to our study requirements. Data collection will begin in March 2025 once the ethics documents are finalized. Based on the results of this study, we will discuss implications for research using chatbots as sources for data donation studies.