
ESRA 2023 Glance Program


All time references are in CEST

Internal and external validity in data donation studies

Session Organisers Mr Ádám Stefkovics (HUN-REN Centre for Social Sciences)
Ms Laura Boeschoten (Utrecht University)
Mr Johannes Breuer (GESIS)
Mr Zoltán Kmetty (HUN-REN Centre for Social Sciences)
Mrs Júlia Koltai (HUN-REN Centre for Social Sciences)
Ms Bella Struminskaya (Utrecht University)
Time Tuesday 18 July, 09:00 - 10:30
Room

Over the last few years, the use of digital trace data has grown rapidly across fields and topics in the social sciences. Given the challenges and risks associated with API-based data collection, one of the most promising recent methods for accessing such data is through data donations. Data Download Packages (DDPs) offer an effective approach for collecting detailed and potentially also multimodal individual-level information on the use of social media and other digital services and devices (e.g., messenger apps or fitness trackers). Linking the data from DDPs with survey data is especially valuable for enhancing the depth and accuracy of social-scientific research.
However, according to prior research, one caveat of data donations is that preserving internal and external validity can be challenging in these projects. Systematic bias originating from the data processing and measurement of online behaviour can harm data quality, and issues with sampling, coverage and nonresponse may undermine the generalizability of data donation studies. With this session, we want to address these important methodological issues related to data donations. We invite contributions that provide new theoretical or empirical insights and practical solutions to systematic biases that may harm internal and external validity in data donation studies. Contributions may cover, but are not limited to, the following topics:
· Factors influencing willingness to donate social media data
· Coverage bias and issues with sampling frames in data donation studies
· Sampling bias and random selection in data donation studies
· Nonresponse, compliance, and consent bias in data donation studies
· Data processing and measurement error in data donation studies, preserving data quality

Keywords: data donation, data quality, validity, measurement

Papers

Do Participants Understand Data Donation?

Ms Danielle McCool (Utrecht University) - Presenting Author
Dr Laura Boeschoten (Utrecht University)
Dr Bella Struminskaya (Utrecht University)

The provisions of the 2016 General Data Protection Regulation (GDPR) requiring that data-collecting organizations make users’ own data available to them have opened a unique path for researchers to investigate real-world human-technological interactions. These Data Download Packages (DDPs) are rich and complex, representing the full interactions that a user may have with a system, providing a way to assess individuals’ social networks, estimate exposure to polarizing media, or provide insight into long-term travel behavior.
However, the same depth that makes DDPs desirable as research tools potentially makes them very sensitive. A DDP may contain intimate conversations, links to personal videos, or detailed location information. Individual concerns for privacy therefore play a considerable role in a user’s decision to donate their digital trace data. While the Port workflow offers a privacy-sensitive alternative by 1) extracting only the portion of the DDP relevant to the research and 2) allowing the user to inspect the extracted data before the donation step, users may not understand or trust the process.
To investigate this aspect of the donation process and its impact on users’ willingness to donate, multiple data donation studies in recent years have included a quiz designed to test a participant’s understanding of this privacy-sensitive methodology. Users were first exposed to an explanation of the process, including their capacity to decide not to donate after viewing the data, and subsequently asked a set of true/false questions about the process. In this presentation, we examine the role of understanding in an individual’s decision to donate their data, investigating its interaction with individual concerns about privacy and users’ technical fluency.


How accurate are survey measures on Facebook activity based on donated digital data?

Dr Ádám Stefkovics (HUN-REN Centre for Social Sciences) - Presenting Author
Professor Zoltán Kmetty (HUN-REN Centre for Social Sciences)

Abstract: Digital media use, and social media use in particular, is increasingly being treated as a significant predictor or outcome variable for various topics and research questions. The validity of these approaches, however, depends on the validity of media use measures. This study extends the current body of literature by contrasting survey reports of Facebook use with donated digital data. Our main research question is: Do self-reports of Facebook use correlate with actual behaviour found in the digital data? Furthermore, in a pre-registered experiment, we varied the ordering of the survey items to assess which item order yields more accurate responses. Data come from a data donation study fielded in Hungary between February and June 2023. Respondents were asked to participate in the project in three ways: answer a short questionnaire with eligibility questions and a consent form (1), download and upload their social media data to the project's website (2), and answer a 30-minute-long survey (3). We used a standard self-reported measure of Facebook activity and contrasted it with frequencies derived from donated Facebook data. In the experiment, one group was first asked about the frequency of each activity in general and then about the same activity on specific topics (general-specific order), whilst the other group received the same questions in a specific-general order. The preliminary results show a relatively strong correlation between survey reports and digital measures, but self-reports tend to overestimate less frequent activities and underestimate more frequent activities. For some measures, the general-specific question order seems to work better. This study helps to understand to what extent self-reported social media use measures provide accurate behaviour estimates. We further shed light on the role of question order and make recommendations for survey design.
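
As an illustration of the kind of comparison described in this abstract, the following is a minimal sketch, not the authors' analysis. It assumes one self-reported frequency and one donated-data frequency per respondent, and it uses a rank (Spearman) correlation as one plausible choice; the abstract does not say which correlation measure was used. All variable names and numbers are hypothetical.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data: weekly number of posts per respondent.
self_reported = [1, 3, 5, 10, 20, 2]   # survey answers (invented)
donated_count = [0, 1, 4, 14, 35, 1]   # counts from donated data (invented)

rho = spearman(self_reported, donated_count)

# Positive values mean the self-report exceeds the donated count.
report_minus_donated = [s - d for s, d in zip(self_reported, donated_count)]

print(f"Spearman correlation: {rho:.3f}")
print("Self-report minus donated count:", report_minus_donated)
```

The per-respondent difference is one simple way to look at over- and underreporting alongside the correlation; a real analysis would also need to align the reference period of the survey question with the time span covered by the donated data.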


Lessons Learned in Using Data Donation To Study the Impact of Constant Connectivity on Well-Being

Dr Angelica Maria Maineri (Erasmus University Rotterdam/ODISSEI) - Presenting Author
Dr Laura Boeschoten (Utrecht University)
Dr Niek de Schipper (University of Amsterdam)
Professor Claartje ter Hoeven (Utrecht University)

The spread of mobile digital technologies and communication platforms (e.g., Slack, MS Teams) fosters constant connectivity, i.e., the tendency of always staying “tuned in” to work. This may have a negative impact on employees’ well-being. To address this issue, we asked participants to donate Slack access logs using the software Port, which enables privacy-preserving data donation. Constant connectivity can hence be measured by checking the timing of Slack sessions against an individual’s working hours (collected via a survey alongside other measures of interest). In the presentation, we review the methodological issues we encountered and summarize the lessons we learned.

First, the fieldwork proved challenging, and the study achieved a low completion rate for at least two reasons. One reason is that the design of the study involved several steps (pre-screening, informed consent, survey, data donation), some happening a few days apart, which resulted in attrition at each step. The other is that many potential participants perceived Slack logs as ‘company data’ which they did not feel comfortable sharing. Although we indicated on multiple occasions that the data would not make participants or their employers/companies identifiable, and that the data could be inspected before being shared with the researchers, this perception proved to be a significant challenge, which requires rethinking best practices for recruiting as well as rewarding participants.

Second, the lack of documentation from the Slack platform made it difficult to interpret exactly the information included in the access logs and therefore to create reliable measures of the construct of interest (i.e., constant connectivity). Measurement validity will be assessed by comparing the information from the Slack logs to the self-reported information from the survey. More research is needed to understand how to reliably and validly operationalise concepts from digital trace data.
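
One way the measure described in this abstract (checking the timing of Slack sessions against an individual's working hours) could be operationalised is sketched below. This is an illustrative assumption, not the authors' implementation: the function name, the input format (a list of session start times), and the simplification that working hours are the same on every workday are all hypothetical, and the actual structure of Slack access logs may differ.

```python
from datetime import datetime, time


def share_outside_working_hours(session_starts, work_start, work_end,
                                workdays=(0, 1, 2, 3, 4)):
    """Share of sessions that start outside the respondent's working hours.

    session_starts: list of datetime objects (hypothetical log format).
    work_start, work_end: self-reported daily working hours from the survey.
    workdays: weekdays counted as working days (0 = Monday); an assumption.
    Returns None when there are no sessions.
    """
    if not session_starts:
        return None
    outside = 0
    for ts in session_starts:
        in_hours = (ts.weekday() in workdays
                    and work_start <= ts.time() < work_end)
        if not in_hours:
            outside += 1
    return outside / len(session_starts)


# Hypothetical respondent who reports working 09:00-17:00, Monday to Friday.
sessions = [
    datetime(2023, 3, 6, 10, 15),   # Monday, within working hours
    datetime(2023, 3, 6, 21, 40),   # Monday evening, outside
    datetime(2023, 3, 11, 9, 5),    # Saturday, outside
    datetime(2023, 3, 7, 16, 59),   # Tuesday, within working hours
]

share = share_outside_working_hours(sessions, time(9, 0), time(17, 0))
print(f"Share of sessions outside working hours: {share:.2f}")
```

A share of this kind depends heavily on what a logged "session" actually represents, which is exactly the documentation problem the abstract raises.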

The insights from our study are used to formulate recommendations for


Studying the use of ChatGPT for political information through data donations

Dr Johannes Breuer (GESIS - Leibniz Institute for the Social Sciences) - Presenting Author
Mr Julian Kohne (GESIS - Leibniz Institute for the Social Sciences)
Dr Mareike Wieland (GESIS - Leibniz Institute for the Social Sciences)

An increasing number of people have started to use large language model (LLM) chatbots for searching all sorts of information, including political information, e.g., on elections, parties, politicians, or political systems. As several studies have shown that self-report data on internet/media use can be unreliable, especially for very specific, rare, or sensitive types of use, there is a need for more objective data sources on the use of LLM-based chatbots for political information, which the method of data donation can offer. In our study, we focus on ChatGPT because it is currently the most widely used LLM-based chatbot and offers an easy-to-use option for exporting well-formatted data download packages (DDPs). We employ a combination of DDP collection (via a module for the PORT data donation tool) and an online survey to answer both substantive questions on the use of ChatGPT for political information and methodological questions on donating ChatGPT data. For this presentation, we will focus on the latter, namely: 1) What share of ChatGPT users are willing to export and share their data for research purposes? and 2) Are there systematic differences between users regarding their willingness to donate ChatGPT data with respect to sociodemographics, political interest, privacy concerns, and attitudes towards LLMs? Data for this study come from participants of a non-probability online access panel who live in Germany and have experience with using ChatGPT. The PORT module has already been adapted to our study requirements. Data collection will begin in March 2025 once the ethics documents are finalized. Based on the results of this study, we will discuss implications for research using chatbots as sources for data donation studies.