ESRA 2025 Preliminary Program

              



All time references are in CEST

Ethical, legal, and technological challenges in the collection of data from digital platforms

Session Organisers Dr Heleen Janssen (University of Amsterdam)
Dr Bella Struminskaya (Utrecht University)
Dr Laura Boeschoten (Utrecht University)
Mrs Zoltán Kmetty (HUN-REN Centre for Social Sciences)
Time Thursday 17 July, 13:45 - 15:00
Room Ruppert D - 0.24

In the evolving landscape of survey research, new data sources such as digital trace data (e.g., online interaction, social media, and browsing behavior data) have garnered significant attention. While combining survey data with digital behavioral data collected through data donation offers great potential for social science research, its practical implementation from a survey research perspective presents unique challenges, particularly concerning data quality and reliability.
This session will delve into the critical aspects of quality assurance in the context of collecting digital trace data through data donations. We will explore comprehensive frameworks, state-of-the-art tools, and best practices tailored to ensure the integrity and usability of these data sources.
Key topics will include:
1. Frameworks for Quality Assurance: An overview of frameworks designed to evaluate the quality of digital trace data through data donations, including criteria for assessing reliability, validity, and representativeness.
2. Tools and Platforms for Data Validation: A discussion on tools, technologies, and platforms (e.g., the KODAQS toolbox) for validating the quality of digital trace data collected through data donations.
3. Best Practices and Case Studies: Case studies on data donation for collecting and processing online interaction data, focusing on assessing data quality and providing examples of how to measure and improve it. Real-world case studies will illustrate successful integration of these data types into survey research, highlighting challenges and solutions.
4. Didactics of Data Quality Issues: Strategies for teaching data quality assurance for digital trace data through data donations. This segment will focus on educational approaches.
This session aims to foster a deeper understanding of the methodological challenges and practical solutions in assuring the quality of data donations and digital trace data in survey research.

Keywords: Digital Trace Data - Data Donation - Data Quality - Data Linkage

Papers

Legal and Ethical Dilemmas of Social Media Data in Scientific Research

Miss Meike Scholz (TU Bergakademie Freiberg) - Presenting Author

The widespread adoption of social media for data collection has created unprecedented opportunities for researchers. Prior studies have used social media data to investigate occupational, individual, and societal matters such as organizational attractiveness (e.g., Carpentier et al., 2019), human behaviour (e.g., Kaya & Bicen, 2016), and public opinion (e.g., Gorodnichenko et al., 2021). However, collecting and analysing social media data for research purposes raises significant legal challenges. A key concern is compliance with data protection regulations such as the General Data Protection Regulation (GDPR) in Europe. Researchers must navigate complex questions about the distinction between public and private data, as social media platforms often blur these boundaries. Additionally, the terms of service set by platforms like Instagram need to be considered, as they may restrict data collection practices. The ethical implications of using social media data further complicate legal considerations: issues of (informed) consent, respect for persons, and beneficence need to align with regulatory frameworks (Legewie & Nassau, 2018). Furthermore, social media data can be ephemeral because they are not under the researcher's control. An Instagram post can be deleted just as suddenly as it appeared. This conflicts with obligations to store research data for a certain period in order to align with the FAIR principles (findable, accessible, interoperable, reusable) and to prevent scientific misconduct (del Pico et al., 2024). Overall, reconciling legal and ethical considerations can impede academic research by imposing significant barriers to data access, collection, and analysis, even when research serves the public interest. Collaborative efforts between legal experts, ethicists, and researchers are essential to create guidelines that balance legal and ethical accountability.
Future research must focus on developing procedures that comply with legal standards while maximizing the utility of social media data for scientific inquiry.


Do Participants Understand Data Donation?

Ms Danielle McCool (Utrecht University) - Presenting Author
Dr Laura Boeschoten (Utrecht University)
Dr Bella Struminskaya (Utrecht University)

The provisions of the 2016 General Data Protection Regulation (GDPR) requiring that data-collecting organizations make users’ own data available to them have opened a unique path for researchers to investigate real-world human-technology interactions. These Data Download Packages (DDPs) are rich and complex, representing the full interactions that a user may have with a system, and provide a way to assess individuals’ social networks, estimate exposure to polarizing media, or gain insight into long-term travel behavior.
However, the same depth that makes DDPs desirable as research tools potentially makes them very sensitive. A DDP may contain intimate conversations, links to personal videos, or detailed location information. Individual concerns for privacy therefore play a considerable role in a user’s decision to donate their digital trace data. While the Port workflow offers a privacy-sensitive alternative by 1) extracting only the portion of the DDP relevant to the research and 2) allowing the user to inspect the extracted data before the donation step, users may not understand or trust the process.
To investigate this aspect of the donation process and its impact on users’ willingness to donate, multiple data donation studies in recent years have included a quiz designed to test a participant’s understanding of this privacy-sensitive methodology. Users were first shown an explanation of the process, including their ability to decide not to donate after viewing the data, and were subsequently asked a set of true/false questions about the process. In this presentation, we examine the role of understanding in an individual’s decision to donate their data, investigating its interaction with individual concerns about privacy and with users’ technical fluency.


Where You Are Is What You Get? Inconsistencies of Digital Trace Data Across Download Locations

Ms Johanna Hölzl (University of Mannheim) - Presenting Author
Professor Florian Keusch (University of Mannheim)
Mr John Collins (University of Mannheim)

Researchers increasingly use digital trace data from online platforms as an alternative or complement to survey data. To collect the data, researchers frequently rely on Application Programming Interfaces (APIs) provided by private companies. The APIs often return samples of the data based on undisclosed or opaque sampling procedures and algorithms. Previous research identified issues with some APIs’ reliability over time and across API versions. For instance, downloading Google Trends data for identical parameters (i.e., search term, region, time range) but at different time points can give researchers different values on Google Trends’ search index. Similarly, the users and tweets in samples drawn from the former Twitter API varied depending on the API version used. In this paper, we extend the research on the reliability of digital trace data from APIs by examining the effect of the download location on inconsistencies across samples: Do we get different values from digital trace data APIs depending on where we download the data?
We retrieve samples from Google Trends, YouTube, and the News API from four different countries on three continents (Austria, Germany, the U.S., and Australia) for the same query parameters (i.e., search term, region, and time range). We then compare the samples retrieved from each respective download country, keeping all parameters of the query constant.
Our results point to inconsistencies across download locations, and thus to another limitation regarding the reliability and replicability of findings from digital trace data. They serve as a cautionary tale for social science research relying on APIs that provide samples of digital trace data, as the download location might affect the findings. Our results also help researchers working with digital trace data from APIs make their research more replicable by drawing several samples where possible and by reporting transparently where they retrieved the data.
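
As an illustration only, the kind of cross-location comparison described in this abstract could be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' method: the function name `pairwise_inconsistency`, the two summary measures (mean absolute difference and share of differing data points), the location labels, and all values are hypothetical and chosen purely for demonstration.

```python
from itertools import combinations


def pairwise_inconsistency(samples):
    """Compare index series retrieved from different download locations.

    `samples` maps a location label to the list of values returned for the
    SAME query parameters. Returns a dict keyed by location pair with
    (mean absolute difference, share of data points that differ).
    Hypothetical helper for illustration; not taken from the paper.
    """
    result = {}
    for loc_a, loc_b in combinations(sorted(samples), 2):
        a, b = samples[loc_a], samples[loc_b]
        if len(a) != len(b):
            raise ValueError(f"series for {loc_a} and {loc_b} differ in length")
        diffs = [abs(x - y) for x, y in zip(a, b)]
        mean_abs_diff = sum(diffs) / len(diffs)
        share_differing = sum(d > 0 for d in diffs) / len(diffs)
        result[(loc_a, loc_b)] = (mean_abs_diff, share_differing)
    return result


# Made-up values for demonstration only; these are NOT results from the paper.
samples = {
    "AT": [50, 60, 70, 100],
    "DE": [50, 62, 70, 100],
    "US": [48, 60, 74, 100],
}

report = pairwise_inconsistency(samples)
for pair, (mean_abs_diff, share_differing) in report.items():
    print(pair, mean_abs_diff, share_differing)
```

If every location returned identical data for identical parameters, both measures would be zero for each pair; any non-zero value flags an inconsistency worth reporting alongside the download location.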