All time references are in CEST
Data donation and linking digital trace data 1

Session Organisers: Ms Laura Boeschoten (Universiteit Utrecht), Mr Johannes Breuer (GESIS), Mr Zoltán Kmetty (Centre for Social Sciences), Mrs Júlia Koltai (Centre for Social Sciences), Mr Adam Stefkovics (Harvard University), Ms Bella Struminskaya (Universiteit Utrecht)
Time: Tuesday 18 July, 14:00 - 15:30
Room: U6-21
Digital traces on digital platforms such as Facebook, Instagram, Google, WhatsApp, etc., and other online traces left by citizens are promising sources of information for scientific research in various fields. Although there are multiple ways to access digital trace data, in recent years a new approach built on partnership with citizens has emerged. Donated data can be obtained by installing web and app trackers on participants’ devices, or through data download packages from digital platforms. As opposed to self-reports from surveys, which may suffer from measurement error due to recall or social desirability bias, digital traces can provide reliable behavioral data free from those error sources. When digital traces are combined with self-reports, the validity and reliability of measures derived from them can be investigated. Linking several digital trace data sources can provide more insight into the phenomenon under study, but it also brings challenges.
While research in this field is growing, we still know little about how best to optimize data donation approaches, about the patterns and determinants of participation, and about ways to preserve participants’ privacy and to link digital trace data with survey responses.
We invite contributions to the session that provide new theoretical or empirical insights into any phase or aspect of the donation of digital trace data. Contributions may cover, but are not limited to, the following topics:
· Data donation methods and methods of data extraction
· Willingness to donate digital trace data, best practices for recruitment
· Sampling and nonparticipation errors, missing data
· Validity of digital trace data
· Privacy issues, ethical issues, anonymization
· Issues of linking digital data with survey data
· Challenges in analyzing combined data
· Substantive contributions which combine digital trace and survey data
Keywords: linkage, donation, digital trace, social media
Dr Laura Boeschoten (Utrecht University) - Presenting Author
Dr Theo Araujo (University of Amsterdam)
Dr Niek de Schipper (University of Amsterdam)
Dr Bella Struminskaya (Utrecht University)
Dr Heleen Janssen (University of Amsterdam)
Dr Kasper Welbers (Vrije Universiteit Amsterdam)
Digital traces left by citizens during the natural course of modern life hold an enormous potential for social-scientific discoveries, because they can measure aspects of our social life that are difficult or impossible to measure by more traditional means.
As of May 2018, the EU General Data Protection Regulation obliges any entity, public or private, that processes personal data of citizens of the European Union to provide that data to the data subject (the person to whom the data pertains) upon their request, in digital format. Most major private data processing entities, comprising social media platforms as well as internet service providers, search engines, photo storage providers, e-mail providers, banks, energy providers, and online shops, comply with this right to data access by providing data subjects with so-called ‘Data Download Packages’ (DDPs).
We have introduced a workflow and corresponding software that allow the collection and analysis of digital traces in DDPs while preserving research participants’ right to privacy and data protection.
However, preparing a data donation study requires expertise in various domains: IT and programming to configure the study, but also privacy preservation, ethics, and the choice of an appropriate methodology.
To guide and assist researchers through this challenging process, we are developing an online platform allowing researchers to configure, host and monitor their data donation studies. During this presentation, I discuss the key functionalities of this platform such as data extraction, data storage and progress monitoring, and how they align with the GDPR and ethical requirements.
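As an illustration of the data extraction step mentioned above, the following is a minimal Python sketch, not the platform's actual implementation. It assumes, purely for illustration, that a DDP is a zip archive containing a JSON file with a list of records; the file name `activity.json`, the field names, and the helper functions are all hypothetical. The idea shown is that only a whitelist of fields is kept, so that other personal content in the package is never passed on.

```python
import io
import json
import zipfile


def extract_fields(ddp_bytes, member_name, keep_keys):
    """Return only the whitelisted keys from each record in one JSON file of a DDP zip."""
    with zipfile.ZipFile(io.BytesIO(ddp_bytes)) as archive:
        with archive.open(member_name) as handle:
            records = json.load(handle)
    # Drop every field that the study did not explicitly ask for.
    return [{key: rec[key] for key in keep_keys if key in rec} for rec in records]


def build_example_ddp():
    """Create a tiny in-memory zip that stands in for a real DDP (made-up content)."""
    records = [
        {"timestamp": "2023-02-13T18:12:00", "action": "search", "query": "private text"},
        {"timestamp": "2023-02-13T18:20:00", "action": "view", "query": "more private text"},
    ]
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("activity.json", json.dumps(records))
    return buffer.getvalue()


if __name__ == "__main__":
    donated = extract_fields(build_example_ddp(), "activity.json", ["timestamp", "action"])
    print(donated)
```

In a real study the structure of each platform's DDP would have to be inspected first, since file names and formats differ between platforms and change over time.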
Mr Oriol J. Bosch (The London School of Economics) - Presenting Author
Mr Marc Asensio (University of Lausanne)
Dr Caroline Roberts (University of Lausanne)
When studying the relationship between smartphone usage and other aspects of people’s lives, accurate data is required. Although self-reports are the main instrument to measure smartphone usage, there is reason to doubt their validity. Recently, approaches that directly observe what participants do online, such as web trackers, have gained in popularity. Nonetheless, recent evidence shows that these approaches are also affected by errors and that their implementation is inaccessible to most researchers.
Consequently, increasing interest is being devoted to data donations, which involve asking participants to share data that their devices and services already collect about them, such as the time they spend using their phone. This approach has the advantage of relying on neither participants’ memory nor tracking apps. However, compliance rates are still low, potentially introducing nonresponse bias. It is therefore imperative that data donations produce large enough gains in measurement quality to be considered a valid alternative to self-reports. In this study we focus on the gains obtained when collecting already saved information about participants’ daily screentime, number of pickups and specific app usage, as reported in the Digital Wellbeing / Screentime tools of their smartphones.
To study this, we conducted a within- and between-subject survey experiment in an online panel (N = 872). At the beginning of the survey, participants self-reported their usage. By the end, participants were randomly asked to share this information in three separate ways: uploading several screenshots of the tools; uploading video recordings; and manually checking and reporting the information from the tool.
We present, for each data donation approach, the absolute difference between the measurements created with self-reports and data donations. We also show the comparative convergent and predictive validity of self-reports and data donations. Additionally, we discuss potential errors affecting the data donation estimates.
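The absolute difference mentioned above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' analysis code: the function name is hypothetical, and the screentime values are invented solely to show the computation.

```python
from statistics import mean, median


def absolute_differences(self_reported, donated):
    """Per-participant absolute difference between self-reported and donated values."""
    if len(self_reported) != len(donated):
        raise ValueError("Both measures must cover the same participants.")
    return [abs(s - d) for s, d in zip(self_reported, donated)]


if __name__ == "__main__":
    # Invented daily screentime in minutes for four hypothetical participants.
    self_reported_minutes = [120, 200, 90, 300]
    donated_minutes = [185, 240, 75, 410]

    diffs = absolute_differences(self_reported_minutes, donated_minutes)
    print(diffs)          # [65, 40, 15, 110]
    print(mean(diffs))    # 57.5
    print(median(diffs))  # 52.5
```

Computing the summary separately for each donation approach (screenshots, video recordings, manual reporting) would give the per-approach comparison the abstract describes.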
Mr Benedikt Rohr (Computational Communication Science JGU Mainz) - Presenting Author
Mrs Alicia Ernst (Computational Communication Science JGU Mainz)
Mr Felix Valentin Dietrich (Media Effects & Media Psychology JGU Mainz)
Professor Michael Scharkow (Computational Communication Science JGU Mainz)
To overcome the unique limitations of self-reported or passive measurements of behavior, social scientists increasingly adopt research designs linking digital trace and survey data (Stier et al., 2020). In most previous applications, the linkage procedure connecting both data sources occurs ex post, i.e. after a survey wave is completed and/or tracking data have been donated. For many questions in communication research, however, a real-time data linkage design seems highly attractive.
Following previous studies using event-based experience sampling (Masur, 2019), we discuss a linkage design for studying music streaming use that combines real-time API access and online surveys, where linkage happens both ex post and ex ante. We use Spotify’s Implicit Grant Flow to collect listening session information (via explicit but unobtrusive data donations), which is immediately used to anchor survey questions about listening experiences, e.g., “On February 13th, from 18:12 to 20:05, you listened to…”. This may reduce the recall bias usually evident in participants’ self-reports. Finally, the survey and trace data are enriched using song-level metadata obtained via the Spotify API. Thus, our study combines linkage design traditions from survey methodology (Stier et al., 2020), which focuses on linking trace and survey data, and from communication research, which combines media use and media content data (de Vreese et al., 2017). Our study extends comparisons of self-report and tracking data to new domains, e.g., entertainment research and perceptions of algorithmic curation, and allows established entertainment theories to be tested as within-person phenomena.
We discuss general and specific challenges inherent in our approach that affect data reliability and validity: insufficient API documentation, technical restrictions, bugs, missing data and linkage errors. We also discuss survey design and computational workarounds to balance usability and GDPR-compliant data protection.
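The anchoring of survey questions to a listening session, as described in this abstract, might look roughly like the Python sketch below. It makes no calls to the Spotify API; the `ListeningSession` structure, the helper names, and the wording after "you listened to" are hypothetical and only illustrate how donated trace data could be turned into a question text.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ListeningSession:
    """Hypothetical container for one donated listening session."""
    start: datetime
    end: datetime
    tracks: list


def ordinal(day):
    """Format a day of the month with its English ordinal suffix."""
    if 11 <= day % 100 <= 13:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(day % 10, "th")
    return f"{day}{suffix}"


def anchored_question(session, max_tracks=3):
    """Build a survey question anchored to the date, time and tracks of a session."""
    listed = ", ".join(session.tracks[:max_tracks])
    return (
        f"On {session.start.strftime('%B')} {ordinal(session.start.day)}, "
        f"from {session.start:%H:%M} to {session.end:%H:%M}, "
        f"you listened to {listed}. How did you experience this listening session?"
    )


if __name__ == "__main__":
    session = ListeningSession(
        start=datetime(2023, 2, 13, 18, 12),
        end=datetime(2023, 2, 13, 20, 5),
        tracks=["Song A", "Song B"],
    )
    print(anchored_question(session))
```

The month name comes from `strftime('%B')`, which follows the process locale; a multilingual survey would need explicit localisation instead.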
Dr Alexandru Cernat (University of Manchester) - Presenting Author
Dr Florian Keusch (University of Mannheim)
Dr Ruben Bach (University of Mannheim)
Dr Paulina Pankowska (Utrecht University)
Digital trace data are receiving increased attention as a potential way to capture human behavior. Nevertheless, this type of data is far from perfect and may not always be of better quality than data from traditional social surveys. In this study we use an experimental design in which we collected data on five topics relating to the use of mobile phones using five methods: three different survey scales and two measures from digital trace data. We show that survey and digital trace data measures have very low correlations with each other. We also show that all measures are far from perfect and that, while digital trace data often appear to have better quality than surveys, this is not always the case. Finally, we find that the duration measures, both in surveys and in digital trace data, have the best quality of the methods we compared.
Mr Julian Kohne (GESIS - Leibniz Institute for Social Sciences; Ulm University) - Presenting Author
Professor Christian Montag (Ulm University)
We present the ChatDashboard framework as an infrastructure to collect, process, and link donated WhatsApp chatlog data from consenting research participants. The framework consists of the ChatDashboard R-shiny webapp for uploading, reviewing, and securely donating WhatsApp chatlogs, the WhatsR R-package as a backend for parsing and preprocessing donated WhatsApp chatlogs, and an automated testing script for testing the setup of the framework. With ChatDashboard, researchers can set up their own data donation pipelines to collect transparently donated WhatsApp chatlog data from their participants and link them to survey responses. It thus enables social scientists to retrospectively collect highly granular data on interpersonal interactions and communication without the need to build their own tools. We briefly discuss the advantages and challenges of working with donated WhatsApp chatlogs and provide a detailed overview of how these features guided the design of the ChatDashboard framework. In addition, we provide a detailed explanation of how researchers can set up their own data donation pipelines and discuss several important concerns with respect to ethical questions, informed consent, anonymization, and research data management.
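To give a rough idea of what parsing and anonymizing a chatlog involves, the Python sketch below processes a small invented chat export. It is not the WhatsR package (which is written in R) and does not reproduce its behaviour. The line format `DD.MM.YY, HH:MM - Sender: message` is an assumption for illustration; real exports differ by operating system, language and app version. Sender names are replaced by pseudonyms and message text is reduced to a character count.

```python
import re
from datetime import datetime

# Assumed line format: "13.02.23, 18:12 - Sender: message text"
TIMESTAMP = r"(\d{2}\.\d{2}\.\d{2}), (\d{2}:\d{2}) - "
MESSAGE = re.compile("^" + TIMESTAMP + r"([^:]+): (.*)$")
SYSTEM = re.compile("^" + TIMESTAMP)


def parse_chatlog(text):
    """Parse a chat export into pseudonymized rows without any message content."""
    pseudonyms = {}
    rows = []
    for line in text.splitlines():
        message = MESSAGE.match(line)
        if message:
            date, time, sender, body = message.groups()
            alias = pseudonyms.setdefault(sender, f"Person_{len(pseudonyms) + 1}")
            rows.append({
                "timestamp": datetime.strptime(f"{date} {time}", "%d.%m.%y %H:%M"),
                "sender": alias,
                "n_chars": len(body),
            })
        elif SYSTEM.match(line):
            # Timestamped line without a sender: treated as a system notice and skipped.
            continue
        elif rows:
            # No timestamp: continuation of the previous multi-line message.
            rows[-1]["n_chars"] += len(line) + 1  # +1 for the line break
    return rows


if __name__ == "__main__":
    example = "\n".join([
        "13.02.23, 18:12 - Messages and calls are end-to-end encrypted.",
        "13.02.23, 18:12 - Alice: Hello",
        "13.02.23, 18:13 - Bob: Hi there",
        "how are you?",
        "13.02.23, 18:15 - Alice: Fine",
    ])
    for row in parse_chatlog(example):
        print(row)
```

This sketch would misclassify system notices that contain a colon after the timestamp, which is one of many format quirks a production parser has to handle.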