All time references are in CEST
Showcasing software tools and infrastructures for the collection and integration of new data sources |
Session Organisers | Dr Angelica Maria Maineri (ODISSEI | Erasmus University Rotterdam) Dr Kasia Karpinska (ODISSEI | Erasmus University Rotterdam) Dr Laura Boeschoten (Utrecht University) Dr Bella Struminskaya (Utrecht University) |
Time | Tuesday 18 July, 09:00 - 10:30 |
Room |
The rapid emergence of new data sources, such as the collection of digital traces through data donation, mobile apps, sensors, or wearables has opened unprecedented opportunities for researchers to collect rich and timely data. However, collecting such data while controlling for quality in terms of measurement and representation poses significant challenges, calling for advanced software tools and research infrastructures.
This session focuses on presenting and demonstrating the latest software tools and infrastructures that are transforming the way data is collected and utilized. The session consists of hands-on demonstrations and presentations of platforms and technologies that are at the forefront of innovation in this field.
Key topics to be covered include:
- Innovative Data Collection Tools: Presentations of cutting-edge solutions designed for gathering data from new and diverse sources, e.g., apps, web scraping tools, digital trace data donation and browser plugins. These tools are showcased not only for their ability to capture rich, real-time data, but also to discuss their usability and user-friendliness for both researchers and participants, as well as to reflect on data quality.
- Data Linkage, Integration, and Sharing Platforms: Demonstrations of advanced tools and infrastructures that facilitate both the effective linkage of newly collected data with existing datasets and the secure, efficient sharing of data among researchers and institutions. These platforms will showcase their capacity to integrate diverse data streams into comprehensive datasets while ensuring data privacy, security, and ethical compliance, thus supporting collaborative research efforts.
The target audience of this session is researchers, research engineers and software developers who are keen to explore and adopt innovative software solutions in their data collection, data linkage or data sharing workflows. Attendees will leave with a comprehensive understanding of the latest tools available, empowering them to enhance their research capabilities in the evolving data landscape.
Keywords: digital traces, software, infrastructure, data quality, data linkage, data integration, innovative data collection
Dr Laura Boeschoten (Utrecht University)
Dr Niek de Schipper (University of Amsterdam) - Presenting Author
Professor Daniel Oberski (Utrecht University)
The process of data donation involves participants sharing their digital trace data for academic research. In the data donation framework by Boeschoten, Ausloos, et al. (2022), participants request their Data Download Package (DDP) from platforms such as Google or Meta. Participants then download this data onto their personal devices, where they process it locally to extract only the data points relevant to the research project. After reviewing the extracted data, participants provide informed consent to share the processed data with researchers.
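The local extraction step described above can be illustrated with a minimal sketch. It assumes a DDP delivered as a zip archive containing a JSON file of followed accounts; the file name `following.json`, the field names, and the function names are hypothetical and do not reflect any platform's actual export format.

```python
import io
import json
import zipfile


def extract_followed_accounts(ddp_bytes, member="following.json"):
    """Return only the followed-account names from a DDP zip archive.

    The member name and record layout are assumptions for illustration.
    """
    with zipfile.ZipFile(io.BytesIO(ddp_bytes)) as archive:
        with archive.open(member) as handle:
            records = json.load(handle)
    # Keep a single field per record; timestamps and other files are ignored.
    return sorted({record["account"] for record in records})


def build_demo_ddp():
    """Create a tiny in-memory archive standing in for a real DDP."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("following.json", json.dumps([
            {"account": "news_nl", "timestamp": 1672531200},
            {"account": "cooking_daily", "timestamp": 1675209600},
        ]))
        # A file the study does not need; it is never read by the extraction.
        archive.writestr("messages.json", json.dumps([{"text": "private"}]))
    return buffer.getvalue()


extracted = extract_followed_accounts(build_demo_ddp())
```

Only `extracted` would be shown to the participant for review; the rest of the archive stays on the device.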
However, this process poses challenges, particularly regarding the handling of privacy-sensitive data. Despite local processing and informed consent, even the transfer of processed data to researchers could introduce risks to data security and privacy, especially when the processed data is highly sensitive. To address this limitation, we propose a novel approach to data donation: local model training on participants’ devices, which eliminates the need for direct data transfer. Instead of donating data, participants train models locally using their personal data and share only the resulting model parameters. This ensures that sensitive data remains private while still enabling researchers to derive meaningful insights using the estimated model parameters.
We extend the framework (Boeschoten, Ausloos, et al., 2022) by incorporating this local modeling approach. We demonstrate this extension using Instagram data from participants in the Netherlands. As a case study, we apply Latent Dirichlet Allocation (LDA) to aggregated Instagram data from participants, which results in the clustering of the top 1,000 Instagram accounts in the Netherlands into topics. We compare the local modeling approach to the same analysis performed on the full dataset.
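The idea of sharing parameters instead of data, and then comparing against a full-data analysis, can be sketched with a deliberately simple stand-in model. The example below is not LDA: it fits a categorical distribution over topic labels on each device and pools the shared proportions, assuming each followed account already carries a topic label. All names and data are hypothetical.

```python
from collections import Counter


def fit_local(followed_topics):
    """Run on the participant's device: return topic proportions and sample size."""
    counts = Counter(followed_topics)
    total = sum(counts.values())
    proportions = {topic: n / total for topic, n in counts.items()}
    return {"n": total, "proportions": proportions}


def pool(local_models):
    """Run by the researcher: size-weighted average of the shared parameters."""
    grand_total = sum(model["n"] for model in local_models)
    pooled = Counter()
    for model in local_models:
        for topic, share in model["proportions"].items():
            pooled[topic] += share * model["n"] / grand_total
    return dict(pooled)


# Two hypothetical participants; the raw lists never leave their devices.
participants = [
    ["news", "sports", "news", "food"],
    ["food", "food"],
]
pooled = pool([fit_local(data) for data in participants])

# Reference analysis on the combined data, for comparison only.
full = fit_local([t for data in participants for t in data])["proportions"]
```

For this simple model the pooled estimate coincides with the full-data estimate; for a model such as LDA that equivalence need not hold, which is why the comparison with the full dataset is informative.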
Our approach enhances GDPR compliance, reduces data security risks, and builds participants’ trust. While it limits researchers’ ability to perform direct data quality checks, it represents a promising step toward privacy-preserving academic research.
Dr Laura Boeschoten (Utrecht University) - Presenting Author
Professor Theo Araujo (University of Amsterdam)
Dr Niek de Schipper (Utrecht University)
Dr Bella Struminskaya (Utrecht University)
Dr Kasper Welbers (VU University)
Dr Heleen Janssen (University of Amsterdam)
With data donation, we leverage the GDPR's rights to data access and data portability. More specifically, we build on the idea that researchers can ask their participants to request their own digital traces from a platform of interest and share these for research purposes.
In recent years, we have developed the Digital Data Donation Infrastructure (D3I). D3I enables researchers to set up their own data donation study. In practice, this means that they can invite participants to donate data from a specific platform of interest, and that they collect only those digital traces from that platform that are relevant to their study. Although data donation studies typically have a similar set-up, each therefore has a unique element that depends on the specific research question.
The development of D3I was mainly funded by PDI-SSH and is currently supported by several other funding sources. The software component of the infrastructure has largely been developed in co-creation with Eyra, a software company that focuses on sustainable software solutions for research purposes.
In this paper, we reflect on the development of this infrastructure. We discuss what we consider important aspects of setting up research infrastructure and the lessons we learned. Topics we cover include:
- The general usability of infrastructure, facilitating different kinds of users, accessible code and documentation.
- Facilitating different users and communication towards them.
- Scalable versus non-scalable elements of research infrastructure.
- Considerations for making infrastructure available for researchers to use.
- Aspects to consider when using the infrastructure at different institutions, such as user agreements and security requirements.
- Long-term maintenance of the infrastructure.
Dr Bella Struminskaya (Utrecht University) - Presenting Author
Mr Thijs Carriere (Utrecht University)
Dr Niek de Schipper (Utrecht University)
Dr Laura Boeschoten (Utrecht University)
Data from smartphone sensors and wearables have the potential to transform social and behavioral science research by enabling in-the-moment, longitudinal, rich, precise, and scalable data collection. Researchers are increasingly taking advantage of such technologies using smartphone research apps with a wide range of sensing functionalities, including ecological momentary assessment, geolocation sensing, physical activity sensing, app usage, and geofencing, among others. Numerous apps and platforms exist that researchers can use for data collection. They vary in functionalities, required IT know-how, and possibilities for re-use. In this presentation, we provide a comprehensive review of existing apps and platforms used for app-based data collection in social and behavioral sciences and official statistics, and discuss methodological, operational and software-development considerations for mobile sensing research infrastructures. We describe best practices and introduce an open-source smartphone app research infrastructure under development, illustrating aspects of re-usable, sustainable app infrastructures that can be integrated into the national and international social science research landscape. We focus on integrating research-based methodological aspects of privacy protection and participant engagement in software (i.e., research app) development, more general aspects of maintenance, integration with other parts of the research landscape, sustainable funding, as well as legal and ethical considerations when creating socio-technological systems for data collection.
Dr Adrienne Mendrik (Eyra) - Presenting Author
Mr Melle Lieuwes (Eyra)
Mr Rowdy van Looy (Eyra)
The rapid emergence of diverse data sources presents researchers with new opportunities and challenges in data collection. The Next platform, an open-source web platform developed by Eyra in collaboration with researchers, offers a sustainable solution for integrating innovative data collection methods. The platform enables researchers to gather data while addressing usability, data privacy, and sustainability.
For instance, the data donation software service on the Next platform facilitates the collection of digital trace data in a privacy-preserving manner. Using browser-based technologies like Pyodide, the service processes data directly on participants' devices, ensuring privacy-preserving processing of data (Port program). This service exemplifies the platform’s commitment to ethical and secure data handling while promoting transparency and adaptability through open-source Python-based workflows.
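As a rough illustration of on-device processing, the sketch below shows the kind of plain Python script that could run in a participant's browser (for example under Pyodide): it reduces raw rows to a summary table, applies the participant's deletions, and releases data only after consent. The function names and data layout are hypothetical and are not the Port program's API.

```python
def summarize(rows):
    """Reduce raw trace rows to per-day event counts (URLs are dropped)."""
    counts = {}
    for row in rows:
        day = row["timestamp"][:10]  # keep the date, drop the time of day
        counts[day] = counts.get(day, 0) + 1
    return [{"day": day, "events": n} for day, n in sorted(counts.items())]


def review(table, removed_days):
    """Apply the participant's deletions to the summary table."""
    return [entry for entry in table if entry["day"] not in removed_days]


def submit(table, consent):
    """Release the reviewed table only if the participant consented."""
    return list(table) if consent else []


# Hypothetical raw rows, as they might appear in an uploaded export file.
raw_rows = [
    {"timestamp": "2023-07-17T08:15:00", "url": "https://example.org/a"},
    {"timestamp": "2023-07-17T21:40:00", "url": "https://example.org/b"},
    {"timestamp": "2023-07-18T09:05:00", "url": "https://example.org/c"},
]
table = summarize(raw_rows)
donation = submit(review(table, removed_days={"2023-07-17"}), consent=True)
```

Because every step is an ordinary function over in-memory data, nothing is transmitted until `submit` is called with consent.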
The Next platform’s modular architecture allows for the seamless integration of additional data collection methods, data analytics platforms, or panels, such as the LISS panel. Reusable modules shared within the platform's open-source codebase ensure sustainability and resource efficiency, as funding and development efforts for one service benefit the entire ecosystem.
This presentation will demonstrate how the Next platform’s innovative infrastructure can support the collection and integration of diverse data sources. By leveraging open-source principles and modular design, the Next platform provides a scalable and sustainable approach to modern data collection challenges, inviting feedback on future developments and integrations.
Mr Philipp Kemper (University Duisburg-Essen) - Presenting Author
Video games have become a central part of contemporary popular culture. However, little research exists on the use of interactive video games to answer questions relevant to social scientists. This paper explores how video games can be designed and used for collecting behavioral data from participants in experimentally manipulated settings. Using RPG Maker, a commercial tool for creating video games, I designed a game in which participants control a character, talk to non-player characters, spend money, and make abstract decisions (e.g., voting). As a substantive test case, this paper addresses the question of how the trade-off between taxation and the quality of public services affects people's political solidarity. The video game was embedded in an online survey. Participants (N = 300, random sample of the German population) played as citizens of a fictional state, conducted everyday activities and thereby consumed various private and public goods. During their fictional stay, they experienced changing levels of taxation and public service quality, resulting in more or fewer resources available for consuming private goods in exchange for worse or better access to public services. Participants could then vote for a trade-off between taxes and public spending. Substantively, the preliminary results suggest a thermostatic mechanism, where people adjust their political solidarity according to their consumption quality of public and private goods. Methodologically, this study shows that using video games for data collection can generate high-quality behavioral data by keeping participants motivated, attentive, and emotionally engaged, and by giving them a strong sense of agency.