All time references are in CEST
Open Science in Survey Research

Session Organiser: Dr Angelica Maria Maineri (ODISSEI, Erasmus University Rotterdam)
Time: Tuesday 18 July, 09:00 - 10:30
Room:
Open Science consists of a multifaceted set of principles and practices to increase the accessibility, transparency and impact of scientific research. The transition to Open Science is promoted by many high-level institutions, such as UNESCO and the European Commission, as well as (inter)national funders. While Open Science is thought to accelerate scientific discovery and to increase the societal impact of research, there are still significant barriers to the adoption of open practices due to various factors, including a lack of knowledge and/or resources, concerns over privacy and sensitivity of data, and worries about scooping and intellectual property.
Following the global trend towards Open Science, this session invites contributions that showcase how Open Science principles and practices have been embedded into survey practice and survey research, and that review their strengths and weaknesses. Contributions that address the barriers to Open Science for survey research are also welcome. Topics may address (but are not limited to) the different pillars of Open Science, such as:
- Open source software
- FAIR data
- Open engagement of social actors and/or citizen science
- Openness to diversity of knowledge (e.g. to indigenous knowledge systems)
- Open educational resources
- Open science infrastructures
Keywords: Open science, FAIR, citizen science, open source, open resources
Mr Anthony Damico (Independent Consultant) - Presenting Author
Governments, NGOs, and other research institutes spend billions of dollars each year collecting demographic, economic, and health information about their populations. These efforts form the basis of many official reports, academic journal articles, and public health surveillance systems, each of which motivates public policy or informs the public to varying degrees. Depending on the sensitivity of the topic, the sponsoring organizations often publish household-level, person-level, or company-level datasets alongside their final summary report. This response-level data (commonly known as microdata) allows external researchers both to reproduce the original findings and to focus more deeply on segments of the population that may not be discussed in the data products released by the authors of the original investigation. For example, the Census Bureau publishes an annual report, "Income and Poverty in the United States", with a series of tables, as well as a database with one record per individual within each sampled household. While the Bureau helpfully provides many different cross-tabulations of its results, an external researcher might find utility in this dataset by investigating other groups (such as different age cutoffs or dollar thresholds), and so the public microdata files allow continued research where it otherwise might end. The website http://asdfree.com/ offers obsessively detailed instructions for analyzing a wide variety of publicly available datasets using the R language. This resource generally contains three core components, each with step-by-step instructions: (1) download automation or data acquisition; (2) helpfully annotated analysis examples; (3) replication of published estimates to prove correct methodology.
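As a rough illustration of this three-part pattern, the sketch below uses base R and the survey package to download a public microdata file, construct a survey design, and check a weighted estimate against a published figure. It is not code from asdfree.com itself: the URL, the variable names (PWGTP, HINCP), and the published value are hypothetical placeholders, and the actual instructions on the site are dataset-specific.

```r
# Minimal sketch of the three-step pattern; all names and numbers are placeholders.
library(survey)

# (1) download automation: fetch the public-use microdata file
tf <- tempfile(fileext = ".csv")
download.file("https://example.gov/microdata/person_records.csv", tf)  # hypothetical URL
microdata <- read.csv(tf)

# (2) analysis example: build a survey design object using the person weight
#     (real public-use files typically also require cluster, strata, or
#     replicate-weight specifications), then estimate a mean and a threshold share
design <- svydesign(ids = ~1, weights = ~PWGTP, data = microdata)
svymean(~HINCP, design, na.rm = TRUE)
svymean(~as.numeric(HINCP < 25000), design, na.rm = TRUE)

# (3) replication: compare the weighted estimate with a figure from the
#     published report (74580 is an invented number) to confirm the methodology
published_value <- 74580
estimate <- coef(svymean(~HINCP, design, na.rm = TRUE))[[1]]
all.equal(estimate, published_value, tolerance = 0.001)
```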
Dr Paulina Pankowska (Utrecht University) - Presenting Author
Dr Adrienne Mendrik (Eyra)
Dr Thomas Emery (ODISSEI, Erasmus University Rotterdam)
Dr Javier Garcia Bernardo (Utrecht University)
Ms Elizaveta Sivak (University of Groningen)
Dr Geert Stulp (University of Groningen)
Social scientists aim to create explanations of the world. For each social phenomenon, scientists have proposed a myriad of theories to explain its working mechanisms. Traditionally, these theories are tested by generating hypotheses, translating them into a statistical model, and assessing the significance of the model’s coefficients. Such an approach, however, often leads to the specification of a large number of models, each claimed to capture the same theory. As things currently stand, there is no framework that allows for a comparison of these models. We argue that benchmarks (i.e., standardized validation frameworks that allow for a direct comparison of the accuracy of various models that predict the same outcome) can serve as such a frame of reference. The use of benchmarks can help to determine which models fit better in a specific context.
We demonstrate the potential of benchmarking through our experience in co-organizing two pilot benchmark challenges during the ODISSEI-SICSS summer school, as well as the Predicting Fertility data challenge (PreFer). The aim of the latter was to measure the current predictability of fertility outcomes in the Netherlands in order to advance our understanding of fertility. We use these experiences to provide recommendations on the requirements, in terms of data, design, and infrastructure, that need to be met to fully realize the potential of benchmark challenges in the social sciences. In doing so, we emphasize the need for benchmarks to act as engines of standardization and replication in the social sciences and to establish new norms and best practices that can then be adopted by the wider community. To put this in the context of contemporary science policy, we argue that benchmark challenges should promote the use of FAIR data and software, that is, data and software which meet the principles of findability, accessibility, interoperability, and reusability.
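The core logic of such a benchmark can be sketched in a few lines of R: all competing models are scored with the same metric on the same held-out data. The data, models, and metric below are invented for illustration and are not those of the PreFer challenge.

```r
# Toy benchmark: two models predicting the same binary outcome are compared
# with one shared metric on one shared holdout set (all data simulated).
set.seed(1)
df <- data.frame(x1 = rnorm(500), x2 = rnorm(500))
df$y <- rbinom(500, 1, plogis(0.8 * df$x1 - 0.5 * df$x2))

# fixed train/holdout split shared by all submitted models
train   <- df[1:400, ]
holdout <- df[401:500, ]

# two competing specifications claiming to capture the same theory
m1 <- glm(y ~ x1,      data = train, family = binomial)
m2 <- glm(y ~ x1 + x2, data = train, family = binomial)

# a single, agreed-upon accuracy metric computed on the same holdout set
accuracy <- function(model) {
  pred <- predict(model, newdata = holdout, type = "response") > 0.5
  mean(pred == holdout$y)
}
c(model_1 = accuracy(m1), model_2 = accuracy(m2))
```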
Ms Hanne Oberman (Utrecht University) - Presenting Author
Computational reproducibility (the ability to reproduce the results of a study using the original data and analysis code) is a cornerstone of Open Science. It ensures transparency, fosters trust, and enhances the reusability of scientific research. Regrettably, even with open data and code, many statistical studies are still not reproducible. This presentation explores how to teach computational reproducibility to (future) statisticians, focusing on domain-agnostic principles and practical tools.
At Utrecht University, we teach a reproducibility-focused course in the research master’s program 'Methodology and Statistics for the Behavioural, Biomedical, and Social Sciences'. The course introduces students to tools and workflows for creating publication-ready research compendia, incorporating (among other things) Quarto markdown, version control with Git, and reproducible R environments using {renv}. Many of the tools taught in this course are also applicable to the work of survey methodologists. For example, R code may be shared as an executable research compendium, R package, or Shiny app.
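By way of illustration only (this is not the course's actual material), the sketch below shows how {renv} and Quarto can be combined into a minimal reproducible workflow; the file name manuscript.qmd is a placeholder.

```r
# Minimal reproducible-workflow sketch for an R project containing a Quarto
# manuscript 'manuscript.qmd'; version control (git init, git commit, ...)
# happens outside R.

renv::init()       # set up a project-local package library
renv::snapshot()   # record exact package versions in renv.lock

# render the Quarto manuscript so that text, code, and results stay in sync
quarto::quarto_render("manuscript.qmd")

# a collaborator or reviewer reproducing the work later runs:
# renv::restore()                          # reinstall the recorded versions
# quarto::quarto_render("manuscript.qmd")  # regenerate the results
```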
The key question in this presentation is: How can we promote the adoption of tools for computational reproducibility among statisticians?
Dr Adrienne Mendrik (Eyra) - Presenting Author
Mr Melle Lieuwes (Eyra)
The open source Next platform, developed by Eyra in collaboration with researchers, exemplifies an Open Science infrastructure designed to enhance accessibility, transparency, and efficiency in research. Modeled as a web-based operating system, the platform integrates diverse software services—such as data donation, benchmark challenges, and participant recruitment—that function as "apps" on the platform. These services share reusable modules within the open-source Next mono codebase (https://github.com/eyra/mono), ensuring a sustainable and collaborative software ecosystem.
As an open source initiative, the Next platform actively promotes the development of open source software within the research community. For instance, the data donation software service enables researchers to collect digital trace data from participants in a privacy-preserving manner: using Pyodide, the service processes data directly in participants’ browsers. Researchers or research engineers can develop these data processing flows themselves using open source Python scripts, fostering transparency and adaptability. The flows are standardized by forking the Next platform’s third-party integration repository, Feldspar, available on GitHub (https://github.com/eyra/feldspar). Similarly, the benchmark challenge service provides a framework for researchers to evaluate methods in a standardized and replicable manner. Method submissions are based on an open-source template repository, such as the PreFer challenge template on GitHub (https://github.com/eyra/fertility-prediction-challenge). Both services exemplify the platform’s commitment to open-source principles by sharing code, fostering collaboration, and enabling reusable workflows.
This presentation will explore the platform’s role as an Open Science infrastructure, highlighting its contributions to reproducibility, interdisciplinary collaboration, and accessibility. Future development plans will also be discussed, including the introduction of additional services, with an open invitation to researchers to provide input.