ESRA 2025 Preliminary Glance Program
All time references are in CEST
Advances in statistical analysis and survey data analysis skills
Session Organiser: Dr Daniel Seddig (KFN)
Time: Wednesday 16 July, 11:00 - 12:30
Room: Ruppert C - 0.23
Keywords: Statistical analysis, data analysis skills
Papers
A Comparative Analysis of the use of Linear Models in Quantitative Social Sciences
Dr Andres Castro Torres (Barcelona Supercomputing Center) - Presenting Author
Dr Aliakbar Akbaritabar (Max Planck Institute for Demographic Research)
The literature shows that more than half of quantitative research in the social sciences has relied on linear modeling techniques. Linear models focus on measuring the isolated effects of independent variables on outcomes, which often implies an additive conceptualization of social phenomena. This analytically oriented framework differs from others that conceptualize the social world in terms of configurations and processes in which social factors correlate, change, vary, and evolve in tandem. Configurational and processual analysis frameworks often rely on statistical methods other than linear models (e.g., geometric data analysis techniques). We contribute an empirical analysis of the frequency of use of linear models in quantitative research across eight disciplines using the publicly available OpenAlex database, which has the most comprehensive coverage in comparison to proprietary databases. We found that the prevalence of linear models ranges from 50% to 70% in Economics, Geography, Psychology, Political Sciences, and Sociology, with most of these disciplines showing their highest prevalence in the 2010s. This prevalence is lower in Environmental Sciences and History (around 40%) and much higher in Medicine (above 80%). Based on our results, awareness should be raised to motivate a greater diversity of analysis frameworks, especially among scientific communities in the social sciences.
Natural Effect Models for Causal Mediation Analysis Using Survey Data
Dr Lizbeth Burgos-Ochoa (Department of Methodology and Statistics, Tilburg University) - Presenting Author
Dr Katharina Loter (Department of Methodology and Statistics, Tilburg University)
Background: National health surveys are a vital data source for public health and epidemiology research. In recent years, there has been an increased interest in the use of health survey data for causal inference. The potential outcomes framework has provided conceptual clarity and methodological tools to define and compute causal estimands of interest, with mediated effects being a priority in causal research. However, guidance on estimating mediated effects using causal mediation analysis with survey data remains limited. Additionally, item non-response increases the complexity of such analyses. Estimation procedures based on natural effect models (NEMs) offer seamless integration of mediation analysis with approaches for handling missing data. In this work, we demonstrate two estimation methods for mediated effects using NEMs with survey data through a practical example, highlighting the challenges and potential solutions for survey-based research.
Methods: In this real-life example, we used data from the National Health and Nutrition Examination Survey (NHANES) to investigate the role of depressive symptoms as a mediator of the effect of sleep disorders on cardiovascular disease (CVD) in adults. We compared two estimation methods, the NEMs weighting approach and the NEMs imputation-based approach, in terms of point estimates, efficiency, and flexibility for dealing with item non-response.
Results: Our analysis, controlling for potential confounders due to non-random treatment assignment, presents point estimates and 95% confidence intervals for each NEMs approach. Depressive symptoms were found to mediate a modest portion of the effect of sleep disorders on CVD. Compared to the NEMs weighting approach, the imputation-based approach provides a more efficient and flexible framework to estimate mediated effects in this example.
Conclusion: This study demonstrates the application of natural effect models (NEMs) in survey-based mediation analysis, highlighting the advantages of an imputation-based approach.
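As a rough sketch of the quantities behind this abstract, written in standard potential-outcomes notation rather than taken from the authors: let Y(a, M(a*)) denote the outcome under exposure a with the mediator set to the value it would take under exposure a*. Assuming, purely for illustration, a binary exposure, an identity link, no exposure-mediator interaction, and no covariates, a natural effect model and the natural direct (NDE), natural indirect (NIE), and total (TE) effects can be written as below. The coefficient names are assumptions; for a binary outcome such as CVD, a logit link would be the more usual choice, with effects then expressed on the odds-ratio scale.

```latex
\begin{align}
E\{Y(a, M(a^*))\} &= \beta_0 + \beta_1 a + \beta_2 a^* \\
\text{NDE} &= E\{Y(1, M(0))\} - E\{Y(0, M(0))\} = \beta_1 \\
\text{NIE} &= E\{Y(1, M(1))\} - E\{Y(1, M(0))\} = \beta_2 \\
\text{TE}  &= E\{Y(1, M(1))\} - E\{Y(0, M(0))\} = \beta_1 + \beta_2
\end{align}
```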
East German Identity: Cross-validation of a Multidimensional Measurement Instrument
Ms Emma Roßbach (Chemnitz University of Technology) - Presenting Author
Professor Jochen Mayerl (Chemnitz University of Technology)
More than three decades after the reunification of Germany in 1990, East German identity remains a significant and complex phenomenon. Contrary to the assumption that differences between East and West Germany would diminish over time, recent studies show that East German identity continues to influence younger generations born after 1990 (Mau et al., 2024). This continued relevance underscores that, despite the passage of time, a complete rapprochement between East and West Germany has not been achieved. Instead, East German identity remains a distinctive and enduring element of the social landscape, characterised by collective experiences of disadvantage and generational dynamics. Importantly, these experiences do not manifest themselves uniformly; rather, East German identity proves to be a multidimensional construct of social diversity that can vary across different subgroups.
To explore this ongoing relevance, we developed and validated a new instrument to measure East German identity as a multidimensional construct. This scale accounts for both universal elements of East German identity and subgroup-specific variations, such as generational differences. We use multi-group confirmatory factor analysis to cross-validate the instrument. First, an online survey of 1,638 participants from a Saxony-wide access panel provided the foundation for the scale’s development. Second, a mixed-mode random sample from the Chemnitz region (push-to-web and postal survey, n=600) was used for validation. The sample for the survey (4,779 people) was randomly selected from the residents' registration office of the city of Chemnitz and 20 neighbouring municipalities in Saxony, Germany. The findings reveal that while some dimensions of East German identity are shared across generations, others are more variable and tied to specific subgroup experiences. This study makes a methodological contribution by offering a validated instrument for measuring multidimensional East German identities.
Enhancing data literacy: the UK Data Service Data Skills Framework
Dr Vanessa Higgins (UK Data Service/University of Manchester)
Dr Sarah King-Hele (UK Data Service/University of Manchester) - Presenting Author
The presentation will introduce the UK Data Service Data Skills Framework for quantitative data skills training. The Framework aims to provide a robust structure for developing essential data analysis skills for the social, economic and population sciences, focusing on large-scale survey data (as well as census and macro-level aggregate data). It has been developed in the context of a data landscape undergoing unprecedented change at a rapid pace. It emphasises continued development of traditional data skills for contemporary research needs, while recognising the growing potential for integrating survey data with an expanding array of other sources increasingly accessible to social scientists, as well as the promising opportunities presented by AI and machine learning for enhancing analysis.
The presentation will cover the background and development of the Framework, the methodology, the final content of the Framework and the feedback we have received from the community. We will discuss how we are using it within the UK Data Service to aid our own gap analysis of our training programme, training development, and strategic thinking for the next five years, as well as how we are using the Framework to collaborate with other data services. We will present a number of potential wider use cases for the Framework and hope it can help in the continued development of a relevant and efficient training ecosystem for data analysis across the social sciences. We envisage it as a live and evolving piece of work, and very much welcome discussion on the content, particularly from the ESRA international audience.
Estimating Preferences Over Data to Inform Statistical Disclosure Control Decisions
Dr Elan Segarra (U.S. Bureau of Labor Statistics) - Presenting Author
This project provides a framework and empirical example for the estimation of consumer demand for published statistics, and incorporation of these estimates into the process of statistical disclosure control (SDC). When implementing SDC methods, data providers are tasked with balancing the benefits of published statistics with the risks of re-identification of entities in the confidential micro-data. Typically, the benefit side of this calculation is reduced to maximizing the number of published statistics, as opposed to assessing which statistics might be more useful to downstream consumers. In the context of the cell suppression problem, data providers must choose complementary suppressions to protect against secondary disclosure attacks, and in the context of differential privacy data providers must choose how to allocate their privacy budgets across different sets of output. Incorporating data user valuations over potentially published statistics can help inform these decisions. Consumer demand for statistics is modeled using a discrete choice nested logit model where individual statistics can vary by characteristics such as their conditioning variables (e.g. labor data sliced by occupation versus industry). To illustrate its feasibility, the framework is applied to the Census of Fatal Occupational Injuries. Preferences are estimated with standard methodological approaches using page-view data of public CFOI webpages, and the parameter estimates are used to compute valuations which are leveraged in both cell suppression and differential privacy approaches to protecting CFOI tables.
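To make the nested logit step in this abstract concrete, the following is a minimal sketch of two-level nested logit choice probabilities in Python. It is not the author's implementation: the function name, the item and nest names, the utilities, and the dissimilarity parameters are all hypothetical, and in the actual application the utilities would be estimated from page-view data rather than set by hand.

```python
import math


def nested_logit_probs(utilities, nests, lambdas):
    """Return choice probabilities under a two-level nested logit.

    utilities: dict mapping item -> systematic utility V_j
    nests:     dict mapping nest name -> list of items in that nest
    lambdas:   dict mapping nest name -> dissimilarity parameter in (0, 1]
    """
    # Inclusive value of each nest: I_B = log(sum_k exp(V_k / lambda_B))
    inclusive = {
        b: math.log(sum(math.exp(utilities[j] / lambdas[b]) for j in items))
        for b, items in nests.items()
    }
    # Denominator of the nest-level probability: sum_C exp(lambda_C * I_C)
    denom = sum(math.exp(lambdas[b] * inclusive[b]) for b in nests)

    probs = {}
    for b, items in nests.items():
        # Probability of choosing nest B
        p_nest = math.exp(lambdas[b] * inclusive[b]) / denom
        for j in items:
            # Probability of choosing item j given nest B
            p_within = math.exp(utilities[j] / lambdas[b] - inclusive[b])
            probs[j] = p_nest * p_within
    return probs


# Hypothetical statistics grouped by conditioning variable (illustrative values)
utilities = {
    "occ_total": 1.0,
    "occ_detail": 0.5,
    "ind_total": 0.8,
    "ind_detail": 0.2,
}
nests = {
    "occupation": ["occ_total", "occ_detail"],
    "industry": ["ind_total", "ind_detail"],
}
lambdas = {"occupation": 0.6, "industry": 0.9}

probs = nested_logit_probs(utilities, nests, lambdas)
for item, p in probs.items():
    print(f"{item}: {p:.3f}")
```

The probabilities sum to one, and when every dissimilarity parameter equals 1 the model collapses to a plain multinomial logit, which is a convenient sanity check on any implementation.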