
ESRA 2025 Preliminary Program

              



All time references are in CEST

Multi-source data for labour statistics

Session Organisers Professor Roberta Varriale (Sapienza University of Rome)
Professor Dimitris Pavlopoulos (Vrije Universiteit Amsterdam (VU))
Time Thursday 17 July, 09:00 - 10:30
Room Ruppert 114

In recent years, labour market researchers and the National Statistical Institutes that produce labour market statistics have increasingly adopted multiple data sources. In addition to survey data, the traditional and primary source of information, register data from various sources are now used extensively in labour market research and, to some extent, in the production of official statistics.
A novel and promising development in this field is the production of multi-source statistics, achieved by linking information from various independent data sources. This approach offers numerous opportunities, as it combines the strengths of different frameworks: the extensive and detailed objective information provided by register data and the substantively valuable subjective insights obtained through surveys.
From a methodological perspective, linking multiple data sources presents the opportunity to address several aspects of data quality. By drawing on multiple sources of information, it is possible to enhance final estimates both in terms of content and accuracy. Nevertheless, this approach also introduces new methodological challenges.
In this session, we aim to present a range of statistical methods for handling multi-source data, accompanied by examples of results obtained using these methods to demonstrate the advantages of multi-source statistics. In particular, we will highlight the use of latent variable models, which leverage the simultaneous availability of information from multiple sources. This approach offers the benefit of accounting for potential measurement errors in each individual source. The application context for these methods is labour statistics.

Keywords: multi-source data, latent variable models, labour statistics

Papers

Modeling Total Error using Multi-source Data: A Simulation and an Application to the Italian Labor Market

Mr Santiago Gómez-Echeverry (Vrije Universiteit Amsterdam) - Presenting Author
Mrs Silvia Loriga (Istituto Nazionale di Statistica (ISTAT))
Mr Davide Di Laurea (Istituto Nazionale di Statistica (ISTAT))
Mr Arnout van Delden (Centraal Bureau voor de Statistiek (CBS))

The expansion of administrative and Big Data and the rise in survey nonresponse have highlighted the relevance of assessing the quality of non-probability samples. To tackle this issue, researchers usually turn to the Total Error (TE) framework, which divides the error into a measurement component and a representation component. An extensive literature focuses on measurement error, often combining data from different sources to evaluate whether the observed variables adequately capture the concept they are intended to measure. Another branch of the literature has centered on representation error, assessing how the selection of respondents into the sample leads to systematic differences between the population and the observed units. However, research modeling both components simultaneously is still scant. In the present study, we address this gap by jointly modeling the measurement and representation errors, combining recent advances in both areas. We conducted a simulation study to evaluate our TE model under different measurement and representation error specifications. Additionally, we performed a case study using a combination of Italian administrative registers and the Labor Force Survey (LFS) to evaluate the TE in the income variable. Our preliminary results show that our model adequately captures the different error sources and provides a good strategy for assessing the TE when using a combination of probability and non-probability data.
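
As a rough illustration of the two TE components named in this abstract, the sketch below splits the error of a sample mean into a representation part and a measurement part. It is not the authors' model, which the abstract does not specify. It only shows the accounting identity that holds when true values are known, as they would be in a simulation. The function name and all numbers are made up for illustration.

```python
from statistics import mean


def decompose_total_error(true_values, observed_values, selected):
    """Split the error of an observed sample mean into two parts.

    true_values     -- true value for every population unit
    observed_values -- recorded (possibly erroneous) value for every unit
    selected        -- True where the unit appears in the observed data
    """
    pop_true_mean = mean(true_values)
    sample_true = [t for t, s in zip(true_values, selected) if s]
    sample_obs = [o for o, s in zip(observed_values, selected) if s]

    # Representation error: who is in the data, judged on true values.
    representation = mean(sample_true) - pop_true_mean
    # Measurement error: how recorded values differ from true values
    # among the units that are in the data.
    measurement = mean(sample_obs) - mean(sample_true)
    # Total error of the observed sample mean; equals the sum of the two.
    total = mean(sample_obs) - pop_true_mean
    return {
        "representation": representation,
        "measurement": measurement,
        "total": total,
    }


# Hypothetical toy population of six monthly incomes.
true_income = [1000, 1500, 2000, 2500, 3000, 5000]
observed_income = [1000, 1400, 2000, 2300, 3300, 5600]
in_data = [True, True, True, True, False, False]  # high earners missing

errors = decompose_total_error(true_income, observed_income, in_data)
```

In this toy case the two components have the same sign and add up; with opposite signs they could partly cancel, which is one reason to look at them jointly rather than one at a time.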


Measurement of Poverty and Inequality with Publicly Available Microdata

Mr Anthony Damico (Independent Consultant) - Presenting Author

Governments, NGOs, and other research institutes spend billions of dollars each year collecting demographic, economic, and health information about their populations. These efforts form the basis of many official reports, academic journal articles, and public health surveillance systems, each of which motivates public policy or informs the public to varying degrees. Depending on the sensitivity of the topic, these sponsoring organizations often publish household-level, person-level, or company-level datasets alongside their final summary report. This response-level data (commonly known as microdata) allows external researchers both to reproduce the original findings and to focus more deeply on segments of the population not discussed in the data products released by the authors of the original investigation. For example, the Census Bureau publishes an annual report, "Income and Poverty in the United States", with a series of tables, and also a database with one record per individual within each sampled household. While the Bureau helpfully provides many different measures of income dispersion in its results, an external researcher might find utility in this dataset by investigating other measures of poverty or inequality (such as the Laeken measures, to make comparisons between the United States and the European Union), and so the public microdata files allow research to continue where it otherwise might end. The website https://convey-r.org/ offers a wide range of poverty, inequality, and richness measures applicable to many publicly available datasets using the R language. This textbook contains three core components, each with step-by-step instructions: (1) Data preparation of major economic wellbeing surveys from the United States and Brazil; (2) Poverty Indices; (3) Inequality Measurement.
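
To make the kind of measure mentioned in this abstract concrete, here is a minimal sketch of two survey-weighted point estimates: an at-risk-of-poverty rate (share of the weighted population below 60% of the weighted median income, the usual Laeken threshold) and a Gini coefficient. It is written in plain Python and is not the convey package's code or API. The weighted-median rule used here (first value whose cumulative weight reaches half the total) is one of several conventions, the functions give no variance estimates, and the incomes and weights are invented.

```python
def weighted_median(values, weights):
    """Smallest value whose cumulative weight reaches half the total weight."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= half:
            return value
    raise ValueError("weights must sum to a positive number")


def at_risk_of_poverty_rate(incomes, weights, share_of_median=0.6):
    """Weighted share of units with income below a fraction of the median."""
    threshold = share_of_median * weighted_median(incomes, weights)
    below = sum(w for y, w in zip(incomes, weights) if y < threshold)
    return below / sum(weights)


def weighted_gini(incomes, weights):
    """Gini coefficient from the weighted Lorenz curve (trapezoid rule)."""
    pairs = sorted(zip(incomes, weights))
    total_weight = sum(w for _, w in pairs)
    total_income = sum(y * w for y, w in pairs)
    running_income = 0.0
    area_sum = 0.0
    for y, w in pairs:
        previous = running_income
        running_income += y * w
        area_sum += w * (previous + running_income)
    return 1 - area_sum / (total_weight * total_income)


# Hypothetical microdata: one income and one sampling weight per record.
incomes = [8, 20, 30, 40, 100]
weights = [2, 1, 1, 1, 1]

arpr = at_risk_of_poverty_rate(incomes, weights)
gini = weighted_gini(incomes, weights)
```

With real microdata the weights would be the survey's sampling weights, and a design-based tool would be needed to obtain standard errors.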


Predicting Unit Nonresponse from Multiple Sources of Administrative Data with Parametric Regression and Machine Learning Methods

Dr Hafsteinn Einarsson (University of Iceland) - Presenting Author
Professor Joseph Sakshaug (Ludwig Maximilian University of Munich / University of Mannheim / Institute for Employment Research)

Register-based sample surveys, where data from administrative register systems are drawn to form sample frames, are used in many European countries. However, the range of register-based variables utilized in survey research is often limited to a few demographic characteristics. In recent years, the use of a wider range of register data for research purposes has become more commonplace, although the full potential of the register system remains untapped, particularly for national statistical institutes that can access numerous sources of register-based administrative data. Here, we examine how a greater choice of administrative register variables can affect survey practice as it relates to labor force surveys, by exploring associations between individual-level characteristics drawn from multiple administrative data sources and unit nonresponse in the Icelandic Labor Force Survey, a quarterly cross-sectional telephone survey. Specifically, we focus on whether prior-wave information can be utilized to predict unit nonresponse before the onset of fieldwork and whether expanding the range of administrative variables improves prediction. Furthermore, we explore whether the choice of estimator affects accuracy by comparing parametric regression and machine learning methods. Our findings suggest that, due to the strong association between immigration background and survey participation, most models show similar performance in classifying respondents and nonrespondents. However, when comparing the goodness-of-fit across the full range of the response propensity distribution, we find that the combination of an extensive range of administrative variables and Random Forest models performs best in predicting unit nonresponse. We explore the relative contribution of the predictor variables for …. We consider how these findings could affect survey practice in repeated labor force surveys, including how they may be used to inform adaptive survey designs.


Early Career Patterns: A Comparative Analysis of Education-to-Work Transitions in the Netherlands and Italy

Ms Silvia Loriga (Istat)
Ms Laura Eberlein (VU University Amsterdam) - Presenting Author

Youth employment is a topic of great interest in labor market analysis and is one of the areas where significant differences are observed among European countries. Specifically, Italy has one of the lowest youth employment rates, while the Netherlands boasts one of the highest. In addition to analyzing the proportion of young people employed at a given point in time, it is also interesting to observe transitions, in terms of entry into and exit from employment, as well as changes in the type of work.
This study aims to examine young people's entry into employment and the career paths during the early years following the completion of their studies, comparing the results for Italy and the Netherlands. These aspects are usually studied through ad hoc sample surveys on the career outcomes of young people who have completed their studies. Typically, the information collected from these surveys allows for the analysis of the employment status of young people a certain number of years after leaving the education system. However, it does not enable an analysis of all the transitions that occurred over time.
To carry out this study, we constructed a rich database by linking administrative information with data collected from the Labour Force Survey. The database has a longitudinal dimension, achieved by integrating administrative data over time and exploiting the longitudinal nature of the Labour Force Survey. The methodology we used is a Mixture Hidden Markov Model, which provides a suitable framework for analyzing discrete-time longitudinal data with multiple life statuses and a large number of different transitions. Based on the characteristics of the work, different types of employment are identified, and distinct career patterns are recognized that differ in their upward and downward mobility.
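
For readers unfamiliar with the model class named in this abstract, the following is a minimal sketch of how a mixture hidden Markov model assigns a likelihood to one observed sequence. Each latent class has its own hidden Markov model, and the sequence likelihood is the class-weighted sum of the per-class likelihoods, each computed with the forward algorithm. This is not the authors' specification. The two states (0 = not employed, 1 = employed), the two classes ("stable" and "mobile"), and every probability below are assumptions chosen for illustration.

```python
def forward_likelihood(obs, initial, transition, emission):
    """Likelihood of an observed sequence under one discrete HMM.

    initial[s]       -- probability of starting in hidden state s
    transition[r][s] -- probability of moving from state r to state s
    emission[s][y]   -- probability of observing y when the true state is s
    """
    n_states = len(initial)
    alpha = [initial[s] * emission[s][obs[0]] for s in range(n_states)]
    for y in obs[1:]:
        alpha = [
            emission[s][y]
            * sum(alpha[r] * transition[r][s] for r in range(n_states))
            for s in range(n_states)
        ]
    return sum(alpha)


def mixture_likelihood(obs, class_weights, class_models):
    """Class-weighted sum of per-class HMM likelihoods."""
    return sum(
        w * forward_likelihood(obs, **model)
        for w, model in zip(class_weights, class_models)
    )


def class_posterior(obs, class_weights, class_models):
    """Posterior probability of each latent class given the sequence."""
    joint = [
        w * forward_likelihood(obs, **model)
        for w, model in zip(class_weights, class_models)
    ]
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical measurement model shared by both classes: the observed
# status occasionally differs from the true status.
emission = [[0.95, 0.05], [0.10, 0.90]]

stable = dict(
    initial=[0.5, 0.5],
    transition=[[0.95, 0.05], [0.05, 0.95]],
    emission=emission,
)
mobile = dict(
    initial=[0.5, 0.5],
    transition=[[0.60, 0.40], [0.40, 0.60]],
    emission=emission,
)

sequence = [0, 1, 0, 1]  # observed status in four consecutive periods
posterior = class_posterior(sequence, [0.7, 0.3], [stable, mobile])
```

For long sequences the forward quantities underflow, so a practical implementation would rescale them at each step or work in logarithms. Estimating the parameters, rather than fixing them as here, is typically done with an EM-type algorithm.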