ESRA 2025 Preliminary Program

All time references are in CEST

Multi-source data for labour statistics 2
Session Organisers	Professor Roberta Varriale (Sapienza University of Rome), Professor Dimitris Pavlopoulos (Vrije Universiteit Amsterdam (VU))
Time	Thursday 17 July, 13:30 - 15:00
Room	Ruppert 114

In recent years, labour market researchers and National Statistical Institutes involved in producing labour market statistics have increasingly adopted the use of multiple data sources. In addition to survey data, representing a traditional and primary source of information, register data from various sources has become extensively utilized in labour market research and, to some extent, in the production of official statistics.
A novel and promising development in this field is the production of multi-source statistics, achieved by linking information from various independent data sources. This approach offers numerous opportunities, as it combines the strengths of different frameworks: the extensive and detailed objective information provided by register data and the substantively valuable subjective insights obtained through surveys.
From a methodological perspective, linking multiple data sources presents the opportunity to address several aspects of data quality. By drawing on multiple sources of information, it is possible to enhance final estimates both in terms of content and accuracy. Nevertheless, this approach also introduces new methodological challenges.
In this session, we aim to present a range of statistical methods for handling multi-source data, accompanied by examples of results obtained using these methods to demonstrate the advantages of multi-source statistics. In particular, we will highlight the use of latent variable models, which leverage the simultaneous availability of information from multiple sources. This approach offers the benefit of accounting for potential measurement errors in each individual source. The application context for these methods is labour statistics.

Keywords: multi-source data, latent variable models, labour statistics

Papers

Revealing Tax-benefit Social Preferences in Croatia: Considering the Impact of Direct and Indirect Taxes

Mr Marko Ledic (EIZG) - Presenting Author
Mr Ivica Rubil (EIZG)

This paper employs the inverse-optimal approach from Saez’s (2002) optimal income tax model to derive the implicit marginal social welfare weights of single-earner households in Croatia. We compare the social preferences revealed by the tax-benefit system in different settings, depending on whether the labor supply elasticities and the inverse-optimal tax model consider only direct taxes, both direct and indirect taxes, or a combination of the two. Considering both direct and indirect taxes is crucial, as they can affect choices between leisure and consumption and thus the total effective tax burden. We obtained the income distribution from the Croatian component of the European Union Statistics on Income and Living Conditions (EU-SILC) 2018, prepared as the input data for the EU tax-benefit microsimulation model (EUROMOD). Since the EU-SILC does not contain data on expenditures, we matched data from the Household Budget Survey (HBS) to the EU-SILC using the Predictive Mean Matching imputation method. Net direct and indirect taxes are calculated using EUROMOD and an indirect tax microsimulation model, respectively, while behavioral responses at the intensive and extensive margins are estimated using a static discrete-choice labor supply model for Croatia. As top incomes tend to be under-represented in survey data, making survey income distributions unrepresentative of the true ones, we used tax records from the Croatian Tax Administration to correct the EU-SILC data. We find that the 2017 tax-benefit system is optimal only if the government assigns a much higher welfare weight to the workless poor than to the working poor. This holds when only direct taxes are considered and is further strengthened when both direct and indirect taxes are considered.
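The matching step described in this abstract can be illustrated with a minimal sketch of predictive mean matching. It is not the authors' implementation: the single covariate (income), the toy numbers, and the names fit_line and pmm_impute are all assumptions made for illustration. A donor file (standing in for the HBS) has both income and expenditure; a recipient file (standing in for the EU-SILC) has income only and receives an observed expenditure value from a donor with a similar predicted mean.

```python
import random


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x with a single covariate."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def pmm_impute(donor_x, donor_y, recipient_x, k=3, seed=0):
    """Impute y for each recipient by borrowing an observed donor value.

    For every recipient, the k donors whose predicted means are closest
    to the recipient's predicted mean are candidates; one of them is
    drawn at random and its observed y is used as the imputed value.
    """
    rng = random.Random(seed)
    intercept, slope = fit_line(donor_x, donor_y)
    donor_pred = [intercept + slope * xi for xi in donor_x]
    imputed = []
    for xr in recipient_x:
        pred = intercept + slope * xr
        nearest = sorted(
            range(len(donor_x)), key=lambda i: abs(donor_pred[i] - pred)
        )[:k]
        imputed.append(donor_y[rng.choice(nearest)])
    return imputed


# Toy data (hypothetical values, arbitrary units).
hbs_income = [10.0, 20.0, 30.0, 40.0, 50.0]
hbs_spending = [9.0, 17.0, 24.0, 30.0, 35.0]
silc_income = [12.0, 33.0, 48.0]

# With k=1 the closest donor is always chosen, so the result is deterministic.
imputed_spending = pmm_impute(hbs_income, hbs_spending, silc_income, k=1)
```

Because every imputed value is an observed donor value, the imputed expenditures stay within the range of the donor survey; a larger k adds randomness among close donors.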

Using Machine Learning for Improving Nonresponse Adjustment in the Current Population Survey

Dr Emanuel Ben-David (US Census Bureau) - Presenting Author

Response rates to the Current Population Survey (CPS) have declined recently, raising concerns about potential nonresponse bias in key population labor statistics. In this paper, we discuss using administrative data to adjust the weights for nonresponse while keeping the calibration of population estimates unchanged. This involves linking the administrative data to both responding and non-responding households in the survey. Once linked, the data can be used to adjust respondents' weights to account for differential nonresponse rates among subpopulations. Our contribution has two main aspects. First, we aim to enhance nonresponse adjustment using more advanced machine learning models. Second, we aim to address potential errors in the linkage process, which can significantly affect the performance of the models used for nonresponse adjustment.
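As a rough illustration of the weight adjustment described in this abstract, the sketch below applies a simple weighting-class adjustment. This is an assumption standing in for the machine learning models the abstract refers to: the cells would in practice be derived from linked administrative data or predicted response propensities, and the field names and the function adjust_weights are hypothetical. Within each cell, respondents' weights are inflated so that they also represent the nonrespondents in that cell.

```python
from collections import defaultdict


def adjust_weights(sample):
    """Weighting-class nonresponse adjustment.

    Each unit is a dict with a base "weight", a "cell" label and a
    boolean "responded". Respondents in a cell are scaled by
    (total base weight in cell) / (respondent base weight in cell),
    so the total weight of each cell is preserved.
    """
    total = defaultdict(float)
    responding = defaultdict(float)
    for unit in sample:
        total[unit["cell"]] += unit["weight"]
        if unit["responded"]:
            responding[unit["cell"]] += unit["weight"]

    adjusted = []
    for unit in sample:
        if unit["responded"]:
            factor = total[unit["cell"]] / responding[unit["cell"]]
            adjusted.append({**unit, "weight": unit["weight"] * factor})
    return adjusted


# Toy sample (hypothetical): cells could come from linked administrative data.
sample = [
    {"id": 1, "cell": "low", "weight": 100.0, "responded": True},
    {"id": 2, "cell": "low", "weight": 100.0, "responded": True},
    {"id": 3, "cell": "low", "weight": 100.0, "responded": False},
    {"id": 4, "cell": "high", "weight": 200.0, "responded": True},
    {"id": 5, "cell": "high", "weight": 200.0, "responded": False},
]

adjusted = adjust_weights(sample)
base_total = sum(u["weight"] for u in sample)
adjusted_total = sum(u["weight"] for u in adjusted)
```

The adjusted respondent weights sum to the same total as the base weights of the full sample, which is the sense in which overall population totals are left unchanged in this simplified setting.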

Improving the Kazakhstani Labour Force Survey for better targeting the NEET youth (not in employment, education or training)

Ms Dinara Alimkhanova (Nazarbayev University) - Presenting Author

Globally, there are only a few indicators that capture the vulnerability of youth. One of them is the NEET (not in employment, education or training) indicator, which is tracked worldwide through the Labour Force Survey. Today, one-fifth of individuals aged 15–24 belong to the NEET population. This indicator has therefore become increasingly important, especially for policy-makers concerned with issues such as social exclusion, inequality, gender, and poverty among youth.
While developed countries have made far greater improvements to their survey questions and adapted them to local needs, transitioning societies, including Kazakhstan, still need to enrich the survey by adding significant (sub)questions to better capture not only vulnerable youth but also the overall sample. This paper therefore seeks to contribute to the improvement of the Labour Force Survey in Kazakhstan, which can eventually provide valuable suggestions for many other developing countries. In particular, the study finds that the survey could be improved by adding questions on socio-economic status, ethnicity, duration of health issues (short- or long-term), and voluntary disengagement from the labour force or education. Such improvements will help identify nuanced factors associated with NEET youth, allowing the design of more targeted preventive and intervention policy strategies.

A multiple-group hidden Markov model for multi-source data. Cross-country differences in employment mobility in the presence of measurement error

Dr Roberta Varriale (Sapienza University)
Dr Mauricio Garnier-Villarreal (Vrije Universiteit Amsterdam)
Dr Dimitris Pavlopoulos (Vrije Universiteit Amsterdam) - Presenting Author
Dr Danila Filipponi (Italian National Institute of Statistics)

In this paper, we study whether measurement error in survey and administrative data biases cross-country differences in employment mobility. For this purpose, we develop a multigroup hidden Markov model and apply it to linked data from the Labour Force Survey and administrative sources for the Netherlands and Italy for the years 2017-2019. The measurement error correction applied by our model reconciles differences between data sources and shows that cross-country differences in employment mobility are smaller than originally thought. Error-corrected estimates indicate that mobility from temporary to permanent employment has become, over time, larger in Italy than in the Netherlands, while mobility from non-employment to temporary employment has steadily been higher in the Netherlands than in Italy. The paper illustrates the value of using multiple data sources to produce reliable estimates of key socioeconomic indicators.
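The core idea of a hidden Markov model with two error-prone indicators can be sketched with the forward algorithm. This is a generic single-group illustration, not the authors' multigroup model: the two-state setup, all parameter values, and the function name forward_likelihood are assumptions. The true status is latent, the survey and the register each report it with error, and the two reports are treated as independent given the true status. A multigroup version would, under the same logic, use a separate parameter set per country.

```python
def forward_likelihood(init, trans, emis_survey, emis_register, survey, register):
    """Likelihood of two parallel observed sequences under an HMM.

    init[k]             : probability of starting in latent state k
    trans[i][j]         : probability of moving from state i to state j
    emis_survey[k][c]   : probability the survey reports c in state k
    emis_register[k][c] : probability the register reports c in state k
    survey, register    : observed categories per time point, None if missing
    """
    n_states = len(init)
    alpha = []
    for t, (s, r) in enumerate(zip(survey, register)):
        # A missing indicator contributes a factor of 1 (it is summed out).
        emit = [
            (emis_survey[k][s] if s is not None else 1.0)
            * (emis_register[k][r] if r is not None else 1.0)
            for k in range(n_states)
        ]
        if t == 0:
            alpha = [init[k] * emit[k] for k in range(n_states)]
        else:
            alpha = [
                emit[j] * sum(alpha[i] * trans[i][j] for i in range(n_states))
                for j in range(n_states)
            ]
    return sum(alpha)


# Hypothetical parameters: state 0 = employed, state 1 = not employed.
INIT = [0.7, 0.3]
TRANS = [[0.9, 0.1], [0.2, 0.8]]
EMIS_SURVEY = [[0.95, 0.05], [0.10, 0.90]]
EMIS_REGISTER = [[0.98, 0.02], [0.05, 0.95]]

# The survey report is missing at the second time point.
likelihood = forward_likelihood(
    INIT, TRANS, EMIS_SURVEY, EMIS_REGISTER,
    survey=[0, None, 1], register=[0, 0, 1],
)
```

The off-diagonal entries of the two emission matrices play the role of source-specific measurement error rates, while the transition matrix describes mobility in the latent, error-corrected status.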

Hidden Markov models: accuracy across structured missing data designs

Dr Mauricio Garnier-Villarreal (Vrije Universiteit Amsterdam)
Dr Roberta Varriale (Sapienza Università di Roma)
Dr Danila Filipponi (Istituto nazionale di statistica) - Presenting Author

Large-scale data sets are helpful for generating representative national statistics, but no data set is free from measurement error. Latent variable methods, such as hidden Markov models (HMMs), can be used to correct for some forms of measurement error. One way to do this is to include multiple indicators of the same variable, for example one from a register and another from a survey. A commonly used survey is the Labour Force Survey (LFS), but this type of survey has missing data issues: subjects are not included at every time point, which results in a structured missing data design. In this research, we use a simulation study to test the accuracy and stability of HMMs with structured missing data like that in the LFS. The simulation conditions are inspired by the use of register and LFS data at Istat (Istituto nazionale di statistica) for the evaluation of work status (Employed/Unemployed). We vary the following data conditions: missing data type (missing completely at random (MCAR) and structured), proportion of missing data (from 0 to 0.8), sample size (from 500 to 5000), and item quality. Item quality has four categories, defined by crossing good/bad items with which item has the missing data. With this study, we will be able to evaluate the predictive accuracy of HMMs for categorical variables measured over time with realistic missing data structures, providing applied researchers with guidelines on the proper use of HMMs and on when they tend to show higher classification error.
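The difference between the two missing data types in this abstract can be illustrated with a small sketch that builds observation masks for a survey indicator. The rotation pattern used here (observed for two periods, absent for two, observed for two) is an assumption chosen for illustration, as the abstract does not specify the design, and the function names are hypothetical.

```python
import random


def rotation_mask(entry, n_periods, pattern=(True, True, False, False, True, True)):
    """Structured missingness: a subject entering at period `entry` is
    observed only at the periods marked True in the rotation pattern."""
    mask = [False] * n_periods
    for offset, observed in enumerate(pattern):
        t = entry + offset
        if observed and 0 <= t < n_periods:
            mask[t] = True
    return mask


def mcar_mask(n_periods, p_missing, rng):
    """MCAR missingness: each period is missing independently with
    probability p_missing."""
    return [rng.random() >= p_missing for _ in range(n_periods)]


def apply_mask(series, mask):
    """Replace unobserved values with None."""
    return [value if observed else None for value, observed in zip(series, mask)]


# Hypothetical survey reports over 8 periods (0 = employed, 1 = unemployed).
survey_reports = [0, 0, 0, 1, 1, 0, 0, 0]

structured = apply_mask(survey_reports, rotation_mask(entry=1, n_periods=8))
rng = random.Random(42)
mcar = apply_mask(survey_reports, mcar_mask(8, p_missing=0.5, rng=rng))
```

Under the structured design the gaps fall in fixed blocks determined by the entry period, whereas under MCAR they are scattered at random; a register indicator observed at every period could be paired with either masked series.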