ESRA 2025 sessions by theme

Alternative data sources and pseudo-inference in official statistics

Coordinator 1: Dr Federico Crescenzi (University of Tuscia)
Coordinator 2: Professor Tiziana Laureti (University of Tuscia)

Session Details

Official statistics are typically based on survey data collected by National Statistical Offices, often from probability samples. However, statistics on socio-economic phenomena such as poverty, living conditions, or prices (e.g. inflation) may suffer from poor timeliness, or may not be accurate enough to provide reliable estimates for small areas. In the measurement of inflation, for example, alternative data sources such as web-scraped data have proven valuable for producing high-frequency estimates. In recent years, web-scraped data have become an important source for the compilation of consumer price indices (CPIs) in many countries, driven by the growing prevalence of online sales and advances in the methods used to process such data (European Commission, 2020). Remote sensing data, in turn, have provided evidence that helps predict poverty in small areas.

Nonetheless, web-scraped data and similar sources come from non-probability samples, so they can suffer from selection bias and lack consistent estimates of sampling error. At the same time, these data open up the possibility of more nuanced modelling techniques based on statistical/machine learning.
The goal of this session is to show the advantages that alternative data sources can offer in official statistics and to address the drawbacks that arise from non-probability samples. The session welcomes contributions on topics such as:

- small area estimation of official statistics based on alternative data sources (e.g. remote sensing, web-scraping)
- web-scraped and crowdsourced data used to release or integrate official statistics
- non-probabilistic inference in official statistics
- survey-to-survey imputation using alternative data sources and machine learning techniques
- statistical/machine learning techniques in official statistics