ESRA 2025 sessions by theme

Synthetic Data Generation and Imputation with LLMs

Coordinator 1: Dr Anna-Carolina Haensch (LMU Munich)
Coordinator 2: Professor Frauke Kreuter (LMU Munich)

Session Details

The use of large language models (LLMs) for generating synthetic data and performing data imputation has become increasingly prominent. The resulting data have been used for a variety of applications, from training machine learning models to filling gaps in incomplete datasets. Generating synthetic data with LLMs usually involves simple prompting techniques, often based on so-called personas, but newer approaches allow for more sophisticated methods, such as fine-tuning LLMs for specific tasks like synthesis and imputation. This session aims to bring together researchers and practitioners from fields such as data science, NLP and computer science to explore these advancements. We will discuss how to evaluate the quality of synthetic data and examine the effectiveness of various methods for generating and using it. We encourage submissions on topics such as:

- Evaluation techniques and frameworks for synthetic data quality
- Advances in imputation using LLMs
- Fine-tuning LLMs for specific data generation tasks
- Case studies demonstrating the application of LLM-generated synthetic data in research or industry, especially for hard-to-reach populations
- Methods for generating synthetic data with large language models