Synthetic Data Generation and Imputation with LLMs

Coordinator 1: Dr Anna-Carolina Haensch (LMU Munich)
Coordinator 2: Professor Frauke Kreuter (LMU Munich)
The use of large language models (LLMs) for generating synthetic data and performing data imputation has become increasingly prominent. The resulting data have been used for a variety of applications, from training machine learning models to filling gaps in incomplete datasets. Generating synthetic data with LLMs usually involves simple prompting techniques, often using so-called personas, but newer approaches allow for more sophisticated methods, such as fine-tuning LLMs for specific tasks like synthesis and imputation. This session aims to bring together researchers and practitioners from fields such as data science, NLP and computer science to explore these advancements. We will discuss how to evaluate the quality of synthetic data and examine the effectiveness of various methods for generating and using it. We encourage submissions on topics such as:
- Evaluation techniques and frameworks for synthetic data quality
- Advances in imputation using LLMs
- Fine-tuning LLMs for specific data generation tasks
- Case studies demonstrating the application of LLM synthetic data in research or industry, especially for hard-to-reach populations
- Methods for generating synthetic data with large language models