All time references are in CEST
Quality Assurance of Sensor Data in Survey Research

Session Organisers: Dr Fiona Draxler (University of Mannheim), Dr Vanessa Lux (GESIS), Dr Yannik Peters (GESIS)
Time: Tuesday 18 July, 09:00 - 10:30
Room:
As sensor data becomes increasingly integrated into survey research, ensuring its quality and reliability is critical. Sensor data, often collected through wearable devices, mobile apps, and environmental sensors, comes with the promise of highly granular and non-intrusive data collection in everyday life. However, due to its dynamic and context-dependent nature, the use of sensor data entails unique data quality challenges. To harness its potential for survey research, we need standards, frameworks, and tools to manage these challenges. The session aims to provide a thorough understanding of the methodological challenges and practical solutions in ensuring the quality of this new data type in the context of survey research. We invite contributions that discuss any aspect of quality assurance of sensor data when integrated into survey research. We specifically encourage contributions that explore comprehensive frameworks, cutting-edge tools, and best practices designed to maintain the integrity and usability of sensor data at all stages of the data life cycle. Contributions may cover, but are not limited to:
• Standards and Frameworks for Quality Assurance: Contributions that focus on the applicability of new and existing data quality standards and frameworks to assess sensor data, emphasizing criteria for evaluating reliability, validity, and representativeness in diverse sensor data applications.
• Tools and Platforms for Data Validation: Contributions that present tools and technologies for evaluating sensor data quality, covering both automated and manual evaluation techniques, machine learning and AI-driven approaches, and open-source platforms (e.g., the KODAQS toolbox); a minimal sketch of such an automated check follows this list.
• Best Practices and Case Studies: Contributions that showcase best-practice examples and case studies demonstrating the successful integration of sensor data into survey research, while addressing the specific challenges encountered and the solutions applied.
• Didactics of Data Quality Issues: Contributions that discuss approaches to teaching and promoting data quality assurance.
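To illustrate the kind of automated validation such tools perform, the following minimal Python sketch flags common quality problems in a stream of GPS readings. The field names and thresholds are hypothetical illustrations, not the KODAQS toolbox API or any specific platform's logic:

```python
# Minimal sketch of rule-based sensor data validation
# (hypothetical field names and thresholds).
from dataclasses import dataclass

@dataclass
class GpsReading:
    timestamp: float   # Unix seconds
    lat: float
    lon: float
    accuracy_m: float  # reported horizontal accuracy in metres

def validate(readings: list[GpsReading],
             max_gap_s: float = 300.0,
             max_accuracy_m: float = 100.0) -> list[str]:
    """Return a list of human-readable quality flags for one participant's trace."""
    flags = []
    for prev, curr in zip(readings, readings[1:]):
        gap = curr.timestamp - prev.timestamp
        if gap > max_gap_s:
            flags.append(f"coverage gap of {gap:.0f}s after {prev.timestamp}")
        if gap < 0:
            flags.append(f"out-of-order timestamp at {curr.timestamp}")
    for r in readings:
        if r.accuracy_m > max_accuracy_m:
            flags.append(f"low-accuracy fix ({r.accuracy_m:.0f} m) at {r.timestamp}")
        if not (-90 <= r.lat <= 90 and -180 <= r.lon <= 180):
            flags.append(f"implausible coordinates at {r.timestamp}")
    return flags
```

Real validation pipelines would add sensor-specific rules (battery-related sampling drops, duplicate fixes, device clock drift), but the rule-plus-flag structure is the common pattern.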
Keywords: Sensors, data quality
Dr Vanessa Lux (GESIS - Leibniz Institute for the Social Sciences) - Presenting Author
Mr Lukas Birkenmaier (GESIS - Leibniz Institute for the Social Sciences)
Dr Johannes Breuer (GESIS - Leibniz Institute for the Social Sciences)
Dr Jessica Daikeler (GESIS - Leibniz Institute for the Social Sciences)
Dr Fiona Draxler (University of Mannheim)
Ms Judith Gilsbach (GESIS - Leibniz Institute for the Social Sciences)
Mr Julian Kohne (GESIS - Leibniz Institute for the Social Sciences)
Dr Frank Mangold (GESIS - Leibniz Institute for the Social Sciences)
Professor Katrin Weller (GESIS - Leibniz Institute for the Social Sciences)
Dr Mareike Wieland (GESIS - Leibniz Institute for the Social Sciences)
The widespread availability and easy accessibility of sensor data from smartphones and other mobile devices have led to an interest in integrating these data with traditional survey data to enhance ecological validity and address social desirability and recall biases in self-reports. However, with their increasing use, concerns regarding the accuracy and reliability of sensor data have emerged. In general, error sources and biases of sensor data are highly sensor-specific, which poses a challenge to social science researchers, who often lack the necessary technical expertise. In addition, technically oriented data quality assessments focusing on basic standards rarely include further quality criteria relevant to survey research and social science studies (e.g., construct validity, representation).
Drawing on a literature review and expert evaluations, we developed a general error framework for assessing sensor data quality in social science research that is compatible with error frameworks used in survey research (e.g., the Total Survey Error framework). The proposed framework outlines sources of measurement error and representation bias along the full research cycle (planning, data collection and analysis, archiving, and sharing). In the paper presentation, we demonstrate the application of this framework to sensor data collected with mobile devices in the context of survey research. We explicitly outline potential error sources resulting from the multilayered character of sensor data, the spatial mobility of the data collection, and the specific interplay between the researcher, the study participants, and the device. For example, representation bias can stem from self-selection effects among those participating in sensor data collections, variation in technical competency, compliance issues related to device use patterns, or device-specific technical failures within subgroups. We also discuss how combining sensor and survey data can improve data quality assessment for both types, and how the proposed error framework can support researchers in enhancing their assessments and reporting.
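To make the device-specific representation-bias example concrete, a check along these lines might compare sensor data completeness across device models. This is a minimal sketch under assumed column names ('participant_id', 'device_model', 'day') and an assumed flagging threshold; it is not part of the proposed framework itself:

```python
# Sketch: detect device-specific coverage problems that can induce
# representation bias (hypothetical column names and threshold).
import pandas as pd

def coverage_by_device(df: pd.DataFrame, expected_days: int = 7) -> pd.DataFrame:
    """df has one row per participant-day with any sensor data observed."""
    days_observed = (df.groupby(["device_model", "participant_id"])["day"]
                       .nunique())
    coverage = days_observed / expected_days
    summary = coverage.groupby(level="device_model").agg(["mean", "count"])
    # Device models whose mean coverage falls well below the overall mean
    # are candidates for device-specific technical failure.
    overall = coverage.mean()
    summary["flagged"] = summary["mean"] < 0.8 * overall
    return summary
```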
Dr Patricia Iglesias (Centre d'Estudis Demogràfics) - Presenting Author
Despite growing interest in collecting photos within online surveys, little is known about the quality of visual data compared to data obtained through conventional requests. This presentation aims to fill this gap and provide quality assessment indicators for information collected through both formats: photos and conventional questions.
An online survey targeting parents of children attending primary school in Spain was conducted through the Netquest opt-in panel in 2023. The survey gathered information about books in respondents' homes using both formats. 661 photos were collected from 215 respondents; the photos were manually classified by two researchers, following detailed guidelines for extracting the relevant information.
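Where two coders classify the same photos, inter-coder agreement is a natural quality indicator. The abstract does not name the statistic used, but Cohen's kappa, sketched below, is the standard choice:

```python
# Sketch: Cohen's kappa for two coders' photo classifications
# (illustrative; the study's actual reliability statistic is not
# specified in the abstract).
from collections import Counter

def cohens_kappa(coder_a: list[str], coder_b: list[str]) -> float:
    assert len(coder_a) == len(coder_b)
    n = len(coder_a)
    # Observed agreement: share of photos given the same category.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Expected agreement under independent coding, from marginal frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / n**2
    return (observed - expected) / (1 - expected)

# e.g., for categories assigned to the same 661 photos by both coders:
# kappa = cohens_kappa(labels_coder1, labels_coder2)
```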
To evaluate quality, a review of previous research using conventional questions, photos, and other emerging data types was first conducted to identify indicators suitable for evaluating the quality of information about books at home collected through conventional and image-based formats. Second, most of these indicators were estimated.
In this presentation, I will report a) the indicators suitable for assessing and comparing conventional and image-based questions, and b) the quality of both formats according to the proposed indicators.
Indicators were identified for the conventional format, for the image-based format, and for both, which allowed a direct comparison between formats. Substantial measurement errors were found in conventional questions. Photos submitted by respondents are generally in line with the request and can be classified. However, specific information of interest about the books, such as the intended audience or languages, is often difficult to extract from photos. When comparing quality, conventional answers provide more information about the items asked about than photos, but photos have the potential to provide additional insights, such as book titles.
Overall, while collecting and analyzing photos sent through surveys presents challenges, their integration into surveys offers unique opportunities to enrich data collection methods.
Dr Peter Lugtig (Utrecht University) - Presenting Author
Dr Bella Struminskaya (Utrecht University)
Ms Daniele McCool (Utrecht University)
Professor Florian Keusch (University of Mannheim)
Dr Maren Fritz (University of Mannheim)
Dr Fabrizio de Fausti (Istat)
Dr Claudia de Vitiis (Istat)
Dr Theun-Pieter van Tienoven (Vrije Universiteit Brussel)
Over the period 2023-2025, a large consortium of researchers from across Europe worked on the project 'Smart Survey Implementation', which had the goal of establishing an infrastructure, legal basis, and methodology for conducting the European Household Budget Survey (HBS) and Time Use Survey (TUS) in a smart way. In the smart HBS, respondents take pictures of the receipts they receive when buying products. In the smart TUS, GPS positioning is used to pre-populate a time use diary. Apps and microservices that process sensor data are central to the project.
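As an illustration of how GPS traces can pre-populate a diary, a simple stop-detection pass groups consecutive fixes that stay within a small radius for a minimum dwell time, and each detected stop becomes a candidate diary episode. The sketch below uses hypothetical thresholds and is not the project's actual microservice logic:

```python
# Sketch: naive stop detection on a GPS trace, the kind of step used to
# pre-populate a time use diary (hypothetical thresholds; not the
# project's actual microservice).
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres."""
    r = 6_371_000
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def detect_stops(fixes, radius_m=150.0, min_dwell_s=300.0):
    """fixes: list of (timestamp, lat, lon), sorted by time.
    Returns (start, end, lat, lon) tuples for candidate diary episodes."""
    stops, anchor = [], 0
    for i in range(1, len(fixes) + 1):
        # Close the current segment at the end of the trace or once the
        # trace moves outside the radius around the anchor fix.
        moved = (i == len(fixes) or
                 haversine_m(fixes[anchor][1], fixes[anchor][2],
                             fixes[i][1], fixes[i][2]) > radius_m)
        if moved:
            dwell = fixes[i - 1][0] - fixes[anchor][0]
            if dwell >= min_dwell_s:
                stops.append((fixes[anchor][0], fixes[i - 1][0],
                              fixes[anchor][1], fixes[anchor][2]))
            anchor = i
    return stops
```

Production pipelines typically smooth the trace first and handle low-accuracy fixes, which is exactly where the processing errors discussed below enter.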
This presentation focuses on lessons learned from the project with regard to methodology and data quality. We will report findings from several small-scale and large experiments conducted in Norway, Belgium, Germany, France, the Netherlands, and Italy, which focused on 1) the successful recruitment of respondents into smart surveys, 2) the use of machine learning to process sensor data, 3) the interaction of respondents with (pre-processed) sensor data, and 4) measurement effects that occur within smart surveys.
Here we will concentrate on the effects on data quality. Which recruitment methods for a smart survey are successful in terms of achieving good response rates and low selection bias? What errors are introduced in processing and aggregating smartphone sensor data, and how can these be reduced? How can respondents help to correct measurement errors, and how can they be encouraged to provide high-quality data?
In the presentation, we will focus on errors of measurement and selection in the survey lifecycle, and conclude by presenting best practices and open research questions concerning the quality of sensor data in smart surveys.