All time references are in CEST
Linking survey data with digital trace data: error sources and best practices
Session Organisers | Professor Mark Trappmann (Institute for Employment Research (IAB)); Dr Valerie Hase (Ludwig-Maximilians-Universität München (LMU)); Professor Florian Keusch (Universität Mannheim); Professor Frauke Kreuter (Ludwig-Maximilians-Universität München (LMU))
Time | Tuesday 18 July, 09:00 - 10:30 |
Room |
While survey data and digital trace data are each powerful and frequently used data sources in social science research on their own, an even more compelling opportunity lies in combining the two. Digital trace data can enrich existing survey data with nonreactive behavioral data of high velocity, precision, and frequency. At the same time, combining digital trace data with survey data based on probability samples can increase their external validity and facilitate statistical inference. Furthermore, self-reports add context to digital trace data, which may help us better understand them and avoid misinterpretations.
In this session, we are looking for contributions that investigate any combination of survey data with digital trace data, both in the collection of these data sources and in their combined analysis. Such combinations can, for example, result from asking survey participants to install research apps on their smartphones or to donate data from their social media profiles or other sources. We are mainly interested in research that investigates error sources such as coverage error, nonresponse error, and measurement error in studies using such combined data. Furthermore, we are interested in studies that aim at improving measurement quality by combining these two sources. Studies that help identify best practices (e.g., how to optimize consent or donation rates, or how to statistically address resulting errors) are also within the scope of the session.
Keywords: digital trace data, data donation, total survey error
Professor Florian Keusch (University of Mannheim) - Presenting Author
Mr Frieder Rodewald (University of Mannheim)
Data protection regulations in the EU, Brazil, and California give users the right to access the information online platforms hold about them. Data donation studies capitalize on this legal requirement by asking web survey respondents to donate their data at the end of the survey. This sequential approach assumes that respondents' prior engagement with the survey enhances their willingness to donate data. However, it often results in modest donation rates. An alternative approach is to frame the study as a data donation task from the outset, thus increasing commitment to provide additional data at the end of the survey.
In this study, we conduct a 2x3 experiment involving over 2,000 participants from a German online access panel. Panel members are invited to a study framed (1) as a web survey that introduces data donation only once respondents have completed the questionnaire or (2) as a data donation study from the beginning. When asking for the data donation, we also randomly vary the appeal: (1) emphasizing participants' ability to quantify their online platform behavior, (2) emphasizing participants' ability to learn what online platforms know about them, or (3) no appeal.
We ask for data donations from YouTube, Instagram, and LinkedIn. Our hypotheses posit that while initial study participation rates will be higher under the survey framing than under the data donation framing, participants' willingness to donate will be significantly higher under the data donation framing. Emphasizing the personal benefits of the data donation should further increase willingness to donate compared to no such emphasis, especially when the appeal is presented at the study's start rather than at its end. We also examine how the framing influences sample composition, specifically regarding technical skills, openness to self-monitoring, frequency of platform usage, privacy concerns, and trust in the platforms.
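For concreteness, here is a minimal sketch of how such a 2x3 assignment could be implemented with roughly balanced cell sizes. The condition labels and the helper function are illustrative assumptions, not taken from the study materials:

import itertools
import random

# Illustrative labels for the two experimental factors (assumed names).
FRAMINGS = ["survey_framing", "donation_framing"]                     # factor 1: study framing
APPEALS = ["quantify_behavior", "learn_platform_data", "no_appeal"]   # factor 2: donation appeal

def assign_conditions(participant_ids, seed=42):
    """Assign each participant to one of the 2x3 = 6 cells, balanced and in random order."""
    rng = random.Random(seed)
    cells = list(itertools.product(FRAMINGS, APPEALS))
    # Repeat the six cells until all participants are covered, then shuffle the order.
    schedule = (cells * (len(participant_ids) // len(cells) + 1))[: len(participant_ids)]
    rng.shuffle(schedule)
    return dict(zip(participant_ids, schedule))

assignments = assign_conditions([f"p{i:04d}" for i in range(2000)])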
Mr Carlos Ochoa (Netquest / Universitat Pompeu Fabra (RECSM)) - Presenting Author
Different data collection methods are better suited to specific research objectives, depending on their strengths and limitations.
Surveys are likely the most widely used data collection method because they offer flexibility in gathering diverse information from a sample of individuals. However, they are prone to significant errors, including memory bias and social desirability bias.
Digital trace data has gained popularity with the rise of online activity. Unlike surveys, it avoids memory errors and social desirability biases, as participants are not actively involved in data collection. Nevertheless, issues such as device under-coverage and shared devices can affect its accuracy.
A third approach is data donation, where participants voluntarily download and share their activity data from online platforms. This method ensures data integrity, addresses many of the device coverage issues associated with digital trace data, and avoids the biases of surveys. However, it has limited applicability, places a greater burden on participants, and is less suitable for studying multi-platform activity.
To compare these methods, we analyzed Amazon purchase behavior among individuals from a Spanish online panel (Netquest) that provides access to all three data collection methods: surveys, digital trace data, and data donation. Metrics included the number and value of purchases, descriptions of the most recent product purchased, and method-specific insights. We also explored challenges such as shared devices (impacting digital trace data) and shared accounts (affecting both digital trace data and data donation).
While data donation is arguably the most reliable of the three methods, it was also the least accepted by participants. Surveys performed reasonably well for assessing overall purchase activity over a period but lacked accuracy on prices and product details. Digital trace data captured granular purchase behavior but suffered from device coverage and processing challenges.
This research provides guidance for researchers in selecting or combining these data collection methods.
Dr Joachim G. Piepenburg (GESIS) - Presenting Author
Dr Lukas Otto (GESIS)
Smartphone-based research designs carry many advantages for social scientists. One main use case is intensive-longitudinal (survey) designs, e.g., mobile experience sampling methods (MESM), ecological momentary assessment, or ambulatory assessment. By notifying participants multiple times per day about a questionnaire, researchers can capture state-like variables, short-term dynamics, within-person processes, and immediate reactions. Moreover, measuring the variable of interest “in situ” might overcome memory errors and stereotypical answering behavior.
However, reactivity is one of the main concerns when conducting intensive-longitudinal mobile survey designs. Critics claim that such studies are more akin to “interventions” that affect the variables of interest rather than measure them. We thus examine the following question:
1) How does participation in an intensive-longitudinal survey study affect attitudes, interest and behavior?
We will tackle this question using data collected prior to the 2024 European Parliament election. A random subsample of participants from the newly established GESIS Panel.dbd Digital Behavioral Data Sample was invited to take part in a smartphone-based European election study using the GESIS AppKit. Because the invitation to the App-Study was randomly assigned, we can estimate its treatment effect using either intent-to-treat analyses or instrumental variable regressions, where invitation to the App-Study serves as an instrument for participation. Using this experimental approach, we test whether participation in the App-Study affects interest in the election campaign and in European politics, as well as information behavior and voting behavior.
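To make the identification strategy concrete, here is a minimal sketch of both estimators on simulated data. The variable names and the data-generating process are illustrative assumptions, not part of the GESIS study: z stands for the randomized invitation (the instrument), d for actual App-Study participation, and y for an outcome such as interest in European politics.

import numpy as np
import pandas as pd

# Simulated data; the true effect of participation is set to 0.4 here.
rng = np.random.default_rng(0)
n = 1000
z = rng.integers(0, 2, n)                     # randomized invitation (instrument)
d = (z & (rng.random(n) < 0.6)).astype(int)   # participation: only invitees can take part
y = 0.4 * d + rng.normal(size=n)              # outcome, e.g., political interest
df = pd.DataFrame({"z": z, "d": d, "y": y})

# Intent-to-treat (ITT): compare outcomes by invitation status,
# ignoring whether invitees actually participated.
itt = df.loc[df.z == 1, "y"].mean() - df.loc[df.z == 0, "y"].mean()

# First stage: effect of the invitation on participation.
first_stage = df.loc[df.z == 1, "d"].mean() - df.loc[df.z == 0, "d"].mean()

# Wald/IV estimate (equivalent to 2SLS with a single binary instrument):
# the ITT scaled by the first-stage compliance difference.
late = itt / first_stage

print(f"ITT: {itt:.3f}  first stage: {first_stage:.3f}  IV estimate: {late:.3f}")

With a single binary instrument, the Wald ratio shown above coincides with two-stage least squares; a full analysis would of course add covariates and proper standard errors.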