Linking Surveys and Social Media Data – Challenges, Applications and Solutions 1
Session Organisers | Professor Alexia Katsanidou (GESIS - Leibniz Institute for the Social Sciences); Dr Johannes Breuer (GESIS - Leibniz Institute for the Social Sciences); Dr Katharina Kinder-Kurlanda (GESIS - Leibniz Institute for the Social Sciences); Dr Sebastian Stier (GESIS - Leibniz Institute for the Social Sciences) |
Time | Tuesday 16th July, 16:00 - 17:00 |
Room | D25 |
When it comes to measuring phenomena that are of interest to social scientists, such as attitudes, beliefs, values or behavior, both surveys and data from social media platforms have their own advantages and disadvantages. For example, while survey data may be biased by social desirability or faulty memory, data from social media often lack important contextual information and do not capture relevant outcome variables. A promising way of dealing with the limitations of surveys and social media data is to link them. Such linking can help to answer interesting substantive research questions as well as methodological questions about the quality of the data (e.g., regarding the reliability of self-reports or the precision of inferring attributes from social media data).
The process of linking survey and social media data, however, is by no means trivial and comes with its own set of practical as well as ethical challenges. These relate to a variety of issues, including data access, informed consent, limitations imposed by terms of service of social media companies, data privacy, and data archiving and sharing. While there is some pioneering research that has linked data from surveys and social media to answer substantive or methodological questions, this approach is still not widely used, and an exchange of expertise is necessary to improve practices and create standards in this area. We invite contributions for this session that present suggestions for dealing with the various practical and ethical challenges of linking survey and social media data (ideally based on examples). Contributions can be empirical, methodological or conceptual. Relevant topics include but are not limited to:
• Examples of substantive or methodological questions that can be answered by combining surveys and social media data
• Improvement of measurements of human attitudes, beliefs, values, and behavior through the combination of surveys and social media data
• Incentives and informed consent for studies that link surveys and social media data
• Bias in the sampling process and potential solutions
• Data sharing issues of linked survey and social media data
Keywords: social media, data linking, ethics, data sharing
Mr Christoph Beuthner (GESIS Leibniz Institute for the Social Sciences) - Presenting Author
Professor Florian Keusch (University of Mannheim)
Dr Natalja Menold (GESIS Leibniz Institute for the Social Sciences)
Dr Jette Schröder (GESIS Leibniz Institute for the Social Sciences)
Dr Bernd Weiß (GESIS Leibniz Institute for the Social Sciences)
Dr Henning Silber (GESIS Leibniz Institute for the Social Sciences)
Combining survey and social media data offers a great opportunity for social research. Data gathered from websites such as Facebook or Twitter can help researchers to understand human behavior and social reality. Depending on the research question, different social networks can provide different supplementary information that can be combined with survey data. For example, Instagram can deliver images, Twitter short written statements, and LinkedIn employment data. However, linking survey data to social media data requires respondents’ consent and willingness to cooperate in the linkage procedure.
In this presentation, we explore reasons why respondents are willing to allow or deny access to their social media accounts. The data were collected from a German online access panel (N = 3,374) in August 2018. Our study randomly assigned respondents to use either a desktop computer/laptop or a smartphone. Furthermore, we asked respondents how often they used different social media services and whether they were willing to share data from thirteen different websites and apps, including major services like Instagram, Facebook and Twitter. Additionally, we explicitly asked whether they used these services on their smartphone, as this is how most of these services are commonly used. The questionnaire also included questions on privacy concerns, trust, smartphone usage, attitudes toward surveys, and demographics, which will help us to gain insights into the mechanisms behind respondents’ willingness to cooperate.
In our analysis, we examine the influence of those variables on respondents’ willingness to share social media data. Our experimental design allows us to calculate differences between respondents on different devices. We also analyze the influence of privacy concerns, trust and other variables. Finally, we investigate whether specific demographic characteristics are associated with a higher likelihood of sharing data from social networks, which would lead to bias.
Dr Oliver Davis (MRC Integrative Epidemiology Unit at the University of Bristol)
Mr Andrew Boyd (University of Bristol)
Dr Alastair Tanner (MRC Integrative Epidemiology Unit at the University of Bristol)
Ms Nina Di Cara (MRC Integrative Epidemiology Unit at the University of Bristol) - Presenting Author
Dr Luke Sloan (Cardiff University)
Dr Tarek Al Baghal (Essex University)
Dr Lisa Calderwood (UCL)
Dr Claire Haworth (MRC Integrative Epidemiology Unit at the University of Bristol)
The UK’s population-based birth cohorts have each followed thousands of participants for their whole lives, collecting diverse social, behavioural, biological and health measurements spanning decades. New types of data collection such as social media linkage have the potential to enrich these datasets with high-resolution time course data on real human behaviour. At the same time, these richly characterised cohorts with gold-standard survey and other measurements collected at regular intervals could be ideal platforms for the validation of information derived from social media. Our project, supported by CLOSER, the UK Economic and Social Research Council (ESRC), the UK Medical Research Council (MRC) and the Alan Turing Institute, is working in partnership with cohort participants and leaders to develop a software framework to facilitate social media linkage in the eight cohorts that make up the CLOSER consortium, starting with a proof-of-principle implementation in the Avon Longitudinal Study of Parents and Children (ALSPAC). To work out the best way to do this, we are engaging with two generations of ALSPAC participants to find out what is acceptable to them in terms of collecting and using their interactions on social media. This has informed the development of software that collects, codes and shares social media data while protecting the anonymity of participants.
Dr Janez Štebe (Univerza v Ljubljani, Arhiv družboslovnih podatkov) - Presenting Author
An extended family of FAIR principles (Findable, Accessible, Interoperable, Reusable) guides the assessment of data readiness for reuse, as also discussed in the recent EC Expert Group report on Turning FAIR Data into Reality. The purpose of this paper is to operationalise those principles for the use and sharing of social media (SM) data, linked or not, in the social sciences. Key considerations in sharing SM data include compliance with the Terms of Service of commercial companies, users’ perceptions of the privacy of their communication, personal data protection and the options for obtaining consent. A variety of archiving and access options have been proposed, including secure access to personal (linkable) data, storing the IDs of social network posts, or providing access to the full content of SM data if ethical and legal conditions permit. We will search for a comprehensive set of examples of SM data sharing, and analyse and evaluate the options by applying the FAIRness criteria to measure the general utility of the data for the end user, including sufficient quality and appropriateness for a given purpose. The FAIRness considerations will be balanced against an estimation of data curation service costs, which will be based on established models (e.g. 4C project - http://www.4cproject.eu/). The outcome will be a recommendation for data sharing options according to the types of SM data. The results synthesise some of the work in the SERISS project WP6: New forms of data – legal, ethical and quality issues.