ESRA 2019 glance program


Linking Surveys and Social Media Data – Challenges, Applications and Solutions 2

Session Organisers Professor Alexia Katsanidou (GESIS - Leibniz Institute for the Social Sciences)
Dr Johannes Breuer (GESIS - Leibniz Institute for the Social Sciences)
Dr Katharina Kinder-Kurlanda (GESIS - Leibniz Institute for the Social Sciences)
Dr Sebastian Stier (GESIS - Leibniz Institute for the Social Sciences)
Time Wednesday 17th July, 14:00 - 15:00
Room D21

When it comes to measuring phenomena that are of interest to social scientists, such as attitudes, beliefs, values or behavior, both surveys and data from social media platforms have their own advantages and disadvantages. For example, while survey data may be biased by social desirability or faulty memory, data from social media often lack important contextual information and do not capture relevant outcome variables. A promising way of dealing with the limitations of surveys and social media data is to link them. Such linking can help to answer interesting substantive research questions as well as methodological questions about the quality of the data (e.g., regarding the reliability of self-reports or the precision of inferring attributes from social media data).
The process of linking survey and social media data, however, is by no means trivial and comes with its own set of practical as well as ethical challenges. These relate to a variety of issues, including data access, informed consent, limitations imposed by terms of service of social media companies, data privacy, and data archiving and sharing. While there is some pioneering research that has linked data from surveys and social media to answer substantive or methodological questions, this approach is still not widely used, and an exchange of expertise is necessary to improve practices and create standards in this area. We invite contributions for this session that present suggestions for dealing with the various practical and ethical challenges of linking survey and social media data (ideally based on examples). Contributions can be empirical, methodological or conceptual. Relevant topics include but are not limited to:
• Examples of substantive or methodological questions that can be answered by combining surveys and social media data
• Improvement of measurements of human attitudes, beliefs, values, and behavior through the combination of surveys and social media data
• Incentives and informed consent for studies that link surveys and social media data
• Bias in the sampling process and potential solutions
• Data sharing issues of linked survey and social media data

Keywords: social media, data linking, ethics, data sharing

Linking Survey and Twitter Data: Ethics, Consent, Anonymity, Archiving and Sharing

Dr Luke Sloan (Cardiff University)
Mr Curtis Jessop (NatCen Social Research)
Dr Tarek Al Baghal (University of Essex) - Presenting Author
Professor Matthew Williams (Cardiff University)

The advent of social media has provided researchers with a potentially rich source of information regarding the behaviours, attitudes and beliefs of individuals; but with it has come the substantial but necessary headache of reconceptualising the standard pillars of ethical social research conduct – informed consent, avoiding harm and anonymity. When social media data are linked with other forms of data, such as survey data, the issues are further complicated. For example, Twitter handles, the content of tweets and much of the metadata drawn down from the public API allow an individual to be identified, but their ‘anonymisation’ would negate much of the additional insight offered. Further, obscuring a user’s handle when presenting a tweet would violate Twitter’s terms of service (TOS). What then is to be done when Twitter data are linked with survey data where the data are not public and we would otherwise aim for anonymity? This raises broader questions such as how we can elicit informed consent from participants while maintaining the utility of the data. In this paper we will explore such issues, drawing upon our experiences of asking for consent to link survey and Twitter data in three large UK surveys. For a researcher looking to link survey and Twitter data, establishing informed consent is the most visible challenge, but the reality is that, even after informed consent has been given, there is a myriad of issues to be resolved concerning collection of the social media data, the environment in which the linkage can take place, what is and isn’t disclosive, and what can be archived privately and publicly for posterity, and how. Solutions to these issues also need to account for users deleting tweets after collection, the withdrawal of consent, the maintenance of the dataset, and compliance with Twitter’s TOS.


Comparison of Twitter Posts and Survey Responses of the Greek Parliamentary Candidates

Miss Dimitra Papaxanthi (Aristotle University of Thessaloniki) - Presenting Author
Miss Evangelia Kartsounidou (Aristotle University of Thessaloniki)
Professor Ioannis Andreadis (Aristotle University of Thessaloniki)

Researchers commonly rely on survey questions to identify the salient issues a country faces. In recent years, however, the use of Internet-based data (social media data, Google Trends, etc.) has increased. These data are typically very large in volume and are produced as a by-product of Internet activity. Their increasing availability has motivated academia, companies and institutions worldwide to search for best practices for using them. The main objective of this paper is to find an effective way of combining web survey data and social media data in order to establish what the Greek parliamentary candidates believed to be the country’s salient issues in 2015.
In our paper, we first examine the candidates who participated in the 2015 Greek Candidate Survey and answered an open-ended question asking respondents to report the three most important problems of the country. We compare the issues that the parliamentary candidates named as most important in the survey with the issues they refer to on their personal Twitter accounts. In this way, we can cross-validate the candidates’ responses to the online survey using their tweets. We then explore whether Twitter can be used to complement the survey data for respondents who skipped the open-ended question.


Consent to Collecting and Linking Twitter Data in a Combined Webtracking and Survey Study

Dr Johannes Breuer (GESIS - Leibniz Institute for the Social Sciences) - Presenting Author
Dr Pascal Siegers (GESIS - Leibniz Institute for the Social Sciences)
Dr Sebastian Stier (GESIS - Leibniz Institute for the Social Sciences)
Dr Tobias Gummer (GESIS - Leibniz Institute for the Social Sciences)
Dr Arnim Bleier (GESIS - Leibniz Institute for the Social Sciences)

Linking survey data and social media data at the individual level requires explicit and informed consent from study participants. Especially in Europe, with the recent introduction of the General Data Protection Regulation (GDPR), researchers have to provide detailed information about what data they collect, how they collect the data, and for what purpose they use it. In a study with participants from a web tracking panel maintained by a German market research company (N = 2042), we asked participants in a web survey for consent to collect their Twitter data and link it with their survey responses and tracked browsing behavior. N = 1347 panelists completed our online questionnaire, 22.8% of those respondents (n = 307) reported having a Twitter account, and of those, 65.8% (n = 202) consented to collection of their Twitter data. 196 of the respondents supplied a Twitter handle, of which 68 were unusable due to typos, invalid strings or accounts that clearly belonged to somebody else (e.g., celebrities), leaving us with 128 usable Twitter accounts. Notably, the surprisingly high consent rate for linking among the Twitter users in our study is partly due to the special nature of our sample. In a logistic regression model, we found that male, younger, and lower-income Twitter users as well as those who have recently used Twitter are more likely to consent, whereas education and the incentive condition had no effect. For the incentive, we compared a 5 Euro prepaid condition to a 5 Euro promised condition (conditional on consenting). Interestingly, only a very small minority of the respondents read the extended privacy information that was provided on the project website via a link embedded in the short version of the informed consent in the web questionnaire.
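
The sample attrition reported above can be retraced with a few lines of arithmetic. The following is only an illustrative sketch, not part of the authors' analysis: the counts are copied from the abstract, while the variable names and the pct helper are assumptions introduced here.

```python
# Counts as reported in the abstract; all names below are illustrative.
panel_size = 2042        # members of the web tracking panel
completed = 1347         # panelists who completed the questionnaire
has_account = 307        # respondents reporting a Twitter account
consented = 202          # account holders who consented to collection
handles_supplied = 196   # respondents who supplied a Twitter handle
unusable = 68            # typos, invalid strings, other people's accounts


def pct(part: int, whole: int) -> float:
    """Return part as a percentage of whole, rounded to one decimal."""
    return round(100 * part / whole, 1)


# Handles that remain after discarding the unusable ones.
usable = handles_supplied - unusable

print("Twitter users among respondents:", pct(has_account, completed), "%")  # 22.8
print("Consent rate among Twitter users:", pct(consented, has_account), "%")  # 65.8
print("Usable accounts:", usable)  # 128
print("Usable accounts per consenting user:", pct(usable, consented), "%")
print("Usable accounts per completed questionnaire:", pct(usable, completed), "%")
```

The last two figures are not stated in the abstract; they are derived here solely from the reported counts to show how much of the sample remains available for linking.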