Transparency in Comparative Social Science Research
Session Organisers | Dr Elena Damian (University of Leuven), Professor Bart Meuleman (University of Leuven), Professor Wim van Oorschot (University of Leuven)
Time | Tuesday 16th July, 11:00 - 12:30 |
Room | D22 |
Transparency is one of the foundations of the scientific method (cf. Merton’s scientific norm of communalism, 1973) and has two main functions: to enable readers to evaluate the validity and reliability of a study’s findings (evaluation transparency) and to enable direct replications (replicability transparency) (Damian, Meuleman, & van Oorschot, forthcoming). Despite its acknowledged importance in academia, very few measures have been taken to encourage greater research transparency. For example, it is still very uncommon for journals to have research transparency guidelines, and there are no real incentives for scholars to voluntarily provide clear records of their studies. As a consequence, the last decade has produced increasing evidence of failures to replicate experiments (e.g., Open Science Collaboration, 2015), a growing prevalence of questionable research practices (e.g., Fanelli, 2009; Simmons, Nelson & Simonsohn, 2011; John et al., 2012), as well as cases of misconduct, mostly in experimental psychology and medical research.
As cross-national survey research generally analyses large-scale, publicly available data sources, the field seems relatively well protected against the replication crisis. However, we believe that its specific nature poses particular threats to transparency. For instance, collecting data in multiple countries is a complex process that raises many methodological issues, yet researchers are still rarely required, and rarely volunteer, to provide all necessary information on the data and preparation procedures. As a result, many of these issues and data limitations are insufficiently reported in studies. Furthermore, although performing secondary analyses on cross-national data is a long and complex process (e.g., operationalisation of theoretical concepts, treatment of missing values, dealing with outliers, etc.), many of the steps performed at this stage remain undocumented. Disclosure about the data and analytical procedures is therefore crucial for evaluating and replicating this type of research.
In this session, we welcome papers that address the following topics:
(1) Theoretical contributions about transparency issues and/or possible solutions in cross-national survey research, or in quantitative social science research in general
(2) Empirical evidence of current research practices in quantitative social science research
(3) Examples of good practice (e.g., sharing an experience of publishing a transparent substantive study with replication materials and the lessons learned from the process)
Keywords: transparency, replication, cross-national research
Dr Rebekah Luff (University of Southampton) - Presenting Author
Professor Patrick Sturgis (University of Southampton)
This paper builds on and updates existing studies of the quality of survey reporting across social science disciplines and over time. Presser (1983) undertook a content analysis of the kinds of data used in articles published in the top-ranking journals in Economics, Sociology, Political Science, Social Psychology and Public Opinion research. All research papers published in the selected journals were analysed for the years 1949-1950, 1964-1965 and 1979-1980. Presser found a general trend of increasing use of survey data during this period. He also found that the reporting of basic aspects of survey design and fieldwork, including sampling strategy and response rate, was frequently missing or inadequate. Saris and Gallhofer (2007) updated and extended Presser’s analysis for the period 1994-1995. We now update this time series by analysing the quality and completeness of survey reporting in research articles in the same journals for the period 2014-2015. We present an analysis of these 1,451 papers in terms of the types of data and research designs used, as well as the quality of survey reporting in each of the five disciplines. The survey reporting indicators include the mode, sampling method and response rate for all surveys (both primary and secondary data). Our findings show that, while there have been some improvements in the reporting of surveys over time (including the adoption of standards such as AAPOR response rate calculations), many of Presser’s original concerns still stand, and reporting quality varies between disciplines.
Ms Katharina Kunißen (Johannes Gutenberg University Mainz) - Presenting Author
The welfare state plays a key role in explaining a variety of social phenomena at the micro level. Cross-cultural studies in the social sciences therefore often include it as an independent variable in analyses of the manifestations and consequences of social inequality (e.g. health, poverty, attitude formation and more). Still, the literature lacks consensus on how to operationalise the welfare state, or features of the welfare state, in such cases. Instead, many different approaches coexist, varying in their conceptual choices (is the welfare state conceptualised in terms of effort, rights or benefit receipt?) as well as their empirical ones (does the operationalisation rely on typologies, single indicators or composite measures?).
Existing methodological debates surrounding this issue treat the welfare state as a dependent variable. However, the differences between approaches are especially consequential once the welfare state is treated as an independent variable, because measures are then adapted under two assumptions: (1) whatever is eligible as a dependent variable should also be suitable as an explanatory variable; (2) differences between operational approaches are negligible because they capture similar, or at least strongly related, elements of the same construct – the welfare state. Both assumptions are problematic. There are practical issues, such as a lack of data availability for specific countries or points in time. Furthermore, there are conceptual issues, since different operationalisations highlight different aspects of social policies. Lastly, the comparability of results is impaired, because it is doubtful that different indicators lead to similar evidence. All of this raises the question whether some seemingly inconsistent results in the literature (e.g. on health inequalities) might actually be due to different operationalisations of the welfare state.
This contribution addresses these issues and explores ways to standardise the operationalisation of social policies as independent variables in multilevel analyses by proposing a conceptual framework for the selection of indicators and demonstrating its empirical application.
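To illustrate the analytical setting this abstract refers to, the following minimal sketch (an editorial illustration, not part of the contribution; all file, variable and indicator names are hypothetical) fits a two-level random-intercept model in which a country-level welfare-state indicator enters as an independent variable alongside an individual-level predictor.

    # Minimal illustrative sketch: individuals nested in countries, with a
    # hypothetical country-level welfare-state indicator as a predictor.
    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical pooled survey file with columns:
    # health (individual outcome), income (individual predictor),
    # welfare_effort (country-level indicator), country (grouping variable)
    df = pd.read_csv("pooled_survey.csv")

    # Random-intercept multilevel model: the welfare-state indicator enters
    # as a macro-level independent variable.
    model = smf.mixedlm("health ~ income + welfare_effort",
                        data=df, groups=df["country"])
    print(model.fit().summary())

How the macro-level indicator is chosen and constructed is exactly the point at which the operationalisation choices discussed above would change the results.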
Mr Tobias Gebel (German Institute for Economic Research (DIW Berlin)) - Presenting Author
Secondary analysis, meaning the use of existing data to investigate a new question, is a long-established research strategy in quantitative social research and is increasingly gaining relevance for qualitative social research. Secondary analysis offers new analytical potential by examining and tapping existing research data from new perspectives. Due to their high information density, interview data offer rich opportunities for new research questions. Qualitative interview data are particularly widespread in empirical organizational research and are used in national and international comparative studies. They are especially relevant for time comparisons, trend analyses and broad-based cross-sectional analyses that aim to investigate structural changes in organizations. Moreover, secondary analyses also contribute to method development and support the traceability of research results by offering the possibility of reflexive consideration. Given these merits, there has recently been growing interest in the secondary analysis of qualitative interview data in German organizational research. However, the range of interview data available for organizational research is currently limited. The present paper therefore provides an overview of existing secondary data available for qualitative organizational research, its access routes and the implications for the research process. Subsequently, the existing research structure in Germany is contrasted with the results of a meta-analysis of secondary analyses of qualitative interview data published in German organizational research. This analysis provides comparative insights into current and completed secondary analyses from the German organizational research context, focusing on the motivations for conducting secondary analyses in concrete research contexts, the associated implications for research practice, and the methodological and content-related implementation of secondary analyses.
Dr Tobias Heycke (GESIS – Leibniz Institute for the Social Sciences) - Presenting Author
Dr Bernd Weiß (GESIS – Leibniz Institute for the Social Sciences)
Two phenomena have recently received substantial attention in empirical research: p-hacking and HARKing. P-hacking refers to running multiple analyses on the same data set and adjusting the analysis until a p-value smaller than the chosen alpha level is reached. This behavior is problematic because it greatly increases the chance of an alpha error (Simmons, Nelson, & Simonsohn, 2011). HARKing (Hypothesizing After the Results are Known) describes researchers presenting post-hoc hypotheses (usually based on statistically significant results) as a priori hypotheses (Kerr, 1998). Both p-hacking and HARKing can be ruled out when the hypotheses and data analysis are preregistered before the data are collected (e.g., the registration of clinical trials or experimental psychology studies).
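To make the alpha-inflation point concrete, the following minimal simulation sketch (an editorial illustration, not part of the abstract; the three analysis variants are hypothetical) reports the smallest of several p-values computed on the same null data and shows that the realised false-positive rate exceeds the nominal 5 percent level.

    # Minimal illustrative simulation: trying several analyses on the same
    # null data and reporting the best p-value inflates the false-positive rate.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    n_sims, n = 5000, 50
    hacked_hits = 0

    for _ in range(n_sims):
        # Two groups drawn from the SAME distribution: any "effect" is a false positive.
        g1 = rng.normal(size=n)
        g2 = rng.normal(size=n)

        p_values = [
            stats.ttest_ind(g1, g2).pvalue,                             # analysis 1: plain t-test
            stats.ttest_ind(g1[abs(g1) < 2], g2[abs(g2) < 2]).pvalue,   # analysis 2: drop "outliers"
            stats.ttest_ind(g1[: n // 2], g2[: n // 2]).pvalue,         # analysis 3: first half only
        ]

        # The p-hacker reports whichever analysis "worked".
        if min(p_values) < 0.05:
            hacked_hits += 1

    print(f"Nominal alpha: 0.05; realised false-positive rate: {hacked_hits / n_sims:.3f}")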
However, in sociology many scientists do not collect data themselves; instead, large, pre-existing data sets are often analyzed (i.e., secondary data analysis). As p-hacking and HARKing are at least as likely in secondary data analysis as in primary data analysis, the need for preregistration seems just as pressing. When running secondary data analyses, however, it is more difficult to convince others that the analyses were indeed planned a priori. We therefore propose the following two mechanisms for secondary data analyses:
First, when data are publicly available, a preregistration plan should be written before the data are analyzed. The scientific community should additionally establish that writing a preregistration after looking at the data is considered a case of serious scientific misconduct, similar to data fabrication.
Second, when data are available but access needs to be requested, the registration should be written before the data are requested. We specifically propose that the data hosting institution should facilitate preregistration by offering the possibility to preregister with them before providing access to the full data set.