Different Methods, Same Results? – How Can We Increase Confidence in Scientific Findings 2?
Session Organisers |
Dr Thorsten Kneip (Max Planck Institute for Social Law and Social Policy, MEA)
Dr Gerrit Bauer (LMU Munich)
Professor Elmar Schlueter (Justus-Liebig-University Giessen)
Professor Jochen Mayerl (Chemnitz University of Technology)
Time | Friday 19th July, 11:00 - 12:30 |
Room | D22 |
This session follows up on discussions at last year's ESRA conference on the fruitful use of multiple methods. We are interested in how to increase confidence in scientific findings in the light of mixed evidence on the one hand and seemingly established findings that have since been overturned on the other (the "replication crisis"). While we have seen an ever-increasing proliferation of methods for survey data collection and analysis in recent years, there is still a lack of standards for aggregating findings. Too easily, convergence of findings is taken as indicative of a "true" effect, when it may well reflect repeated systematic errors. While replicability and reproducibility are fundamental to empirical research, replication studies may confirm not only true but also false results. In a similar vein, diverging results from different methods are often to be expected, as the methods aim to identify different effects or rely on different assumptions, the violation of which leads to different forms of bias.
The common problem seems rooted in a lack of awareness of, and transparency about, implicit decisions made in the process of analysis, and in the lack of explication and discussion of model assumptions. We invite researchers to submit papers discussing the consequences of applying alternative methods of survey data analysis to the same research question. A focus should be on making explicit all assumptions related to the chosen method(s). Examples would be:
- studies comparing at least two different estimation approaches, addressing different potential sources (and directions) of bias;
- extensive robustness checks varying theoretically undetermined parameters (e.g. the functional form of control variables, the definition of the analytic sample);
- replication studies critically reflecting on or challenging decisions made throughout the research process;
- crowd research.
Dr Thorsten Kneip (Max Planck Institute for Social Law and Social Policy) - Presenting Author
Professor Henning Best (University of Kaiserslautern)
We aim to identify the causal effect of reducing the behavioral costs of participation in household waste recycling through curbside collection. Using propensity score matching and difference-in-differences (DD) estimation with individual-level panel data, we estimate the effect of curbside collection, its variation across types of recyclables and sociodemographic background variables, and its elasticity with regard to the distance to collection containers in the bring-scheme condition.
Our data allow us to estimate treatment effects for materials not directly affected by curbside collection, which can be exploited for refined analyses. Specifically, we argue that in our setting DD may be systematically upward biased because the outcome variable is self-reported, whereas triple-differences (DDD) estimates will be systematically downward biased in the presence of possible spillover effects. At the same time, each estimation strategy should be insensitive to the other's weakness. Accordingly, the two can be combined to derive upper and lower bounds on the true effect.
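As a minimal sketch of this bounding logic (not the authors' code: the simulated data, the variable names recycle, treat, post, and affected, and all parameter values are invented for illustration), one can compare a DD estimate on affected materials with a DDD estimate that adds unaffected materials as a further control:

```python
# Hypothetical illustration of the DD/DDD bounding argument. All names and
# parameter values are invented; this is not the authors' data or code.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n_hh = 1000
hh = np.arange(n_hh)
treat = rng.integers(0, 2, n_hh)        # household gets curbside scheme (1) or not (0)

rows = []
for post in (0, 1):                     # pre/post introduction of the scheme
    for affected in (0, 1):             # material targeted by curbside collection?
        true_eff = 0.15 * affected      # assumed true effect on affected materials
        spill = 0.03 * (1 - affected)   # assumed spillover onto unaffected materials
        report = 0.05                   # assumed self-report inflation among treated
        y = (0.4 + 0.1 * post
             + (true_eff + spill + report) * treat * post
             + rng.normal(0, 0.2, n_hh))
        rows.append(pd.DataFrame({"hh": hh, "treat": treat, "post": post,
                                  "affected": affected, "recycle": y}))
df = pd.concat(rows, ignore_index=True)

# DD on affected materials only: differences out common trends, but the
# self-report inflation among the treated loads onto the estimate (upper bound).
dd = smf.ols("recycle ~ treat * post", data=df[df.affected == 1]).fit()

# DDD adds unaffected materials as a further difference: it nets out the
# reporting bias but subtracts the spillover onto unaffected materials
# (lower bound).
ddd = smf.ols("recycle ~ treat * post * affected", data=df).fit()

print("DD  estimate (upper bound):", round(dd.params["treat:post"], 3))
print("DDD estimate (lower bound):", round(ddd.params["treat:post:affected"], 3))
```

In this toy setting the true effect (0.15) lies between the DDD estimate (around 0.12, which over-subtracts the spillover) and the DD estimate (around 0.20, which absorbs the reporting inflation), mirroring the bracketing argument above.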
We find that a curbside scheme has no effect on paper recycling but increases recycling participation by 10 to 25 percentage points for plastic and packaging. Moreover, we find systematic treatment effect heterogeneity with regard to pre-treatment distance to collection sites and individual environmental attitudes.
Mrs Miriam Trübner (University of Bonn) - Presenting Author
While there is an extensive social science literature on the domestic division of labor (DOL), findings on its mechanisms have been inconsistent. Analyses are usually restricted to the division of housework, with the dependent variable of regression models constructed as the absolute or relative amount of housework, separately for men and women. This empirical approach stands in the tradition of methodological individualism, explaining DOL by individual characteristics and behavior without fully considering the interdependency and interaction between actors. Further, the focus on housework limits insights into the DOL for other tasks in the domestic sphere and implicitly grants unpaid and paid labor an equivalent conceptual status. Using the German Family Panel (pairfam), we explore how robust mechanisms related to DOL are with regard to model specification and the choice of dependent variable.
In our analyses of 2,982 heterosexual couples living together in the same household, we compare unidirectional regression models with Actor-Partner Interdependence Models (APIM) (Kenny et al. 2006) and dyadic classification (Schmitz 2012; 2017). In contrast to unidirectional regression models, APIMs take into account the interdependency between partners by estimating the extent to which the independent variable of one spouse affects his or her own score on the unpaid-labor variable (actor effect) or that of the other spouse (partner effect). While APIMs relate mechanisms of DOL to the average dyad, dyadic classification allows us to reconstruct classes of dyads representing various division-of-labor arrangements. This approach takes into consideration the mutual interdependencies between spouses as well as the complexity of paid and unpaid labor.
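To illustrate the actor-partner logic (a hypothetical sketch only: full APIMs are usually estimated jointly, e.g. via SEM or multilevel models, whereas this toy version uses two separate OLS equations, and the variable names housework_m/f and workhours_m/f are invented):

```python
# Toy actor-partner sketch with invented variable names and simulated data;
# a full APIM is typically estimated jointly rather than as two OLS fits.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 500
d = pd.DataFrame({"workhours_m": rng.normal(40, 8, n),
                  "workhours_f": rng.normal(30, 10, n)})
# Simulated interdependence: each spouse's housework depends on both
# partners' work hours (actor and partner effects).
d["housework_m"] = 10 - 0.10 * d.workhours_m + 0.05 * d.workhours_f + rng.normal(0, 2, n)
d["housework_f"] = 18 - 0.15 * d.workhours_f + 0.08 * d.workhours_m + rng.normal(0, 2, n)

# Each equation contains an actor effect (own predictor) and a partner
# effect (the other spouse's predictor).
apim_m = smf.ols("housework_m ~ workhours_m + workhours_f", data=d).fit()
apim_f = smf.ols("housework_f ~ workhours_f + workhours_m", data=d).fit()
print(apim_m.params)   # workhours_m = actor effect, workhours_f = partner effect
print(apim_f.params)   # workhours_f = actor effect, workhours_m = partner effect
```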
We conclude that the choice of analytical strategy may affect results on the mechanisms of DOL. Our recommendation is to account for the different tasks captured by the outcome variable and to use analytical methods that exploit the dependency between spouses, in order to minimize a potential individualistic bias.
Dr Elena Damian (University of Leuven) - Presenting Author
Professor Bart Meuleman (University of Leuven)
Professor Wim van Oorschot (University of Leuven)
Multilevel regression analysis is one of the most popular types of analysis in cross-national social research. Since its early applications, however, there have been persistent concerns about the relatively small number of countries in cross-national surveys and the resulting ability of such models to produce unbiased and accurate country-level effects. A recent review by Bryan and Jenkins (2016) highlights that there are still no clear rules of thumb regarding the minimum number of countries needed; current recommendations vary from 15 to 50 countries, depending on model complexity.
This paper aims to offer a better understanding of the consequences of group-level sample size, model complexity, effect size, and estimation procedure for the accuracy of estimated country-level effects in cross-national studies. The accuracy criteria considered are statistical power, relative parameter bias, relative standard error bias, and convergence rates. We pay special attention to statistical power, a key criterion that has been largely neglected in past research. The results of our Monte Carlo simulation study indicate that the small number of countries found in cross-national surveys seriously affects the accuracy of group-level estimates. Specifically, while a sample size of 30 countries is sufficient to detect large population effects (.5), the probability of detecting a medium (.25) or a small (.10) effect is only .4 or .2, respectively. The number of additional group-level variables (i.e., model complexity) included in the model does not disturb the relationship between sample size and statistical power; hence, adding contextual variables one by one does not increase the power to detect a given effect if the sample size is small. Even though Bayesian models yield more accurate estimates, there are no notable differences in statistical power between Maximum Likelihood and Bayesian models.
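A hypothetical re-creation of this kind of Monte Carlo power check might look as follows (a sketch only: the random-intercept setup, sample sizes, and all parameter values are illustrative assumptions, not the authors' simulation design):

```python
# Illustrative Monte Carlo power check for a country-level effect in a
# random-intercept model; all parameter values are assumptions.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def power(n_countries, n_per=500, effect=0.25, reps=100, seed=0):
    """Share of replications in which the country-level effect is
    significant at the 5% level."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        z = rng.normal(0, 1, n_countries)       # country-level predictor
        u = rng.normal(0, 1, n_countries)       # country random intercepts
        country = np.repeat(np.arange(n_countries), n_per)
        y = effect * z[country] + u[country] + rng.normal(0, 1, len(country))
        d = pd.DataFrame({"y": y, "z": z[country], "country": country})
        res = smf.mixedlm("y ~ z", d, groups=d["country"]).fit()
        hits += res.pvalues["z"] < 0.05
    return hits / reps

# e.g. power(30, effect=0.25): with only 30 countries, expect low power
# for a medium-sized country-level effect.
```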
Dr Andrés Santana (Universidad Autónoma de Madrid) - Presenting Author
In the proposed presentation, it will be argued that statistical analyses should pay less attention to the "betas": when the dependent variable is qualitative, betas are practically irrelevant. This approach is attracting growing acceptance among researchers, to some extent thanks to advances in statistical packages (such as predictive margins). The rest of the exposition will offer a critical discussion of the two most frequent strategies for following this approach (a code sketch of the underlying idea follows the outline below):
1. The contrasting and the overlapping strategies.
2. The possibility of differences between the two strategies: simple case.
3. Substantive implications of discordant results.
4. Conditions under which the two strategies will yield concordant or discordant results.
5. Criteria to select the correct strategy based on probabilistic analysis.
6. Criteria to select the right strategy based on covariance analysis.
7. Possibility of discordant results between the two strategies: complex case.
8. Serious substantive implications of discordant results.
9. Evaluation of the impact of these considerations on publications in the top 10 journals of the discipline over the last two years.
10. Numerical and graphical solution to obtain the correct result.
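As a minimal sketch of the general point (invented data; the variable names vote, x, and w and the logit specification are assumptions, not the author's analysis), predictive margins translate results onto the probability scale, where raw logit betas do not directly live:

```python
# Invented logit example: the coefficients ("betas") are on the log-odds
# scale, while average marginal effects are on the probability scale.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n = 2000
d = pd.DataFrame({"x": rng.normal(0, 1, n), "w": rng.normal(0, 1, n)})
p = 1 / (1 + np.exp(-(0.5 * d.x - 0.3 * d.w)))
d["vote"] = rng.binomial(1, p)

res = smf.logit("vote ~ x + w", data=d).fit(disp=False)
print(res.params)                                 # betas on the log-odds scale
print(res.get_margeff(at="overall").summary())    # average marginal effects
```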