Different methods, same results? Comparing the consequences of alternative methods of data collection and analysis
Chair | Professor Elmar Schlueter (Justus-Liebig-University Giessen)
Coordinator 1 | Professor Jochen Mayerl (University of Kaiserslautern) |
It has been observed for several decades that standardized interviewing, the prevailing approach to collecting survey data in the social sciences and government research, may not always lead to uniform interpretation (e.g., Suchman & Jordan, 1990). If some respondents interpret questions differently than the question authors intended, the accuracy of individual responses, and possibly of the resulting population estimates, may be compromised. Standardized interviewers read the question as worded and then use only nondirective (i.e., largely content-free) probing to address comprehension problems. To address this potential weakness of standardized methods, an alternative approach to survey interviewing has been proposed that encourages interviewers to clarify the survey concepts, using whatever words they judge will be most effective, whenever they determine there is misalignment between the respondents’ interpretation and how the question was intended (e.g., Schober & Conrad, 1997; Conrad & Schober, 2000). The logic behind this approach is that successful everyday communication often involves back and forth between speaker and listener to ensure they are on the same page – at least sufficiently to accomplish their current conversational task. This process of “conversational grounding” (e.g., Clark, 1996) is at the heart of the proposed alternative interviewing technique and has led to its being called “conversational interviewing.” In contrast, standardized interviewers cannot ground question meaning, because doing so could involve substantive wording that might differ between respondents.
At least ten studies have evaluated the pros and cons of standardized and conversational interviewing. This paper reviews and synthesizes several of these studies, describing what has been learned and what is still unknown. Among the clear findings is that conversational interviewing can dramatically improve response accuracy for factual questions when the meaning of key concepts in the questions is ambiguous. But the improvement requires additional interviewing time, as clarification necessarily involves additional words; this is the case not only in interviewer-administered interviews but also in automated interviews carried out by animated virtual interviewers. Across studies, the improvement in response accuracy is greater when the interviewer can provide clarification both when respondents ask for it and when the interviewer judges that the respondent needs it, even without an explicit request. This is also true in web questionnaires with clickable definitions that can clarify concepts when respondents are slow to answer. Respondents seem to be sensitive to whether interviewers are able to provide clarification in this way, using more disfluent speech and, in face-to-face interviews, averting their gaze from the interviewer more often than do respondents in standardized interviews. Conversational interviewers with greater interpersonal sensitivity are more efficient, whether clarifying concepts in factual or opinion questions. And these benefits accrue without increasing interviewer variance. One question currently being investigated is whether conversational interviewing can help improve quality in other ways, such as reducing acquiescence and straightlining by better communicating the meaning of response scale values. More also remains to be learned about the practical tradeoffs involved in administering conversational interviews in production surveys, but the evidence suggests they are promising.
Declining response rates and the increasing costs associated with traditional sampling and data collection approaches have led many researchers to explore alternative methodologies more rigorously, web-based methodologies in particular. Many researchers turned again to these methods in the 2016 presidential race. In the final week preceding the 2016 presidential vote, Gallup conducted an experiment comparing these newer web-based methodologies with its traditional phone survey methodology. Gallup’s experiment was designed to compare data from several data collection approaches, including a phone survey using a random-digit-dialing (RDD) dual-frame sampling approach and several web-based methodologies.
The comparisons allowed Gallup to explore differences in attitudes about the election and in likely voter models between methodologies, and between non-representative and representative sampling frames, allowing researchers to identify the bias associated with each sampling frame. In this presentation, Gallup will provide details about the accuracy of these approaches when compared with the final election results. Gallup will also share data about the actual turnout of opt-in panel members, comparing their self-reports and likely voter model predictions with their actual turnout based on state election board records.
In a highly influential paper published in the American Journal of Political Science, Stegmueller (2013) claims that mixed effects multilevel models estimated by frequentist methods provide biased parameter estimates and severely anti-conservative inference for context effects when the number of upper-level units is small. Stegmueller recommends using Bayesian estimation instead, which he finds to be more accurate. In this paper, we reassess and refute Stegmueller’s claims. First, we present analytical proof that frequentist mixed effects models provide unbiased estimates of context effects. We further illustrate that the apparent bias in Stegmueller’s simulations is simply a ramification of Monte Carlo error. Second, we show how the reported inferential problems of frequentist estimation arise from using full maximum likelihood (ML) estimation and from relying on the standard normal (Gaussian) distribution to identify confidence limits and p-values. Using restricted maximum likelihood (REML) estimation together with the t-distribution with approximately correct degrees of freedom yields accurate inference, even for very small upper-level samples. Hence, concerns about the minimum number of clusters necessary for multilevel analyses, which have long haunted comparative social science, are unjustified. While there may be compelling reasons to favor Bayesian estimation, it is not necessary for achieving accurate inference in multilevel analyses based on small upper-level samples.
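To make the proposed correction concrete, here is a minimal sketch in Python (statsmodels) of the two ingredients described above: REML estimation and t-based confidence limits with approximately correct degrees of freedom. The simulated dataset, variable names, and the simple m - l - 1 degrees-of-freedom rule are illustrative assumptions, not the paper’s exact procedure.

```python
# Illustrative sketch: REML estimation plus t-based confidence limits for a
# context effect. The data and the m - l - 1 degrees-of-freedom rule are toy
# assumptions for demonstration, not the paper's exact setup.
import numpy as np
import pandas as pd
import scipy.stats as st
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)

# Simulate a small two-level dataset: 15 countries, 100 respondents each.
n_countries, n_per = 15, 100
country = np.repeat(np.arange(n_countries), n_per)
z = rng.normal(size=n_countries)              # country-level ("context") variable
u = rng.normal(scale=0.5, size=n_countries)   # random country intercepts
x = rng.normal(size=n_countries * n_per)      # individual-level covariate
y = 1.0 + 0.5 * x + 0.3 * z[country] + u[country] + rng.normal(size=n_countries * n_per)
df = pd.DataFrame({"y": y, "x": x, "z": z[country], "country": country})

# Restricted maximum likelihood (REML) rather than full ML.
fit = smf.mixedlm("y ~ x + z", df, groups=df["country"]).fit(reml=True)
b, se = fit.fe_params["z"], fit.bse["z"]

# Anti-conservative: normal (z-based) 95% confidence interval.
ci_normal = (b - 1.96 * se, b + 1.96 * se)

# More accurate: t-based interval with approximate degrees of freedom.
# The simple m - l - 1 rule (clusters minus level-2 predictors minus one)
# is used here; Satterthwaite-type approximations are an alternative.
dof = n_countries - 1 - 1
t_crit = st.t.ppf(0.975, dof)
ci_t = (b - t_crit * se, b + t_crit * se)

print(f"context effect estimate: {b:.3f}")
print(f"z-based 95% CI: ({ci_normal[0]:.3f}, {ci_normal[1]:.3f})")
print(f"t-based 95% CI: ({ci_t[0]:.3f}, {ci_t[1]:.3f}), df = {dof}")
```

Because the t critical value exceeds 1.96 for small degrees of freedom, the t-based interval is wider than the z-based one; over repeated samples, this is what restores nominal coverage with few clusters.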
Methods of data analysis in the social sciences can be classified as either “effects-of-causes” or “causes-of-effects” approaches (Goertz and Mahoney 2012: 41). Quantitative methods follow the effects-of-causes approach, which aims to estimate the causal effects of variables of interest. Conversely, causes-of-effects approaches seek the causes of a given outcome. This procedure is, according to Goertz and Mahoney (2012: 42), characteristic of the “qualitative culture” in social research. This fundamental distinction between quantitative and qualitative research serves as the starting point for our study. We apply both a multilevel regression analysis and a Qualitative Comparative Analysis (QCA) in order to study the similarities and differences between these methods of data analysis. The aim is to open up different, method-dependent perspectives on the same object of cross-country comparison.
The object of the analysis is the rise in female labor force participation. The increase in women’s labor force participation has been one of the most important social changes since World War II, driven mainly by a stronger labor market attachment of mothers. But despite an almost global trend of rising female labor force participation rates, we observe noticeable differences between countries in the strength and patterns of women’s labor market integration. Previous research that aims to explain those differences refers to country differences in modernization, institutions, or culture, and most studies use methods of quantitative cross-country comparison such as multilevel analysis or time-series analysis. This quantitative approach to international comparison has been subject to several critiques in recent years (e.g. Ebbinghaus 2005). Most importantly, the restricted variation of independent variables across countries and the resulting problems of multicollinearity, as well as path dependency and geographical autocorrelation, which violate substantive assumptions of regression analysis, have called into question the applicability of this approach for cross-country comparison in general.
This raises the question of whether an alternative method such as QCA is better suited for cross-country comparison. QCA is a diversity-oriented method that captures different cultural and institutional contexts (Ragin 2008: 109). Three fundamental principles—‘equifinality, conjunctural causation and asymmetry’ (Schneider and Wagemann 2012: 8)—define QCA’s specific approach to comparative research. It offers the possibility to distinguish between necessary and sufficient conditions and identifies equifinal explanations for a social phenomenon. In the case of female labor force participation, conjunctural causation and the asymmetry of conditions are likely well suited to capture the complexity of the patterns explaining the differences between countries.
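To illustrate the sufficiency and necessity logic just described, here is a minimal crisp-set sketch in Python. The conditions, cases, outcome coding, and the 0.8 consistency threshold are hypothetical illustrations, not the study’s actual calibration or data.

```python
# Illustrative crisp-set QCA logic: which configurations of conditions are
# sufficient for the outcome, and which single conditions are necessary?
# All data below are invented for demonstration purposes.
import pandas as pd

# 1 = condition/outcome present, 0 = absent (hypothetical country cases).
data = pd.DataFrame({
    "childcare":    [1, 1, 0, 0, 1, 0, 1, 0],  # public childcare provision
    "egalitarian":  [1, 0, 1, 0, 1, 1, 0, 0],  # egalitarian gender culture
    "service_econ": [1, 1, 1, 0, 0, 1, 0, 0],  # large service sector
    "high_flfp":    [1, 1, 1, 0, 1, 1, 0, 0],  # outcome: high female LFP
}, index=[f"country_{i}" for i in range(8)])

conditions, outcome = ["childcare", "egalitarian", "service_econ"], "high_flfp"

# Sufficiency: for each observed configuration, consistency is the share of
# its cases that also show the outcome (equifinality appears when several
# distinct configurations pass the threshold).
for config, group in data.groupby(conditions):
    consistency = group[outcome].mean()
    label = " * ".join(c if v else f"~{c}" for c, v in zip(conditions, config))
    verdict = "sufficient" if consistency >= 0.8 else "not sufficient"
    print(f"{label}: n={len(group)}, consistency={consistency:.2f} -> {verdict}")

# Necessity: how often is each condition present when the outcome occurs?
for c in conditions:
    necessity = data.loc[data[outcome] == 1, c].mean()
    print(f"necessity of {c}: {necessity:.2f}")
```

Asymmetry follows from running the same sufficiency check for the negated outcome: the configurations explaining low female labor force participation need not be mirror images of those explaining high participation.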
This paper aims to compare these two approaches to international comparative research in order to test the explanatory power of structural and cultural factors in explaining country differences in female labor force participation. Using macro-level indicators for more than 80 countries, we perform both a cross-country regression analysis and a QCA to answer the question of whether the application of different methods leads to the same results and to new insights.