Hierarchical data analysis with few upper level units: Solutions and applications beyond multi-level modeling

Convenor: Dr Celine Teney (Social Science Research Center Berlin (WZB))
Coordinator 1: Mr Heiko Giebler (Social Science Research Center Berlin (WZB))
Coordinator 2: Dr Onawa Promise Lacewell (Social Science Research Center Berlin (WZB))
This panel addresses the problem of linking nested, "many-to-few" data while retaining important information about variation at all levels. One limitation of traditional multilevel modeling techniques is that such models require a sufficiently large number of observations at the higher levels. However, interesting phenomena often occur where the number of upper-level observations falls below the thresholds commonly considered acceptable for inference in traditional multilevel modeling, while at the same time the number of higher-level units is too large for classical comparative case study designs. For instance, when attempting to link MPs' positions or behavior to party characteristics, scholars often face the problem that survey or roll-call data provide a sufficiently large number of observations at the individual level, but only a few observations at the party level. The same is true for mass surveys conducted only a few times in a country, such as the ESS or the World Values Survey. Another current example is the analysis of cross-national differences in citizens' positions toward the EU in countries that benefited from the European rescue package.
This type of many-to-few data linking therefore poses a unique set of problems for researchers. Apart from multilevel modeling techniques being inappropriate in these settings, many of the proposed workarounds, such as clustering standard errors or using country dummies, allow researchers to explain variation in both the upper- and lower-level units only in a very limited way, if at all; yet this is often an important point for theory testing. Our panel invites papers of both a methodological and a substantive nature which develop and explore innovative ways to overcome the problems associated with such many-to-few, nested models while still seeking to explore patterns of variation at all levels.
Political scientists often want to know how a variable measured at the group level affects an outcome measured at the individual level. Inference with grouped data presents challenges because the amount of independent information in the data depends more on the number of groups than on the number of individual observations. A common parametric solution to this problem is to calculate cluster-robust standard errors and to use those errors in hypothesis tests. However, cluster-robust standard errors perform poorly when the number of groups is fewer than 50.
Unfortunately, there are many situations in which a researcher would want to analyze grouped data with fewer than 50 groups. For example, if one is interested in estimating the effect of some cross-country variation on an individual outcome, then one has only 30 OECD countries or 18 Latin American countries. If one is interested in the decisions of key actors, then one may observe many vetoes, but only 44 U.S. presidents; or one may observe many votes, but only 36 U.S. Supreme Court justices post-WWII.
I propose an alternative approach to inference with grouped data: the randomization test. I present Monte Carlo evidence showing that, in terms of Type I error rates, the non-parametric randomization test outperforms t-tests using cluster-robust standard errors regardless of the number of groups. In terms of power, the loss from using randomization tests is small. Thus, randomization tests are a viable alternative to cluster-robust standard errors.
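The randomization logic can be sketched in a few lines. This is a minimal illustration, not the paper's actual procedure: the data, group labels, and test statistic below are invented for the example. The idea is to permute the group-level treatment label across groups, recompute the test statistic under each reassignment, and locate the observed statistic in that permutation distribution.

```python
import itertools
import statistics

# Hypothetical data: a group-level binary treatment and individual outcomes
# nested in a small number of groups (all values are illustrative).
groups = {
    "A": {"treated": 1, "y": [2.1, 2.5, 1.9]},
    "B": {"treated": 1, "y": [2.8, 2.2]},
    "C": {"treated": 0, "y": [1.2, 1.0, 1.4]},
    "D": {"treated": 0, "y": [0.9, 1.3]},
}

def stat(assignment):
    """Difference in mean group means between treated and control groups."""
    treated = [statistics.mean(g["y"])
               for g, t in zip(groups.values(), assignment) if t == 1]
    control = [statistics.mean(g["y"])
               for g, t in zip(groups.values(), assignment) if t == 0]
    return statistics.mean(treated) - statistics.mean(control)

observed_assignment = [g["treated"] for g in groups.values()]
observed = stat(observed_assignment)

# Enumerate every reassignment of the treatment label across groups that
# keeps the number of treated groups fixed; with few groups this is exact.
n_treated = sum(observed_assignment)
perms = [p for p in itertools.product([0, 1], repeat=len(groups))
         if sum(p) == n_treated]

# Two-sided p-value: share of reassignments at least as extreme as observed.
p_value = sum(abs(stat(p)) >= abs(observed) for p in perms) / len(perms)
```

Note that with J groups of which k are treated, the reference distribution has only C(J, k) points, so the smallest attainable p-value is 1/C(J, k); this is exactly why the test's validity does not deteriorate as the number of groups shrinks, though its resolution does.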
Multilevel modeling has seen spectacular development in recent decades. The method provides high-quality tools for handling hierarchical data as well as diverse multistage samples. Sample size at the different levels of analysis is an important requirement for multilevel models, yet this issue is still relatively little studied. We have some rules of thumb (from 30/30 to 100/10 for the simplest models), but little is known about the consequences of ignoring this requirement.

While the level-one sample size is generally large enough, this is rarely the case at level two: most analyses use fewer than 100 higher-level units. The number of groups is usually limited for cost reasons or because all available units have already been selected. For example, although multilevel modeling is widely used for cross-national comparisons, the PISA 2009 database covers 65 countries and the last ESS round only 11.

Simulations have shown that substantial bias can appear when the dataset has fewer than 100 units at the higher levels. The widespread use of multilevel techniques with such data is therefore problematic. However, abandoning multilevel methods altogether is not a solution.

In this context, we run simulations to assess the bias that can occur with different parameterizations under limited sample sizes. These simulations broaden our knowledge about the limits of multilevel modeling. We then provide alpha-level adjustments that enable a more conservative use of multilevel techniques with small samples at the highest level.
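A heavily simplified version of this kind of simulation study can be sketched as follows. This is not the authors' design; all parameter values are assumed for the example. It generates random-intercept data and uses a simple method-of-moments (ANOVA-type) estimator of the level-two variance to show how unstable that estimate becomes when the number of groups is small.

```python
import random
import statistics

random.seed(1)

def simulate_tau2_hat(J, n, tau2=1.0, sigma2=1.0):
    """One Monte Carlo draw: generate a random-intercept dataset with J
    groups of size n, then estimate the between-group variance tau^2 by a
    simple method-of-moments (ANOVA) estimator."""
    group_means = []
    within_vars = []
    for _ in range(J):
        u = random.gauss(0, tau2 ** 0.5)            # group random effect
        y = [u + random.gauss(0, sigma2 ** 0.5) for _ in range(n)]
        group_means.append(statistics.mean(y))
        within_vars.append(statistics.variance(y))
    s2_between = statistics.variance(group_means)   # variance of group means
    s2_within = statistics.mean(within_vars)        # pooled within variance
    return s2_between - s2_within / n               # MoM estimate of tau^2

# Compare the sampling spread of the tau^2 estimate for few vs. many groups.
reps = 500
few = [simulate_tau2_hat(J=10, n=30) for _ in range(reps)]
many = [simulate_tau2_hat(J=100, n=30) for _ in range(reps)]
sd_few = statistics.stdev(few)    # spread with 10 level-2 units
sd_many = statistics.stdev(many)  # spread with 100 level-2 units
```

The spread of the level-two variance estimate is governed almost entirely by the number of groups, not by the total number of observations; maximum-likelihood variance components (which the abstract's simulations concern) additionally carry a downward bias at small J that this moment estimator does not show.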
Background
In Germany there is an increasing need for small-area data on illness prevalences, risk factors, and health-related behavior and attitudes. In our planned contribution we will present and discuss our methods and preliminary results for Germany in light of available reference data.
Data and Methods
We use data from the study "German Health Update" for the years 2009 and 2010 (n = 43,312). Small-area estimation is performed using three-level hierarchical regression models with additional governmental area data (approx. 250 variables) and best linear unbiased prediction of the prevalences. We use a core set of area-level variables (unemployment rate, GDP per capita, and mean household income) as well as a variable set chosen algorithmically. Reference data are obtained from several federal sources.
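The prediction step can be illustrated with a stripped-down, area-level composite estimator in the Fay–Herriot spirit. This is a sketch under assumed numbers, not the study's actual three-level model: each area's direct survey estimate is shrunk toward a regression-synthetic prediction built from area covariates, with the weight determined by the variance components.

```python
# Composite small-area estimate: shrink each area's direct survey estimate
# toward a regression-synthetic prediction from an area covariate.
# All numbers below are illustrative, not taken from the GEDA study.

areas = [
    # (direct estimate, sampling variance of direct estimate, covariate x)
    (0.12, 0.0009, 8.1),   # e.g. prevalence vs. unemployment rate
    (0.09, 0.0016, 5.2),
    (0.15, 0.0025, 11.0),
    (0.10, 0.0012, 6.4),
]

# Synthetic part: simple least-squares fit of direct estimates on x.
n = len(areas)
xbar = sum(a[2] for a in areas) / n
ybar = sum(a[0] for a in areas) / n
beta = (sum((a[2] - xbar) * (a[0] - ybar) for a in areas)
        / sum((a[2] - xbar) ** 2 for a in areas))
alpha = ybar - beta * xbar

tau2 = 0.0004  # assumed between-area model variance (would be estimated)

blups = []
for y_direct, v_i, x in areas:
    synthetic = alpha + beta * x
    gamma = tau2 / (tau2 + v_i)   # shrinkage weight: higher when the
                                  # direct estimate is precise
    blups.append(gamma * y_direct + (1 - gamma) * synthetic)
```

The composite form conveys the core intuition of the BLUP used in the study: areas with noisy direct estimates borrow more strength from the covariate model, while areas with precise survey estimates keep them largely unchanged.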
Results
Our prevalence estimates show large disparities in diabetes, obesity, and flu vaccination rates across Germany. After controlling for age, gender, and area-level variables, none of the small-area models shows a significant area-level association for any of the three outcomes. Different sets of context variables from the federal database are relevant for the prediction of each outcome. Our comparison with reference data leads to mixed results, with correlations between predictions and reference values ranging from r = .3 to r = .8.
Discussion
We are investigating the best way to incorporate the results into regular health reporting at the federal and state levels. The statistical properties of our preliminary models look promising, but the lack of reference data is a challenge.