Handling missing data 3

Chair | Dr Tarek Mostafa (University College London)
Coordinator 1 | Professor George Ploubidis (University College London)
Coordinator 2 | Mr Brian Dodgeon (University College London)
As a result of multiple imputation, the researcher obtains several complete data sets, analyzes each of them separately with the same method, and finally pools the results using a specific set of formulas known as Rubin's rules.
Carrying out the same analysis several times, once per data set, and then combining the results is clearly a time-consuming task. The process is partly automated in statistical packages that support multiple imputation, but the researcher often still has to compute the pooled parameters manually using Rubin's rules. Researchers have therefore repeatedly attempted to simplify the multiple imputation algorithm, but so far such attempts have been limited to specific types of analysis. Thus, there is no theoretical or empirical evidence that effective alternatives to Rubin's rules exist for other research situations.
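To make the pooling step concrete, the following is a minimal Python sketch of Rubin's rules for a single scalar parameter; the coefficient values and standard errors are invented purely for illustration.

```python
import numpy as np

def rubin_pool(estimates, variances):
    """Pool m point estimates and their within-imputation variances
    using Rubin's rules."""
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = len(estimates)
    q_bar = estimates.mean()        # pooled point estimate
    w_bar = variances.mean()        # average within-imputation variance
    b = estimates.var(ddof=1)       # between-imputation variance
    t = w_bar + (1 + 1 / m) * b     # total variance of the pooled estimate
    return q_bar, t

# A regression coefficient estimated on m = 5 imputed data sets
coefs = [0.42, 0.39, 0.45, 0.41, 0.44]
ses = [0.10, 0.11, 0.09, 0.10, 0.12]
est, total_var = rubin_pool(coefs, [s ** 2 for s in ses])
print(f"pooled estimate = {est:.3f}, pooled SE = {total_var ** 0.5:.3f}")
```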
This study attempts to compare the effectiveness of two approaches to aggregating the results of multiple imputation. The first is the classic one, Rubin's rules, which is used in almost all studies where missing values are imputed. The second possible approach is to reorder the steps of classical multiple imputation to simplify the workflow: first aggregate (in this work, by averaging) the values imputed for each missing cell, which yields a single complete data set that can then be analyzed with the planned method.
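For contrast, here is a minimal sketch of the simplified approach described above, assuming interval-scale variables for which cell-by-cell averaging is meaningful (nominal or ordinal variables would require, for example, the mode instead):

```python
import pandas as pd

def average_imputations(imputed_dfs):
    """Collapse m imputed data sets into a single one by averaging the
    imputed values cell by cell."""
    stacked = pd.concat(imputed_dfs)
    return stacked.groupby(level=0).mean()

# Three hypothetical imputed versions of the same data set: only the
# second row was missing, so only its imputed value varies.
dfs = [pd.DataFrame({"x": [1.0, v, 3.0]}) for v in (1.8, 2.1, 2.4)]
print(average_imputations(dfs))  # row 1 becomes the average, 2.1
```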
Obviously, using the classical, theoretically and methodologically well-developed, and repeatedly tested algorithm is the more reliable option, but the second approach makes working with multiple imputation much faster and simpler and, according to our assumptions, can be more effective than Rubin's rules in some research situations. Comparing the efficacy of the two approaches theoretically is rather difficult, so for an initial test of the assumptions motivating this study we will use a statistical experiment.
Thus, this research is intended to establish whether there are research situations in which aggregating multiple imputation results by averaging the imputed values and analyzing a single data set is more effective than pooling the results of the separate analyses with Rubin's rules. We believe that the effectiveness of a particular approach depends on the specific research situation, by which in this work we mean a combination of the scale type of the variable containing missing values, the proportion of missing values in the data set, and the data analysis method applied after imputation. In this study, three types of scales will be considered (nominal, ordinal and interval), along with 10%, 30% and 50% shares of missing values and data analysis methods common in sociological research: descriptive statistics, tests of association between two variables, and linear regression.
Missing data in a survey are generally modeled as a random mechanism. However, systematic missing data may be caused by errors in the programming of an electronic survey instrument that are not detected until near or after the end of data collection. In this situation, a class of respondents is never directed to questions that should have been presented to them. Since the missing data mechanism is not random, traditional imputation methods such as hot-deck imputation cannot be applied without careful thought, as there are no donors from the same class. In this paper, we discuss two routing errors in the 2016 Survey of Prison Inmates, a bias analysis, and a proposed imputation method.
We propose using prior iterations of the same survey, which had no routing errors associated with the selected analytic variables, to identify appropriate imputation classes and to study the possible bias. Using data from the prior round of the survey, we will determine whether there are classes of respondents that were not mis-routed but are similar to those that were mis-routed in the 2016 survey. We will also study bias by examining the difference in the estimates with and without the mis-routed respondents.
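As a hedged illustration of this idea, the sketch below implements class-based hot-deck imputation that borrows donors from a prior survey round; the column names and the random-donor rule are assumptions for exposition, not the survey's actual procedure.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

def hot_deck_from_prior(current, prior, target, class_col):
    """Fill missing values of `target` in the current round by drawing a
    random donor value from prior-round respondents in the same
    imputation class."""
    filled = current[target].copy()
    for i in current.index[current[target].isna()]:
        cls = current.at[i, class_col]
        donors = prior.loc[prior[class_col] == cls, target].dropna()
        if not donors.empty:
            filled.at[i] = rng.choice(donors.to_numpy())
    return filled

# Toy example: class "B" was mis-routed in the current round and has no
# current-round donors, so its values are borrowed from the prior round.
current = pd.DataFrame({"cls": ["A", "B", "B"], "y": [1.0, np.nan, np.nan]})
prior = pd.DataFrame({"cls": ["A", "B", "B"], "y": [1.5, 2.0, 2.5]})
current["y"] = hot_deck_from_prior(current, prior, "y", "cls")
print(current)
```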
Additionally, one of the routing errors was discovered during data collection and corrected after approximately 7/8 of the data collection was completed. Thus, for a small number of cases, we have respondents who were correctly routed in the current iteration of the survey. We will similarly study these respondents and compare them to respondents in the same clusters who were never mis-routed, to determine the potential bias impact.
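A minimal sketch of the bias check, assuming a flag for the affected cases; unweighted means are used for simplicity, whereas the actual analysis would apply survey weights:

```python
import pandas as pd

def bias_gap(df, y, misrouted_col):
    """Difference between the estimate computed on all respondents and
    the estimate computed on respondents unaffected by the routing error."""
    full = df[y].mean()
    unaffected = df.loc[~df[misrouted_col], y].mean()
    return full - unaffected

# Toy data with a hypothetical mis-routing flag
df = pd.DataFrame({"y": [2.0, 3.0, 2.5, 4.0],
                   "misrouted": [False, False, True, True]})
print(bias_gap(df, "y", "misrouted"))
```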