Application of Interaction Effects for Social Processes Modeling
Session Organisers | Miss Svetlana Zhuchkova (Higher School of Economics), Mr Alexey Rotmistrov (Higher School of Economics)
Time | Tuesday 16th July, 16:00 - 17:00 |
Room | D31 |
In social research, there are many theoretical and empirical arguments for analyzing multivariate associations. One special case is the interaction of features, i.e., a combination of categories of several variables that jointly determines some phenomenon. Since the 1960s, such interaction effects have been a focus of researchers' attention because of their prevalence in survey data: methods for automatic interaction detection have been developed, and their results have been incorporated into predictive models.
However, although the search for interactions was originally aimed at survey data, the practice is not widely used in empirical research. One reason is the lack of a universal method for finding the relevant interactions. For categorical variables there are many such methods (for example, log-linear analysis, decision trees, and multiple correspondence analysis), but their implementations differ significantly; methods for finding interactions of continuous variables, by contrast, are practically unknown. Another reason is the difficulty of obtaining the final predictive model: there is a risk of biased parameter estimates due to multicollinearity, and the interpretation of the effects becomes complicated. At the same time, a researcher who is limited to “traditional” modeling and examines only two-dimensional associations may encounter Simpson's paradox. This phenomenon, well known in statistics, occurs when an association that holds in the multivariate data “disappears” or changes direction once the researcher analyzes only aggregated (two-dimensional) data. This leads to incomplete or incorrect conclusions about social reality.
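The reversal described above can be reproduced with a few lines of arithmetic. The following is a minimal sketch in Python; the counts, the age strata, and the "exposed"/"not_exposed" labels are hypothetical and were invented purely for illustration, not taken from any survey.

```python
# Hypothetical counts (voted, respondents), invented for illustration only.
counts = {
    "young": {"exposed": (90, 100), "not_exposed": (800, 1000)},
    "old": {"exposed": (300, 1000), "not_exposed": (20, 100)},
}


def rate(voted, respondents):
    """Share of respondents who voted."""
    return voted / respondents


def pooled_rate(counts, exposure):
    """Turnout for one exposure level after collapsing over the age strata."""
    voted = sum(stratum[exposure][0] for stratum in counts.values())
    respondents = sum(stratum[exposure][1] for stratum in counts.values())
    return rate(voted, respondents)


# Within each age stratum, exposed respondents vote more often.
within = {
    age: {exposure: rate(*cell) for exposure, cell in stratum.items()}
    for age, stratum in counts.items()
}

# In the aggregated (two-dimensional) table the direction reverses.
pooled = {
    exposure: pooled_rate(counts, exposure)
    for exposure in ("exposed", "not_exposed")
}

for age, rates in within.items():
    print(age, rates)
print("pooled", pooled)
```

In this invented table, exposure is associated with higher turnout in both strata, yet with lower turnout once age is ignored, because the exposed respondents are concentrated in the low-turnout stratum.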
The question addressed to session participants is how to strike a balance between the gains in explanatory or predictive power that interaction effects offer and the limitations that arise when selecting the desired effects and building the final model. Session papers may address the comparison of methods for finding interaction effects, ways of overcoming the identified limitations, best practices in building predictive models with interaction effects, and the comparison of alternative ways of building predictive models with multivariate associations, for example, multilevel regression.
Keywords: data analysis, data mining, interaction effects, multivariate associations, predictive models
Miss Svetlana Zhuchkova (Higher School of Economics, Moscow) - Presenting Author
This research is primarily methodological: it is devoted to the role of multivariate associations in predicting electoral choice. Most hypothetical predictors of this choice are categorical variables, and in practice the multivariate analysis of such variables is quite rare. However, ignoring multivariate associations of categorical features can have negative consequences, such as Simpson's paradox, a deterioration in the predictive quality of the model, and incorrect conclusions about social reality. Drawing on a sociological theoretical approach to the study of electoral behavior and using the Russian sample of the ESS, we compare three methods suited to the search for multivariate associations of categorical variables: log-linear analysis, multiple correspondence analysis, and CHAID. Although the chosen methods are all based on the analysis of contingency tables and the calculation of the chi-square statistic, they differ significantly in their implementation, so their results differ. By bringing the results to a single form, namely combinations of categories, we show that multiple correspondence analysis is the most effective method for describing the portrait of the electorate, and log-linear analysis is the most effective for forecasting. In addition, the results show that including the obtained combinations significantly improves the predictive quality of the model, which confirms the need to account for multivariate associations in studies of the electorate. Finally, the obtained electorate portraits correlate substantially with stereotypes about the electorates of particular parties that are common in Russian society.
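As a reminder of the building block shared by the three methods, the Pearson chi-square statistic for a two-way contingency table can be sketched as follows. This is only an illustration in Python: the table is hypothetical, and the sketch does not reproduce the authors' log-linear, correspondence, or CHAID procedures.

```python
def pearson_chi_square(table):
    """Return (chi-square statistic, degrees of freedom) for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df


# Hypothetical 2x2 table: rows = a category of one predictor,
# columns = voted for party A / voted for another party.
table = [[30, 10], [10, 30]]
chi2, df = pearson_chi_square(table)
print(chi2, df)
```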
Mr Aleksei Rotmistrov (National Research University Higher School of Economics) - Presenting Author
Contemporary Russian nationalist organizations try to stay partly hidden from outsiders in order to avoid persecution by the authorities. This is one of the reasons why such hidden social actors are hard for researchers to study. When a researcher approaches representatives of nationalist organizations directly, they tend to refuse any access to their lives, activities, and even thoughts. On the other hand, these representatives usually need to communicate with each other and to recruit new supporters. For this purpose, the modern world provides an effective tool: the Internet. When representatives of nationalist organizations use the Internet, they leave traces that can be studied regardless of the actors' willingness. Using these traces as input data for regression modeling with interaction effects, it appears possible to gain insight into the nature of the organizations under study while bypassing the barriers mentioned above.
In my other papers, I showed that the sector of contemporary Russian nationalist organizations is heterogeneous. They differ in many ways. Their ideologies differ in their attitudes toward the USSR; Russian culture, ethnicity, and race; secularity, Russian paganism, and Orthodoxy; the Russian Empire; and liberalism and paternalism in economics. Their activities differ in their attitudes toward the use of physical force against opponents, physical training, helping political prisoners, mass actions, preparing legislative initiatives, and providing political information and political education to supporters.
This paper addresses ways of constructing an accurate, high-quality explanatory model that explains why a given type of contemporary Russian nationalist organization exploits a given thematic area.
Miss Maria Rodionova (National Research University Higher School of Economics) - Presenting Author
Regression is one of the most popular methods for analyzing sociological data and is frequently used by social scientists. However, the value of R-squared in sociological research is usually low (0.1-0.2) and does not satisfy the researcher. In the case of Intimate Partner Violence (IPV), researchers constructing a model encounter statistical non-significance of predictors that, according to theory, are directly related to the dependent variable.
A possible solution to the problem lies in constructing a regression with interaction effects. This method is especially effective when dealing with “composite” phenomena, which are explained by complex interconnections among social factors.
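To illustrate why an interaction term can raise R-squared when the outcome depends on a combination of categories, here is a minimal sketch in Python. The data, the binary predictors `x1` and `x2`, and the helper functions are all hypothetical and are not taken from the study; the fit uses ordinary least squares through the normal equations.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations act on both sides at once.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ols_r_squared(rows, y):
    """Fit y = X b by least squares and return R-squared."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical outcome scores by cell (x1, x2): the outcome is elevated
# only when both binary predictors equal 1.
cells = {(0, 0): [1, 2, 3], (0, 1): [1, 2, 3], (1, 0): [1, 2, 3], (1, 1): [5, 6, 7]}
x1, x2, y = [], [], []
for (a, b), values in cells.items():
    for value in values:
        x1.append(a)
        x2.append(b)
        y.append(value)

# Design matrices: intercept and main effects, then with the product term added.
additive = [[1, a, b] for a, b in zip(x1, x2)]
with_interaction = [[1, a, b, a * b] for a, b in zip(x1, x2)]

r2_additive = ols_r_squared(additive, y)
r2_interaction = ols_r_squared(with_interaction, y)
print(round(r2_additive, 3), round(r2_interaction, 3))
```

In this invented example the main-effects model cannot reproduce the cell means, so adding the product term `x1 * x2` raises R-squared without introducing any new predictor.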
The report discusses the specifics of such regression models in the study of the determinants of IPV. Since in most cases studying both members of the dyad is difficult, and the data available to the researcher can only be aggregated, constructing interaction effects is necessary to model the processes occurring in the relationship between the two partners.
The subject of the study was the factors explaining the presence and level of IPV: the duration and stage of the relationship, the ability to dispose of the partner's income, and the partners' education level and income. The Conflict Tactics Scale was used to measure the prevalence of violence. The construction of the final model was preceded by the evaluation and selection of significant effects using CHAID, MANOVA, and log-linear analysis.
The following results were obtained: building a regression with interaction effects increases R-squared more than threefold (from 0.082 to 0.339) without adding other predictors to the model or changing the specification of the regression model. Thus, IPV is connected with the interaction effects of income and stage of relationship, opportunity to dispose