Quantitative Spatial Analysis of Micro and Macro Data: Methodological Challenges and Solutions 3
Session Organisers | Professor Henning Best (TU Kaiserslautern), Dr Tobias Rüttenauer (TU Kaiserslautern)
Time | Thursday 18th July, 16:00 - 17:30
Room | D25
The session intends to bring together methodological experiences gained from working with spatial data in quantitative empirical social research. On the one hand, spatial data offer the opportunity to investigate relationships between regional characteristics at the macro level. On the other hand, spatial data can be used to enrich survey data with structural information at a given regional level, either to control for context effects or to explicitly analyse these effects and their interplay with mechanisms at the individual level. Using GIS, the addresses of survey participants can be linked with objective measures of their neighbourhood (e.g. pollution data) or with their proximity to institutions (e.g. educational institutions or workplaces). These data thus allow researchers to investigate the relevance of infrastructure distances for social action as well as processes of spatial spillover and diffusion.
In doing so, several methodological questions arise: Which regional level is adequate for which kind of question, and how does the choice of administrative borders influence the conclusions drawn (the modifiable areal unit problem, “MAUP”)? Can we enrich survey data with information on actual travelling times and means of transportation to account for the movement or action space of participants? What are the challenges and limitations of these approaches, and how can they be implemented reliably?
Furthermore, innovative statistical methods are necessary to adequately analyse spatial data. Various regression models (e.g. SAR, SARAR, SLX, Durbin and others) address the spatial dependence in different ways and offer alternative approaches to identify different types of spatial spillovers or spatial interdependences, in cross-sectional and longitudinal data. Which types of models are adequate for which type of questions? Which models can be used to simultaneously analyse individual and aggregate data?
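For orientation, the specifications named above are commonly written as follows. This is a sketch in one widely used textbook notation, not notation taken from the session text: $W$ is assumed to be an $n \times n$ spatial weights matrix, $\rho$ and $\lambda$ are spatial autoregressive parameters, $\theta$ holds the coefficients of the spatially lagged covariates, and the SARAR model is assumed to use the same $W$ in both the outcome and the error equation.

```latex
\begin{align}
\text{SAR:}          \quad & y = \rho W y + X\beta + \varepsilon \\
\text{SLX:}          \quad & y = X\beta + W X\theta + \varepsilon \\
\text{Durbin (SDM):} \quad & y = \rho W y + X\beta + W X\theta + \varepsilon \\
\text{SARAR:}        \quad & y = \rho W y + X\beta + u, \qquad u = \lambda W u + \varepsilon
\end{align}
```

The models differ in where the spatial dependence enters: through the outcome ($\rho W y$), through the covariates ($W X\theta$), through the error term ($\lambda W u$), or through a combination of these.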
In sum, in this session we are especially interested in methodological and applied studies dealing with topics of:
1. Choice of adequate regional level and handling of borders when using administrative data
2. Connection of individual data and spatially aggregate as well as infrastructural data
3. Spatial analysis of time-series and cross-sectional data
4. Modelling spatial relationships (e.g. commuting flows, distances, travelling times, social interactions)
5. Modelling spatial interaction, spillover or diffusion processes
6. Further challenges and solutions when using georeferenced data
Keywords: spatial data, geodata, geo referencing, GIS
Dr Christoph Zangger (University of Zurich) - Presenting Author
Missing data are a common concern for researchers in the social sciences. Whereas this issue has received increasing attention in sociological research based on survey data (Allison, 2001; Rubin, 1987), researchers face additional challenges when addressing missing data in spatial econometric models. Unlike in the non-spatial case, ignoring missing data introduces bias not only when data are missing at random ('selection on observables'; MAR) or not at random (the reason for missingness is related to the missing values themselves; MNAR; Allison, 2001), but also when they are missing completely at random (MCAR; Kelejian and Prucha, 2010). Due to the interdependence between units and the corresponding spatial multipliers, ignoring missing data introduces bias in spatial econometric models in all three cases (LeSage and Pace, 2009; Wang and Lee, 2013).
Using Monte Carlo simulations, this paper addresses the outlined problem by means of a Bayesian framework. The simulations show that the amount of bias in the parameter estimates is almost independent of the missingness mechanism, although it is marginally lower when data are missing completely at random. Additionally, the amount of bias generally increases with the strength of the underlying spatial association across all specifications. Finally, by allowing for the simultaneous imputation of missing values and model estimation, the Bayesian approach pursued here, although computationally intensive, offers an adequate framework for addressing the bias introduced in models with spatially lagged variables (and/or errors).
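The mechanism behind this result can be illustrated with a minimal sketch. It is not the Bayesian procedure of the paper and does not estimate any model. It only shows, under assumed illustrative values (a ring-shaped neighbourhood structure, `rho = 0.6`, 30% of units missing completely at random), that listwise deletion distorts the spatial lag of the units that remain, because their missing neighbours drop out of the weights matrix.

```python
import numpy as np

rng = np.random.default_rng(42)   # arbitrary seed, for reproducibility only
n, rho, beta = 200, 0.6, 1.0      # illustrative values, not taken from the paper

# Ring structure: every unit has two neighbours; rows sum to one.
idx = np.arange(n)
W = np.zeros((n, n))
W[idx, (idx + 1) % n] = 0.5
W[idx, (idx - 1) % n] = 0.5

# Reduced form of a SAR process: y = (I - rho W)^{-1} (x beta + eps)
x = rng.normal(size=n)
eps = rng.normal(size=n)
y = np.linalg.solve(np.eye(n) - rho * W, x * beta + eps)

# MCAR: each unit is dropped with probability 0.3, independent of x and y.
observed = rng.random(n) > 0.3

# Spatial lag the observed units would have with complete data.
true_lag = (W @ y)[observed]

# Listwise deletion: keep only observed rows/columns and re-standardise rows.
W_obs = W[np.ix_(observed, observed)]
row_sums = W_obs.sum(axis=1, keepdims=True)
W_obs = np.divide(W_obs, row_sums, out=np.zeros_like(W_obs), where=row_sums > 0)
naive_lag = W_obs @ y[observed]

lag_error = naive_lag - true_lag
print("share of observed units with a distorted spatial lag:",
      np.mean(~np.isclose(lag_error, 0.0)))
```

Units whose neighbours are all observed keep their correct lag. Every other unit enters the truncated model with a mismeasured `Wy`, even though the missingness itself is unrelated to the data.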
Mr Tobias Rüttenauer (TU Kaiserslautern) - Presenting Author
Spatial regression models provide the opportunity to analyse spatial data and spatial processes. Yet several model specifications can be used, each assuming a different type of spatial dependence. This study summarises the most commonly used spatial regression models and compares their performance using Monte Carlo experiments. In contrast to previous simulations, this study evaluates the bias of the impacts rather than of the regression coefficients, and it additionally provides results for situations with non-spatial omitted variable bias. Results reveal that the most commonly used spatial autoregressive (SAR) and spatial error (SEM) specifications suffer from severe drawbacks. In contrast, spatial Durbin specifications (SDM and SDEM) as well as the simple SLX provide accurate estimates of direct impacts even in the case of misspecification. Regarding the indirect 'spillover' effects, there are several quite realistic situations in which the SLX outperforms the more complex SDM and SDEM specifications.
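The distinction between coefficients and impacts can be made concrete with a short sketch. It is not the author's simulation code. It assumes a single covariate, a weights matrix `W` with a zero diagonal and no isolated units, and the usual definition of average impacts based on the matrix of partial derivatives, which for a SAR model is `(I - rho W)^{-1} * beta`. The function names and the five-unit ring are hypothetical and chosen for illustration only.

```python
import numpy as np

def row_standardise(W):
    """Scale each row of a non-negative weights matrix so that it sums to one."""
    W = np.asarray(W, dtype=float)
    row_sums = W.sum(axis=1, keepdims=True)
    # Rows without neighbours stay all-zero instead of triggering a division by zero.
    return np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)

def sar_impacts(W, rho, beta):
    """Average direct, indirect and total impacts of one covariate in a SAR model."""
    n = W.shape[0]
    S = np.linalg.inv(np.eye(n) - rho * W) * beta  # matrix of partial derivatives
    direct = np.trace(S) / n      # mean own-unit effect (diagonal elements)
    total = S.sum() / n           # mean row sum
    indirect = total - direct     # mean spillover effect from all other units
    return direct, indirect, total

def slx_impacts(beta, theta):
    """In an SLX model with row-standardised W, the impacts are the coefficients."""
    return beta, theta, beta + theta

# Hypothetical example: five units on a ring, each with two neighbours.
n = 5
idx = np.arange(n)
ring = np.zeros((n, n))
ring[idx, (idx + 1) % n] = 1.0
ring[idx, (idx - 1) % n] = 1.0
W = row_standardise(ring)

direct, indirect, total = sar_impacts(W, rho=0.5, beta=1.0)
print(direct, indirect, total)
```

With a row-standardised `W`, the average total impact in the SAR model equals `beta / (1 - rho)`, so the coefficient `beta` alone understates the effect of the covariate whenever `rho` is positive. In the SLX model, by contrast, the coefficients can be read as impacts directly.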
Professor I Gede Nyoman Mindra Jaya (Universitas Padjadjaran) - Presenting Author
Spatiotemporal disease models and mapping techniques are used to estimate the geographical evolution of disease risk. They are applied to non-overlapping regions over a fixed period and aim to identify regions with a high or increasing disease risk over time, which therefore call for public intervention. However, these techniques often smooth over large discontinuities in the risk surface, which may be unrealistic in many cases. Interest has therefore grown in models that balance smoothness against the ability to capture discontinuities. To preserve discontinuities, some models have been proposed that partition the regional units into a set of disease risk clusters exhibiting substantially different risks. However, when these pure clustering models are used, risk factors and the problem of non-stationary regression relationships are ignored. Risk factors should be included in the model in order to determine the most significant factors behind the high number of cases in high-risk clusters. Non-stationarity also needs to be considered because the effects of the risk factors may differ across areas within the spatial and temporal domains; two or more high-risk clusters may be generated by different spatiotemporal processes. Fixed effect and random coefficient models are proposed for modelling spatial non-stationarity. This paper therefore proposes a new modelling approach for clustering spatiotemporal disease risk data that accounts for risk factors and spatial non-stationarity on the basis of a random coefficient model.