To analyze the effects of both individual and neighborhood factors on individual health outcomes, many previous health-related studies have used multilevel models, which can analyze independent variables at two (or more) levels in tandem [1–6]. These studies analyzed various health outcomes, such as infant mortality, low birth weight, preterm birth, late-stage breast cancer, children’s health-related quality of life, and tuberculosis incidence. All of them used aggregated data, such as county-level, census tract-level, and postal code-level data, to represent neighborhood-level variables. The studies, however, do not take into account the underlying spatial dependency across neighborhoods; thus, their multilevel analysis results are potentially misleading where the data exhibit spatial dependency. Spatial dependency in health-related data means that health outcomes in nearby neighborhoods are more similar to each other than to those in distant neighborhoods. In other words, these studies consider only within-neighborhood correlation (i.e., correlation between individuals within the same neighborhood) through a hierarchical setting, and fail to account for potential between-neighborhood correlation.
According to Jerrett et al., spatial dependency of health outcomes among nearby neighborhoods may arise from similar socioeconomic conditions (e.g., health facilities and services) and natural environmental conditions (e.g., air quality). For example, the catchment area of a health facility may cover a broad area, thereby transcending localized administrative boundaries. In terms of the local environment, disease risks from air pollution tend to be similar among nearby neighborhoods because their local wind direction and/or road conditions (and environmental and traffic policies) are more likely to be similar; as a result, residents of those neighborhoods are exposed to similar types and concentrations of atmospheric pollutants [7–9]. However, the non-spatial multilevel model cannot address this spatial dependency because it typically assumes that neighborhoods (i.e., spatial units) are statistically independent of each other; thus, multilevel models have been criticized as non-spatial and unrealistic [10–13].
Based on the notion of spatial dependency of health outcomes, some researchers have fitted both a non-spatial single-level linear model that ignores spatial dependency (i.e., a linear model estimated with ordinary least squares or weighted least squares) and a spatial autoregressive (SAR) model that accounts for it, and compared the two [9, 14]. The authors found that the non-spatial single-level models and the SAR models produced different regression results when spatial dependency was present. These two studies, however, made only limited attempts to model individual characteristics in their spatial models, because they used only aggregated variables. Studies that analyze health outcomes solely through aggregated data in a single-level spatial model cannot fully explain the factors that truly influence individual health outcomes.
A few researchers have tried to incorporate a geographical perspective into the multilevel setting in various ways, so as to account for both the multilevel framework and spatial effects. Some studies that attempted to address spatial dependency in the residuals of multilevel models employed spatial lag regression model specifications [16, 17]. In the spatial lag regression model, the spatial autoregressive parameter, denoted ρ, indicates the intensity of spatial dependency. Another study used multilevel models with geographically weighted regression (GWR), developed by Fotheringham et al., to consider a spatially varying relationship between neighborhood factors and obesity. GWR allows researchers to estimate regression parameters that vary over space. Although GWR may mitigate spatial dependency to some degree by accounting for spatial variation, spatial dependency can remain after GWR is applied in some cases, and this can influence the regression results considerably. In addition, according to Wheeler and Tiefelsdorf, GWR’s R² goodness of fit tends to be high when residuals have high spatial dependency. Therefore, GWR should be used as an exploratory tool for understanding spatial variation rather than as a statistically stable method for addressing spatial dependency.
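For reference, the spatial lag specification mentioned above is commonly written as follows. The notation is an assumption introduced here for illustration and is not taken from the cited studies: y is the vector of outcomes, W is a spatial weights matrix, X holds the covariates, β the regression coefficients, and ε the error term.

```latex
\begin{equation}
  \mathbf{y} = \rho \mathbf{W}\mathbf{y} + \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon},
  \qquad
  \boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^{2}\mathbf{I})
\end{equation}
```

In this form, ρ = 0 reduces the model to an ordinary non-spatial linear regression, and larger absolute values of ρ correspond to stronger spatial dependency among neighboring units.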
As discussed above, limited attention has been paid in the literature to integrating multilevel models and spatial regression models. However, the two approaches should be used in combination because the objectives of both are important in health-related analyses. It is therefore increasingly necessary to integrate multilevel models with spatial regression models, and in particular with eigenvector spatial filtering, an advanced approach to addressing spatial dependency in datasets. Compared with spatial lag regression (or SAR) model specifications, which provide only a single parameter for the global spatial component, the greatest advantage of the eigenvector spatial filtering used in this paper is that it can visualize a spatial structure in map form by decomposing it, through a set of eigenvectors, into smaller-scale spatial patterns or local clusters [20, 21]. This trait could provide a better understanding of how health phenomena are distributed across space. Additionally, because the spatial filtering technique can be applied to a generalized linear model specification based on the binomial or Poisson probability models, it is more flexible than the spatial lag regression (or SAR) model, which requires computation of a normalizing factor. Compared with GWR, which has an inherent problem of multicollinearity among local regression coefficients, the spatial filtering method is more statistically reliable because the eigenvectors generated in the filtering procedure are mutually orthogonal, so they introduce no multicollinearity among themselves.
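The orthogonality property described above can be illustrated with a minimal sketch. It assumes the common construction in which the eigenvectors are extracted from a doubly centered spatial weights matrix MWM, with M = I − 11ᵀ/n; this paper's exact procedure may differ. The 4 × 4 rook-contiguity lattice and the function names `rook_weights`, `filter_eigenvectors`, and `moran_coefficient` are hypothetical and introduced only for illustration.

```python
import numpy as np


def rook_weights(rows, cols):
    """Binary rook-contiguity weights for a hypothetical rows x cols lattice."""
    n = rows * cols
    W = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:          # neighbor to the east
                W[i, i + 1] = W[i + 1, i] = 1.0
            if r + 1 < rows:          # neighbor to the south
                W[i, i + cols] = W[i + cols, i] = 1.0
    return W


def filter_eigenvectors(W):
    """Eigen-decompose M W M, where M = I - 11'/n centers the weights matrix.

    Returns eigenvalues in descending order with matching eigenvectors (columns).
    """
    n = W.shape[0]
    W_sym = (W + W.T) / 2.0           # symmetrize so the spectrum is real
    M = np.eye(n) - np.ones((n, n)) / n
    values, vectors = np.linalg.eigh(M @ W_sym @ M)
    order = np.argsort(values)[::-1]
    return values[order], vectors[:, order]


def moran_coefficient(x, W):
    """Moran coefficient of a map pattern x under weights W."""
    z = x - x.mean()
    return (len(x) / W.sum()) * (z @ W @ z) / (z @ z)


W = rook_weights(4, 4)
values, vectors = filter_eigenvectors(W)

# Mutual orthogonality: the Gram matrix of the eigenvectors is the identity.
gram = vectors.T @ vectors
print("orthonormal:", np.allclose(gram, np.eye(W.shape[0])))

# Each eigenvector is a map pattern; its Moran coefficient is proportional
# to its eigenvalue, so the leading vectors carry the strongest positive
# spatial autocorrelation.
print("leading Moran coefficient:", moran_coefficient(vectors[:, 0], W))
```

Because the columns of `vectors` are orthonormal, any subset of them can enter a regression as additional covariates without being collinear with one another, and each column can be mapped separately to inspect the spatial pattern it represents.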
Griffith’s study showed the possibility of combining hierarchical generalized linear models with the spatial filtering method as a disease mapping technique. Building on this idea, the present study shows how multilevel modeling components can be linked to the spatial filtering framework through an integrated formulation, and uses self-rated health status in South Korea to investigate whether an integrated “spatially filtered multilevel model” generates more robust regression results than a conventional multilevel model.
This study first identifies whether spatial dependency exists within neighborhood-level residuals in the multilevel model. Where spatial dependency is detected, the eigenvector spatial filtering technique is applied to the multilevel model to control for spatial dependency. The study then compares the explanatory power of the models and the regression results between the conventional model and the spatially filtered model.
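The first step above, checking neighborhood-level residuals for spatial dependency, can be sketched with Moran's I and a permutation test. This is only one possible diagnostic under stated assumptions, not necessarily the test used in this study. The residuals are synthetic (a smooth gradient plus noise on a hypothetical 6 × 6 lattice), and the names `rook_weights`, `morans_i`, and `permutation_test` are introduced purely for illustration.

```python
import numpy as np


def rook_weights(rows, cols):
    """Binary rook-contiguity weights for a hypothetical rows x cols lattice."""
    n = rows * cols
    W = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:
                W[i, i + 1] = W[i + 1, i] = 1.0
            if r + 1 < rows:
                W[i, i + cols] = W[i + cols, i] = 1.0
    return W


def morans_i(x, W):
    """Moran's I of values x under spatial weights W."""
    z = x - x.mean()
    return (len(x) / W.sum()) * (z @ W @ z) / (z @ z)


def permutation_test(x, W, n_perm=999, seed=0):
    """One-sided pseudo p-value for positive spatial autocorrelation."""
    rng = np.random.default_rng(seed)
    observed = morans_i(x, W)
    simulated = np.array(
        [morans_i(rng.permutation(x), W) for _ in range(n_perm)]
    )
    # Share of random relabelings at least as autocorrelated as the data.
    p_value = (1 + np.sum(simulated >= observed)) / (n_perm + 1)
    return observed, p_value


rows, cols = 6, 6
W = rook_weights(rows, cols)

# Synthetic neighborhood-level residuals (an assumption, not study data):
# a smooth north-west to south-east gradient plus independent noise.
rng = np.random.default_rng(42)
grid_r, grid_c = np.divmod(np.arange(rows * cols), cols)
residuals = 0.5 * (grid_r + grid_c) + rng.normal(scale=0.5, size=rows * cols)

observed, p_value = permutation_test(residuals, W)
print(f"Moran's I = {observed:.3f}, pseudo p-value = {p_value:.3f}")
```

A small pseudo p-value would indicate that the residuals are more spatially clustered than expected under random relabeling of neighborhoods, which is the situation in which the spatial filtering step is applied.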