Skip to main content

A modification to geographically weighted regression



Geographically weighted regression (GWR) is a modelling technique designed to deal with spatial non-stationarity, e.g., the mean values vary by locations. It has been widely used as a visualization tool to explore the patterns of spatial data. However, the GWR tends to produce unsmooth surfaces when the mean parameters have considerable variations, partly due to that all parameter estimates are derived from a fixed- range (bandwidth) of observations. In order to deal with the varying bandwidth problem, this paper proposes an alternative approach, namely Conditional geographically weighted regression (CGWR).


The estimation of CGWR is based on an iterative procedure, analogy to the numerical optimization problem. Computer simulation, under realistic settings, is used to compare the performance between the traditional GWR, CGWR, and a local linear modification of GWR. Furthermore, this study also applies the CGWR to two empirical datasets for evaluating the model performance. The first dataset consists of disability status of Taiwan’s elderly, along with some social-economic variables and the other is Ohio’s crime dataset.


Under the positively correlated scenario, we found that the CGWR produces a better fit for the response surface. Both the computer simulation and empirical analysis support the proposed approach since it significantly reduces the bias and variance of data fitting. In addition, the response surface from the CGWR reviews local spatial characteristics according to the corresponded variables.


As an explanatory tool for spatial data, producing accurate surface is essential in order to provide a first look at the data. Any distorted outcomes would likely mislead the following analysis. Since the CGWR can generate more accurate surface, it is more appropriate to use it exploring data that contain suspicious variables with varying characteristics.


The data collected nowadays are diversified and many of them possess the records of locations, namely spatial data. Spatial regression is a popular tool for analysing the spatial data [2, 35] and the first-order stationarity is a common assumption, which means that the expected (mean) values are fixed at different locations. The error terms of spatial regression are usually not independent and, like in time series analysis, their covariance is assumed to follow some spatial models, such as the simultaneous autoregressive (SAR) and moving average (MA) models [12, 30, 34]. However, the first-order stationarity is a questionable assumption in practice and the modifiable areal unit problem (MAUP) often occurs [5, 13, 22]. The MAUP is a spatial version of the Simpson paradox, where the trends appearing in individual groups of data are different to those in the aggregate data. The biased estimates might be a consequence of the parameter values not being identical in the study area and the inclusion of data with different attributes.

Because the parameter values are not identical at different locations, estimation via the ordinary least squares (OLS) with all observations would likely distort the local distinctness. One possible solution is to include only the locations of data with similar attributes (i.e., homogeneity). However, it is difficult to decide the number of groups with different attributes and identify the locations of data in each group. Moreover, the mean value of a non-stationary process is usually a step function [8] or is continuous across space, and it is difficult to find the exact boundary of appropriate locations. The other possibility is to use the varying coefficient model [10], allowing the coefficient terms to vary according to locations. Then, the model is a form of local linear models [15] and can be used to explore the dynamic property of spatial data. Based on the concept of the varying coefficient model, geographically weighted regression (GWR) is modified to solve the MAUP [6].

The GWR allows the regression coefficients to vary across space, and the coefficient estimates of all variables are obtained from a moving data window, which is analogous to kernel regression for obtaining a smoothing estimate. It is also a popular tool for exploratory data analysis (EDA) on spatial data [19, 32]. In particular, the GWR is often a popular visualization tool in geographical information system, to explore possible patterns of a study region and acquire valuable information for further data analysis (such as clusters detection) [11, 36]. Note that the optimal width (or bandwidth) of the moving windows in a GWR is determined by cross-validation (CV) or Akaike’s information criterion (AIC) [16]. The OLS can be treated as a special case of the GWR with a window of infinite width (although the local distinctness is likely to be lost by averaging all observations).

Most of the modifications to the GWR are on the selection and testing of the bandwidth. For example, using the CV and AIC for the bandwidth selection is a data-driven method, similar to the kernel regression method, wherein the estimates are sensitive to outliers [16]. In addition, the data variations are not necessarily the same and a fixed bandwidth is likely to create discrepancy in parameters’ estimates at different locations. On the other hand, the hypothesis testing of the parameters depends on the bandwidth as well. For example, Leung et al. [21] proposed goodness-of-fit tests and found that the degree of freedom of GWR residuals is a function of the bandwidth, and this makes bandwidth selection somewhat subjective.

Determining the bandwidth probably is the focus of modifying GWR over the years. Brunsdon et al. [7] introduced a mixed GWR model with vector bandwidths, allowing the coefficients having different bandwidths (via a backfitting algorithm) and the bandwidths being functions of data density. Shi et al. [27] suggested the weight of data determined by their attributes, rather than by the distance between observations. Furthermore, Farber and Páez [16] found that it is possible to reduce the bias by modifying the CV procedure. Subsequently, Wang et al. [31] introduced local linear estimation, or a polynomial fitting technique, to reduce the bias in parameters’ estimates.

The reason for considering different bandwidth is that the GWR tends to produce ragged surfaces, in addition to biased estimates. Suppose the true surfaces are linear or ridged. As shown in Fig. 1, there are (false) hot spots and (false) cold spots in the GWR estimation, and they are especially obvious at edges and corners, where the true values and the estimated surfaces are in the first and second rows, respectively. The bias of GWR becomes larger for ridge surfaces (and other non-linear surfaces) and the GWR seems to provide misleading interpretations. The detailed discussions of GWR estimates are given later in this manuscript.

Fig. 1
figure 1

True surfaces and GWR mean surface. The figure compares the true surfaces and the estimated GWR respond surface (linear and ridge surface)

In this study, our focus is also on the bandwidth selection for each variable, using the correlations between independent variables. The idea of the proposed approach is to use the correlations to improve the estimation via an iteration algorithm, similar to the method of control variate in variance reduction [24]. The empirical analyses of GWR showed that correlations often exist between GWR coefficients. For example, Bivand and Brunstad [4] found the coefficients to be highly correlated in a case study. We found the rough coefficient surfaces can be smoother (Fig. 1) if the coefficients are positively correlated.

For the rest of this manuscript, we first introduce the GWR and the proposed modification of GWR, conditional GWR (CGWR), and its theoretical results. Then, we use simulation to evaluate the proposed method and compare it with the basic GWR and the local linear method proposed by Wang et al. [31]. In addition to simulations, we also apply the proposed method to two data sets for empirical study. Finally, we conclude with discussions on the limitations and future applicability of the proposed method.


The GWR models a dependent variable y via a linear function of a set of p independent variables, \(x_{1} ,x_{2} , \ldots ,x_{p}\), or

$$Y_{i} = \beta_{i0} + \sum\limits_{k = 1}^{p} {\beta_{ik} x_{ik} } + \varepsilon_{i}$$

where \(\beta_{ik}\) and \(x_{ik}\) are the parameters and observed values of the independent variable k \((k = 1, \ldots ,p)\) for observation i;\(\varepsilon_{i}\) is the error term for observation i, which is generally assumed to be from a normal distribution with zero mean and constant variance \(\sigma^{2}\)(i.e., \(\varepsilon_{i} \sim N(0,\sigma^{2} )\)). The subscript i represents the spatial location of the observation i \((i = 1, \ldots ,n)\). In other words, each location has its own regression model in the GWR model. The idea behind Eq. (1) is that nearby data of each location usually possess similar attributes. Thus, choosing an appropriate range (which is referred to as “bandwidth” in this study) is plausible to obtain a fine local regression.

The parameter set \(\varvec{\beta}_{\varvec{i}}\) of observation i is derived by matrix algebra, or

$$\widehat{\varvec{\beta}}_{\varvec{i}} = \left( {{\mathbf{X}}^{{\mathbf{T}}} {\mathbf{W}}_{{\mathbf{i}}} {\mathbf{X}}} \right)^{ - 1} {\mathbf{X}}^{{\mathbf{T}}} {\mathbf{W}}_{{\mathbf{i}}} {\mathbf{Y}}$$

where \(\hat{\varvec{\beta }}_{\varvec{i}} = (\hat{\beta }_{i0} ,\hat{\beta }_{i1} , \ldots ,\hat{\beta }_{ip} )^{T} ,{\mathbf{X}} = ({\mathbf{1}},\varvec{x}_{1} , \ldots ,\varvec{x}_{p} )^{T} ,\varvec{Y} = ({\text{Y}}_{1} , \ldots ,{\text{Y}}_{\text{n}} )^{\text{T}}\), and \({\mathbf{W}}_{{\mathbf{i}}}\) is (diagonal) weight matrix with its weight \(w_{ij}\) of row i and column j defined as:

$$w_{ij} = \exp \left[ { - \frac{1}{2}(d_{ij} /h)^{2} } \right]$$

As mentioned earlier, the bandwidth selection is generally calibrated by minimizing the CV score or the AIC. However, if the data locations are sparse in the study area, the distance-weighted kernel might not be appropriate due to insufficient information. Brunsdon et al. [8] introduced rank-based and k-nearest neighbourhood methods to deal with sparse data. In addition to the GWR, we also consider one of its modifications by Wang et al. [31]. It is a local linear approach, or a Taylor expansion version of the GWR, and is expected to have better fitting if the geographical surface is linear-shaped.

Using a single bandwidth in GWR is likely to create unsatisfactory estimates if the attributes of independent variables are not similar. For example, independent variables with larger variations require a larger bandwidth (and more observations). On the other hand, locally sampling (i.e. narrower bandwidth) is preferable over global sampling (i.e. wider bandwidth) for the areas with larger gradient changes. The concept is similar to widely recognized importance sampling [25], which assigns more sampling weight on informative area. Therefore, allowing different bandwidths for each independent variable seems to be a desirable modification to the GWR. Unfortunately, the varying bandwidths cannot be managed by the weighted least squares or Eq. (2). Brunsdon et al. [7] proposed a backfitting algorithm for selecting different bandwidths, but the selection of bandwidths is somewhat objective and it usually require lots of computation time.

In this study, we introduce an approach (Conditional GWR; CGWR) to determine the bandwidth for each independent variable by iteration, which is inspired by the vector bandwidth method of Brunsdon et al. [7] and the kernel smoothing method in the varying-coefficient model of Wu and Chiang [33]. For the proposed method, we adapt the ideas of the Generalized Addictive Model (GAM) and Jacobi iteration [17, 20, 23, 28] to determine the appropriate bandwidths. Using the format of the GAM, the GWR model can be re-expressed as

$$Y_{i} = f_{i1} + \cdots + f_{ik} + \cdots + f_{ip} + \varepsilon_{i}$$

where \(f_{ik} = \beta_{ik} \times x_{ik}\) and \(\beta_{ik}\) is the parameter coefficient of the variable k at location i. If \(f_{ik}\) is the intercept, then \(x_{ik}\) is set to 1. Then, we can use the Jacobi iteration to solve the Eqs. (4), one-by-one, for parameter \(f_{ik}\). Again, we assume that \(\varvec{f}_{k} \{ l\}\) denotes the lth iteration vector of \(\varvec{f}_{k}\), and \(\varvec{f}_{k} \{ l\}\) denotes an n × 1 vector composed of \(f_{k}\). Then, the proposed method can be summarized iteratively as follows:

  • Step 1. Set the initial solution of \(\varvec{f}_{k}\) to be zero, i.e. \(\varvec{f}_{k} \{ 0\} = {\mathbf{0}}\), where \(k = 1, \ldots ,p\), and let \(l = 1\).

  • Step 2. For each element \(\varvec{f}_{k} \{ l\}\), apply the basic GWR model with only one independent variable, \(x_{k} .\) The value of the dependent variable is \(\varvec{y}^{*} = \varvec{y} - \sum\nolimits_{\begin{subarray}{l} j = 1 \\ j \ne k \end{subarray} }^{p} {\varvec{f}_{j} \{ \varvec{l} - {\mathbf{1}}\} }\), i.e. we regress \(\left( {\varvec{y} - \sum\nolimits_{\begin{subarray}{l} j = 1 \\ j \ne k \end{subarray} }^{p} {\varvec{f}_{j} \{ l - 1\} } } \right)\) on the variable \(\varvec{x}_{k}\) without a fitting intercept. The bandwidth is obtained by minimizing the cross-validated sum of squares (CVSS) or the AIC.

  • Step 3. Repeat Step 2 until the given stopping criterion is reached.

There are at least two reasons for finding optimal bandwidth solutions individually using the Jacobi iteration. First, although more complex numerical methods (such as the quasi-Newton method) could be used, the Jacobi iteration usually requires less computation time. Second, although there are algorithms that converge faster than the Jacobi iteration, they are likely to produce biased estimates. For example, in the Gauss–Seidel iteration process, the estimate of one variable is updated based on the simultaneous estimates of other variables. If the estimates of some variables have severe biases, it might contaminate the estimates of other variables.

We think that the proposed estimation process can guarantee the convergence of the CGWR. In particular, if the bandwidth is predetermined during the iteration, then the GWR coefficients will converge to a constant for each location. We should use the case of two coefficients to demonstrate the convergence and the outline proof is given in Appendix A in Additional File 1. Note that the method of Brunsdon et al. [7] can be treated as a special case of the CGWR method when the bandwidths are never updated. In the next section, we will use computer simulation to evaluate the stability of the CGWR and compare it with basic GWR and its local linear modification by Wang et al. [31].

Results and discussion

Simulated data

The computer simulation is separated into two parts: scenarios without clusters and with a cluster. For the latter scenario, a cluster is added into the intercept to exhibit the mean shift intervention. The cluster scenario is to evaluate the performance of estimation methods under the influence of a systematic change (or hot spots) in space, such as sources of pollution. Moreover, the coefficients are assumed to follow one of the following four surfaces: linear, quadratic, ridge, or hillside, and these settings are to check which would cause raggedness in the estimated surfaces. For the former scenario, we also examine two types of surfaces: single-type and mixed-type. The difference between these two types of surfaces is whether the coefficients follow same type of surfaces (single-type) or different types of surfaces (mixed-type). We want to know if the coefficients follow different types of surface would cause biased estimation of coefficients.

To simplify the discussion, suppose there are only two coefficients, i.e. one intercept and one independent variable in the spatial regression, or

$$Y_{i} = \beta_{i0} + \beta_{i1} x_{i} + \varepsilon_{i} ,$$

where i is a natural number which indicates the location of the observation. Next, we define the signal versus noise ratio, i.e. the S/N ratio, where the signal represents the variations on the surface of the coefficients and the noise is the random fluctuation of observations. Larger S/N ratios are associated with larger variations in the coefficient surfaces, in which case the coefficient’s pattern is easier to be detected. In particular, we assume that signal = \(3 \times \left( {\frac{{\sum\nolimits_{i} {(\beta_{ik} - \bar{\beta }_{k} )^{2} } }}{n - 1}} \right)^{1/2}\) and the noise is equal to the standard deviation of the error term. Here, it is 0.5. The bias and variance of the estimates can be used together to evaluate the accuracy of the proposed CGWR. We define the average discount rate as follows:

$$\frac{{n^{ - 1} \sum\nolimits_{i} {(MSE_{method} )} }}{{n^{ - 1} \sum\nolimits_{i} {(MSE_{OLS} )} }},$$

where MSE refers to the mean square error, i.e. the sum of variance and the squared bias. Note that the MSE of the OLS estimate is used as a benchmark for comparison in Eq. (6) and it can be computed for all locations by using the weighted mean given by:

$$\frac{{\sum\nolimits_{i} {\left[ {(MSE_{OLS} )_{location \, i} \times \left( {MSE_{method} /MSE_{OLS} } \right)_{location \, i} } \right]} }}{{\sum\nolimits_{i} {(MSE_{OLS} )_{location \, i} } }} .$$

There are four types of surfaces for the coefficients, as shown in Fig. 2. Surfaces 1 and 2 are polynomial functions (related to linear functions) of the independent variables, and surfaces 3 and 4 are non-linear. Similar settings also appear in previous studies on GWR [30], and these have practical implications. For example, the quadratic surface (surface 2) often occurs in situations involving housing prices, where prices are significantly higher for locations near a town centre or a transportation centre [14, 31]. In addition, the relationship between the environmental factor and real estate price can be different in urban areas than in the countryside [9]. Furthermore, the relationship between disease and environmental factors might appear as non-linear on geographic surface. For instance, incidence rate of Dengue disease are highly correlated with population density but the relationship seems to fade out if there is proper disease prevention policy [26]. As a result, the surface of coefficients can be non-linear. For every non-stationary surface, we assume that there are 10 × 10 regular lattice points (i.e. 100 locations).

Fig. 2
figure 2

Four coefficient surfaces. The figure illustrates different kinds of surfaces in simulation settings (1 and 2 are Polynomial, and 3 and 4 are non-polynomial)

For a scenario without clusters, several cases are tested under different S/N ratios. The first case is the single-type surface, where both intercept and independent variable follow the same type of surface. The second case, namely mixed-type surface, assumes that the intercept and the independent variable follow different types of surfaces. For the second case, we only examine two combinations: linear-quadratic (a polynomial surface) and ridge-hillside (a non-polynomial surface).

For the scenario with clusters, we add two clusters in an intercept. The clusters are circular and occupy 18% of space in the study area. The circular assumption is quite usual in geographical studies, and literature proves that 10–20% of clustered area is a common phenomenon [29]. Two levels of mean shifts, such as 1σ and 2σ,are also added at the cluster locations. The simulation settings of both scenarios are in Table 1. For all scenarios, the errors are drawn from a normal distribution with a mean of 0 and a standard deviation of 0.5. The reason for choosing the standard deviation 0.5 is to incorporate the values of the S/N ratio. All results are based on 100 simulation runs.

Table 1 The scenario settings of simulation study

For CGWR, the Gaussian kernel is chosen, and the optimal bandwidth is the one with the minimum CVSS. Furthermore, we require a reasonable range of bandwidth to prevent the estimation from being too localized or globalized for extremely small or large bandwidths, respectively. The upper bound of the range is the maximum length on the map, and the lower bound has to have at least five data points each of 1/5th weight. The preceding setting is also used in the ‘spgwr’ package [3] (version 0.5–4) of R, a free statistical software.

We will first show the simulation results of the scenario without clusters. In particular, we will compare three GWR estimations with the smoothness of the mean surface, the average discount rate, average bandwidth, average variance, and average bias of estimates. The stopping criterion for the CGWR is reached when the average absolute relative change rate of \(\beta_{0}\) and \(\beta_{1}\) is less than 0.005% of the previous step. The simulation results are similar if adopting smaller stopping criteria.

To simplify the notation, we will use \(\beta_{0}\) and \(\beta_{1}\) to denote the coefficients of the intercept and slope of the independent variable x, respectively. In the case of single-type surfaces and mixed-type surfaces, the two coefficients are assumed to be perfectly positively correlated and close to uncorrelated, respectively. We have not considered the negative correlation because the proposed algorithm does not work when the coefficients are not positively correlated. Nonetheless, we will consider a two-stage modification for CGWR when the coefficients are not positively correlated.

Single-type surfaces

We first compare the smoothness of the three different GWR methods. For instance, the mean surfaces from 100 simulation runs for quadratic and hillside surfaces with the S/N ratio of \(\beta_{0}\) = \(\beta_{1}\) = 5 are shown in Figs. 3 and 4. Obviously, the CGWR produces the best fit, and the mean surfaces are almost identical to the true surfaces as shown in Figs. 2, 3, and 4. The GWR tends to produce bumpy surfaces, which are more ragged for the \(\beta_{1}\) surfaces. The edge effect of the GWR is obvious; this may be because fewer observations were used in the estimation. On the other hand, the local linear method tends to produce linear-like surfaces and provides distorted information for the non-linear surfaces and for the \(\beta_{1}\) surfaces. In contrast, the CGWR produces a remarkable fit even in complex surfaces and provides valuable information for further data analysis.

Fig. 3
figure 3

\(\beta_{ 0}\) Mean surface. Estimated intercept respond surface derived from different method (quadratic and hillside surfaces, S/N ratio of \(\beta_{ 0}\) = \(\beta_{1}\) = 5)

Fig. 4
figure 4

\(\beta_{ 1}\) Mean surface. Estimated intercept respond surface derived from different method (quadratic and hillside surfaces, S/N ratio of \(\beta_{ 0}\) = \(\beta_{1}\) = 5)

Table 2 shows the results of the discount rates in the case where \(\beta_{0}\) and \(\beta_{1}\) follow a linear surface. We can see that both the proposed CGWR and the local linear method have significant improvements over the basic GWR method. Interestingly, the local linear method is better (with respect to smaller discount rates) than the GWR when the S/N ratio is large, but the basic GWR is better when the S/N is small. The reason might be that larger noises produce larger fluctuations, and thus the average tangent line in the local linear method is inaccurate or unstable. Similar results are also found for the other three surfaces, as evidenced in Tables 3, 4, and 5. This suggests that the local linear method might not be very stable if the S/N ratio is small.

Table 2 Average discount rates on a linear surface (single-type)
Table 3 Average discount rates on a quadratic surface (single-type)
Table 4 Average discount rates on a ridge surface (single-type)
Table 5 Average discount rates on a hillside surface (single-type)

The CGWR and the local linear method again outperform the basic GWR method in the case of a quadratic surface. However, the CGWR appears to be the best, and the advantage is more obvious when the S/N is increased. For the non-linear surfaces, the CGWR continues to work satisfactorily, whereas the local linear model does not. In fact, the local linear model might even produce worse results than the basic GWR. The CGWR is still reliable for the non-linear surfaces, and it performs much better than the other two methods.

Intuitively, we expect that the bandwidth to be small if the S/N is large because distant observations can be very different and cause biased estimations. In general, all three GWR methods have significant drops in bandwidth when the S/N ratio increases from one to three. Moreover, the bandwidths for a linear surface should be larger than those for a non-linear surface under the same S/N ratio because the surface change is quite homogenous in any direction.

The bandwidth results can also be used to explain why the CGWR outperforms the other two methods. We will choose two surfaces (linear and hillside) to discuss these results. Table 6 shows the average bandwidths. The local linear method often yields larger bandwidths. If the true surface is close to linear, we can rely on observations within a larger bandwidth and, thus, have smaller variances than those for non-linear surfaces. Since the shape of a hillside is close to linear, the bandwidths in the case of the hillside are very similar to those in the linear case. They are also much larger than those of the quadratic and ridge cases. For more details, see Appendix B in Additional File 1.

Table 6 The average bandwidths for linear and hillside surfaces (single-type)

The bandwidths of the CGWR seem to be related to the signal strength. For example, if the S/N ratio is small, the bandwidth is expected to be large in order to provide a stable estimate. If we fix the S/N ratio of \(\beta_{1}\), then the \(\beta_{0}\) bandwidth of the CGWR decreases as the S/N ratio of \(\beta_{0}\) increases for all four surfaces (Appendix B in Additional file 1). Similar results hold for the \(\beta_{1}\) bandwidths if we fix the S/N ratio of \(\beta_{0}\). The simulation results of CGWR match our expectations.

The variances and biases of the estimates from the three GWR methods can also be used for making comparisons. Again, we will use the cases of linear and ridge surfaces for a detailed discussion. Further, because there are many combinations for the S/N ratios of \(\beta_{0}\) and \(\beta_{1}\), we have only shown the results when the S/N ratio equals one and five. The results are shown in Tables 7 and 8. Unlike the previous comparisons, we also provide the variances and biases of the OLS estimates. In general, a larger S/N ratio tends to produce a larger bias. Moreover, the OLS estimates fail to capture the spatial trend causing the largest bias, but it uses all the observations in the estimation (i.e. infinite bandwidth) and thus has the smallest variance. As for the three GWR estimations, the variances of the estimators are generally larger than the biases.

Table 7 Average variances and biases of \(\beta_{ 0}\) and \(\beta_{1}\) on a linear surface (single-type)
Table 8 Average variances and biases of \(\beta_{ 0}\) and \(\beta_{1}\) on a hill-side surface (single-type)

The results of the linear surface are in Table 7. As mentioned earlier, the average bandwidths of the local linear method are the largest, which possibly indicates the smallest variances. In addition, the local linear method has the smallest bias and the smallest discount rates for linear surfaces (Table 2). Although the CGWR has a larger bias than the local linear method in the case of linear surface, it dominates the basic GWR method with respect to both the variance and bias. The CGWR performs the best with ridge surfaces, outperforming the basic GWR and the local linear method with respect to both variance and bias.

Mixed-type surfaces

Next, we repeat the same comparisons for the three GWR estimation methods with the mixed-type surfaces. The results are similar to those in the single-type surfaces, and thus we will only show the results of the discount rates. As mentioned earlier, there are two cases in this scenario: linear-quadratic (a polynomial surface) and ridge-hillside (a non-polynomial surface). In the first case, the underlying surface of intercept is linear, and the slope is quadratic. In the second case, all surfaces are of non-polynomial type, and it is more complex than the first one.

Basically, the CGWR also has smaller discount rates than the basic GWR for the mixed-type surfaces (Tables 9, 10). We will focus on the results that differ from those of the single-type surfaces. Although the local linear estimation is better than the GWR for linear-quadratic surfaces, it performs adversely for the ridge-hillside surfaces. It is inadequate to use the linear fitting method to approximate non-linear surfaces, such as the ridge-hillside case. On the contrary (similar to the single-type cases) the CGWR dominates the other two methods in both cases.

Table 9 The average discount rates of a linear-quadratic surface (mixed-type)
Table 10 The average discount rates of a ridge-hillside surface (mixed-type)

Two-stage fitting procedure

We found that the CGWR works well when there is a positive correlation. However, in reality, there is a great likelihood of variables not being positively correlated. To overcome this difficulty, the CGWR can be modified into a two-stage process. In the first stage, we divide the variables into two groups. In both groups, the variables are non-negatively (or positively) correlated within the group. Any two variables are non-positively (or negatively) correlated if they are from different groups. We choose one group of variables and apply the basic GWR method to this group. In the second stage, we apply the CGWR method to the other group of variables by treating the first group of variables (chosen in the first stage) as constants.

We use an example to demonstrate the two-stage fitting. Let us assume there are two independent variables and an intercept. Let the coefficients of the two independent variables be negatively correlated. In other words, let the coefficients of variables x 1 and x 2 be negatively correlated, and the coefficients between x 1 and intercepts be positively correlated. We first apply the basic GWR on x 2 in the first stage, and then apply the CGWR on the intercept and x 1 in the second stage. We will use a simulation to evaluate the two-stage modification and show the results of the linear and hillside surfaces in Fig. 5. Similar to the previous simulation, the two-stage CGWR seems to work well even when the variables are not all positively correlated.

Fig. 5
figure 5

Average discount rates by \(x_{2}\). The average discount rates of additional explanatory variables during the simulation study. The baseline is GWR. The ratio of average discount rate under different Signal to noise ratio. Left panel indicate the performance on \(\beta_{0}\), and indicate performance on \(\beta_{ 1}\) for right panel

Single-Type Surfaces with Clusters

The purpose of considering the scenario with clusters is to investigate whether the estimated surface would be influenced by the cluster intervention on \(\beta_{ 0}\). Figure 6 illustrates the cluster location and the mean shift level. Single-type surfaces are assumed to evaluate the performance under cluster intervention. The mean smoothness and average discount rate of \(\beta_{ 0}\) are in Fig. 7 and Table 11. The CGWR again has the best performance and provides the most accurate information pertaining to the location and size of the clusters. Although the GWR seems to reveal the true cluster locations, it suffers from bumpy fitting and gives rise to ‘false clusters.’ The local linear method seems to oversmooth the surface and blur the local pattern, although this might suggest a possible cluster on the edges.

Fig. 6
figure 6

Cluster location and mean shift level. The area shows cluster intervention in simulation study. The shade depth in the first graph represents the mean shift level. The simulated region falls within Cartesian product within 0 and 1 (i.e. [0, 1] × [0,1]). There are two artificial clusters (hot-spots) located on lower-left and top-right. The lower-left cluster has lower relative risk and the top-right cluster has higher relative risk

Fig. 7
figure 7

\(\beta_{ 0}\) Mean surface in cluster settings. Estimated intercept respond surface derived from different method (quadratic and hillside surfaces, S/N ratio of \(\beta_{ 0}\) = \(\beta_{1}\) = 5)

Table 11 The average discount rates of \(\beta_{ 0}\) on surfaces with cluster

From the preceding computer simulations studies, we found that the proposed CGWR method makes a significant improvement over the basic GWR method. Although the local-linear method behaves well in the linear surface, if the coefficient surfaces are non-linear, the CGWR also outperforms the local linear method. In the following discussion, we will use two real-world datasets to compare the CGWR and other two methods and provide further evidence in support of the CGWR.

Empirical data

We apply the CGWR to two empirical data sets: the first is from the 2000 Taiwan Census and the other is the Ohio crime data provided by Anselin [1]. These two examples are designed to demonstrate that the CGWR would yield better estimation results. For the Taiwan data, our goal is to explore the relationship between the proportion of elderly disability and social factors. The elderly population in Taiwan have been increasing rapidly all around the country, while the medical resources still concentrate in the metropolitan areas (or northern Taiwan). Hu and Yue [18] applied spatial regression model to the elderly disability data of township level and found they are spatially auto-correlated. Brunsdon [7] argued that the spatial autocorrelation seems to be caused by spatial non-stationarity (i.e., identifiability). His claim motivates us to re-examine the data using the GWR-based model.

Taiwan data

Taiwan 2000 Census includes data of 350 townships and their proportions of disabled elderly are set as the dependent variable. Since this variable appears to be right-skewed, a log transformation (i.e.\(y_{i}^{*} = \log (y_{i} + 1)\)) is applied. Four independent variables are selected: the population density (POP), proportion of elderly (ELD), elderly mortality rate (EMR), and education level (EDU). These independent variables are standardized into the [0, 1] interval. Before applying the GWR, we first test spatial non-stationarity with the F test suggested by Leung et al. [21]. The F test shows that the model is spatially non-stationary with p value <0.001. This confirms the hypothesis of Brunsdon [7] and creates incentive for plugging the GWR-type analysis.

The correlation of the intercept and the variable POP is 0.463 (Table 12), and they are placed in the same group. Similarly, the variables ELD, EMR, and EDU are in the other group since they are positively correlated pairwise. Thus, we use the two-stage modification and apply the CGWR on the group of positively correlated variables (i.e. the intercept and the variable POP). First, we treat the variables ELD, EMR, and EDU as constants after obtaining estimates from the basic GWR method. Then, we apply the CGWR to the intercept and the variable POP as \(\left( {y_{i}^{*} - \hat{\beta }_{i2}^{GWR} ELD_{i} - \hat{\beta }_{i3}^{GWR} HMR_{i} - \hat{\beta }_{i4}^{GWR} EDU_{i} } \right) = \hat{\beta }_{i0}^{CGWR} + \hat{\beta }_{i1}^{CGWR} POP_{i} + r_{i}\). After fitting the CGWR, the calibrated bandwidths vary across the variables. Also, we set the lower and upper bounds of the bandwidth as 1 and 400 km, respectively.

Table 12 Correlations of regression coefficients from the disability data (INT and POP) → group 1, (ELD, EMR, EDU) → group 2

There is a noticeable difference between the estimates from the CGWR and those from other methods (Fig. 8). The coefficient surfaces of the local linear method appear to spread in the north–south or east–west direction with linear boundaries. Similarly, the surfaces of the basic GWR also show descending (or ascending) patterns but with curved boundaries. However, the intercept of the CGWR shows clusters (or concentration) of high disability rate on inland (mountainous areas). For the variable POP, the number of coefficient levels varies among different models and their spreading directions are not identical. The GWR has the fewest levels and the local linear method has the largest; the spreading is in the east–west direction for the local linear method, different to other methods.

Fig. 8
figure 8

Surface of intercept and variable POP for different GWR methods. Comparison of different method by using Taiwan disability data

We also use the pseudo R-square values and the residual plots for model evaluation (Fig. 9). The pseudo R-square is the Pearson product moment correlation coefficient of the fitted value and the observed value; a large value usually indicates a better fit. The pseudo R-square value of the CGWR is 0.894, largest among three methods. Moreover, the residual plots are also in favour of the CGWR because there are fewer outliers, and the CGWR appears to have smaller variance. Except for one observation (standardized residual larger than 3), the residuals histogram of CGWR (350 observations) looks more symmetric and less skewed to the right than those of the basic GWR and local linear method. It should be noted that either one of the variable sets can be chosen as a constant. If we apply the CGWR procedure to the other group of variables (i.e. ELD, EMR, and EDU), then the CGWR has a better fit (although the pseudo R-square is slightly low at 0.874).

Fig. 9
figure 9

Residual plots for different GWR methods (Taiwan data). These plots illustrate the residual analysis after fitting with different model. The residuals are generated from Taiwan disability data

Ohio Data

The Ohio data is the Ohio crime data (which can be found on ‘spgwr’ package) with the information of 49 neighborhoods including the crime per inhabitant, average income values, and average housing costs. In this study, we define the crime per inhabitant as the dependent variable and the rest of them as the predictors. First, we fit the data with the GWR model. By cross validation criterion, the optimal bandwidth is 2.27 (Table 13). And yet, Leung et al. F test [21] suggests none of the variable is non-stationary. Therefore, the OLS analysis is applied and one observation is considered as outlier and removed accordingly.

Table 13 The bandwidths of different models

Next, we re-fit the data with the CGWR and compare the estimation result with those of the OLS, the GWR, and the local linear models. Table 14 lists the pseudo R-square and the p value of normality test (Kolmogorov–Smirnov test) for residuals, and Fig. 10 shows the residuals plot. Overall, the CGWR yields the best performance in estimation and produces a more reliable result. For other methods, none of them gives satisfactory estimates. For example, despite the GWR produces a large pseudo R-square, its residuals are not normally distributed and its variance is likely not constant. The OLS is also not a feasible model, judging from the information of normality test and constant variance.

Table 14 Pseudo R2 and p values of Kolmogorov–Smirnov normality test
Fig. 10
figure 10

Residual plots of various models (Ohio data). Different residual plot from Ohio crime data. Note that the data are cut into half according to fitted values before conducting F test. The data are cut into half according fitted values before conducting constant variance test. For example, the middle point of GWR fitted values is around 29.1. We first split the data to two sets (before 29.1 and after 29.1). After that, F test is conducted to test whether both parts have equal variance (i.e. \({\text{H}}0:\upsigma^{1} =\upsigma^{2}\)), which is the basic assumption in regression analysis. Small p value indicates constant variance assumption is not likely to be true


The GWR has become a popular tool for explanatory data analysis and detecting spatial non-stationarity ever since its introduction. The GWR provides useful information for data analysis, especially helpful in deciding important explanatory variables. This technique allows regression coefficients to vary across space and obtains their estimates from a bandwidth of observations according the data attribute. However, the GWR tends to produce ragged surfaces (as shown in Fig. 1), and a fixed bandwidth may not be appropriate since the independent variables are necessary to be homogeneous (e.g., their variations can be quite different). In this study, we proposed a modification to the GWR, namely CGWR, which allows the group of positively correlated independent variables to have its own bandwidth via an iterative calibration process.

We used computer simulated and empirical data to compare the proposed method with the GWR and its local linear modification by Wang et al. [31]. Based on the simulation results, we found that the CGWR outperforms other two methods, with respect to the bias and variance, when the regression coefficients are positively correlated. The advantage is especially noticeable in the case of non-linear surfaces. In particular, the clusters have little influences on the estimation of CGWR. The results of empirical studies also support the CGWR and it generally has larger R-square and has fewer extreme outliers (e.g., the absolute value of standardized residual larger than 2 or 3) than the GWR and the local linear method.

However, the proposed method has its limitations. First, probably the most critical limitation, the current setting of CGWR only works for the case if there are independent variables with positive correlation. Although not shown here, we found that the CGWR does not work well in the case of independent variables with negative correlation. It is like the antithetic variate in variance reduction of Monte Carlo Integration. Antithetic variate is one of the popular variance reduction methods but it only works when two variables are negatively correlated [25]. Thus, we suggest first calculating the correlation coefficients between independent variables. Then, form a group of variables which are pairwise positively correlated and apply the CGWR only to this group of variables. Another possibility is that the independent variables often can be separated into two groups and variables within/between groups are positively/negatively correlated, as seen in Taiwan 2000 Census data. We can apply the two-stage CGWR to two groups of variables.

Second, the CGWR is a computer-intensive method and its computing time increases rapidly as the number of variables increases, although the convergence of coefficients can be speed up by using the moving average method. Third, the CGWR is not guaranteed to work if there are many variables, and so far it is effective for the case up to four variables. A possible modification to the case with more variables would be to separate the variables into two groups and use double iteration. Then, the CGWR can be applied to each group of variables forming the inner loop and the process re-iterated between the two groups forming the outer loop until both groups of variables converge. To demonstrate the feasibility of this idea, we also conducted an experiment with six variables, separating them into two groups of three variables each. We found that the estimation did converge and produce satisfactory estimates.

In addition to the fixed bandwidth, it seems that there is still room for improvement about the GWR. In particular, when the S/N ratio is small, the estimated coefficient surfaces would be non-linear (i.e., ragged surfaces), even when the true surfaces are linear. In addition, the variance reduction of the CGWR over the GWR is more obvious than that for bias reduction. This indicates that the GWR estimates have large variance when the S/N ratio is small. In other words, if the variances of GWR estimates are reduced, the bias can also be further reduced, producing more stable estimates.


  1. Anselin L. Spatial econometrics: methods and models. Dordrecht: Kluwer Academic Publishers; 1988.

    Book  Google Scholar 

  2. Auchincloss AH, Gebreab SY, Mair C, Diez Roux AV. A review of spatial methods in epidemiology, 2000–2010. Annu Rev Public Health. 2012;33:107–22. doi:10.1146/annurev-publhealth-031811-124655.

    Article  PubMed  PubMed Central  Google Scholar 

  3. Bivand R, Danlin Y. R manual for package ‘spgwr’. 2010. Accessed 01 Mar 2017.

  4. Bivand R, Brunstad R. Further explorations of interactions between agricultural policy and regional growth in Western Europe: approaches to nonstationarity in spatial econometrics. 2005. Accessed 01 March 2017.

  5. Bratti M, De Benedictis L, Santoni G. On the pro-trade effects of immigrants. Rev World Econ. 2014;150(3):557–94. doi:10.1007/s10290-014-0191-8.

    Article  Google Scholar 

  6. Brunsdon C, Fotheringham AS, Charlton M. Geographically weighted regression: modelling spatial nonstationarity. Statistician. 1998;47:431–43. doi:10.1111/1467-9884.00145.

    Google Scholar 

  7. Brunsdon C, Fotheringham AS, Charlton M. Some notes on parametric significance tests for geographically weighted regression. J Reg Sci. 1999;39:497–524. doi:10.1111/0022-4146.00146.

    Article  Google Scholar 

  8. Brunsdon C, Fotheringham AS, Charlton M. Geographically weighted regression: the analysis of spatially varying relationships. UK: Wiley; 2002.

    Google Scholar 

  9. Cho S, Lambert DM, Kim SG, Jung S. Extreme coefficients in geographically weighted regression and their effects on mapping. GISci Remote Sens. 2009;46(3):273–88. doi:10.2747/1548-1603.46.3.273.

    Article  Google Scholar 

  10. Cleveland WS, Grosse E, Shyu WM. Local Regression Models. In: Chambers JM, Hastie TJ, editors. Statistical models in S. Pacific Grove: Wadsworth & Brooks; 1991. p. 309–76.

    Google Scholar 

  11. Comber A, Fisher P, Brunsdon C, Khmag A. Spatial analysis of remote sensing image classification accuracy. Remote Sens Environ. 2012;127:237–46. doi:10.1016/j.rse.2012.09.005.

    Article  Google Scholar 

  12. Cressie N. Statistics for spatial data. New York: Wiley; 1993.

    Google Scholar 

  13. Dark SJ, Bram D. The modifiable areal unit problem (MAUP) in physical geography. Prog Phys Geogr. 2007;31:471–9. doi:10.1177/0309133307083294.

    Article  Google Scholar 

  14. Du H, Mulley C. Understanding spatial variations in the impact of accessibility on land value using geographically weighted regression. J Transp Land Use. 2012;5(2):46–59. doi:10.5198/jtlu.v5i2.225.

    CAS  Google Scholar 

  15. Fan JQ, Zhang WY. Statistical methods with varying coefficient models. Stat Interface. 2008;1:179–95.

    Article  PubMed  PubMed Central  Google Scholar 

  16. Farber S, Páez A. A systematic investigation of cross-validation in GWR model estimation: empirical analysis and monte carlo simulations. J Geogr Syst. 2007;9:371–96. doi:10.1007/s10109-007-0051-3.

    Article  Google Scholar 

  17. Hastie TJ, Tibshirani RJ. Generalized additive models. London, New York: Chapman and Hall; 1990.

    Google Scholar 

  18. Hu YW, Yue JC. Spatial analysis of Taiwan elderly disability. Technical report, Department of Statistics, National Chengchi University, Taipei, Taiwan R.O.C; 2002. Accessed 01 March 2017.

  19. Kauhl B, Schweikart J, Krafft T, Keste A, Moskwyn M. Do the risk factors for type 2 diabetes mellitus vary by location? A spatial analysis of health insurance claims in northeastern germany using kernel density estimation and geographically weighted regression. Int J Health Geograph. 2016;15:68. doi:10.1186/s12942-016-0068-2.

    Article  Google Scholar 

  20. Lou Y, Caruana R, Gehrke J. Intelligible models for classification and regression. In: Proceedings of the 18th ACM SIGKDD international conference on KDD. New York: ACM, 2012. P. 150–8.

  21. Leung Y, Mei CL, Zhang WX. Statistical tests for spatial nonstationarity based on the geographically weighted regression model. Environ Plann A. 2000;32:9–32. doi:10.1111/j.1538-4632.1996.tb00936.x.

    Article  Google Scholar 

  22. Ma YZ. Simpson’s paradox in natural resource evaluation. Math Geosci. 2009;41(2):193–213.

    Article  Google Scholar 

  23. Manoranjan VS, Olmos GM. A two-step Jacobi-type iterative method. Comput Math Appl. 1997;34:1–9. doi:10.1016/S0898-1221(97)00093-X.

    Article  Google Scholar 

  24. Ripley BD. Stochastic simulation. New York: Wiley; 1987.

    Book  Google Scholar 

  25. Rubinstein RY, Kroese DP. Simulation and the monte carlo method. New York: Wiley; 2011.

    Google Scholar 

  26. Seng SB, Chong AK, Moore A. Geostatistical Modelling, Analysis and Mapping of Epidemiology of Dengue Fever in Johor State, Malaysia, (pp. 109–123). Presented at the 17th Annual Colloquium of the Spatial Information Research Centre (SIRC 2005: A Spatio-temporal Workshop); 2005.

  27. Shi HJ, Zhang LJ, Liu JG. A new spatial-attribute weighting function for geographically weighted regression. Can J For Res. 2006;36:996–1005.

    Article  Google Scholar 

  28. Thisted R. Elements of statistical computing. London: Chapman-Hall; 1988.

    Google Scholar 

  29. Tango T, Takahashi K. A flexibly shaped spatial scan statistic for detecting clusters. Int J Health Geograph. 2005;4:11. doi:10.1186/1476-072X-4-11.

    Article  Google Scholar 

  30. Ward MD, Gleditsch KS. Spatial regression models. Thousand Oaks, CA: Sage; 2008.

    Book  Google Scholar 

  31. Wang N, Mei CL, Yan XD. Local linear estimation of spatially varying coefficient models: an improvement on the geographically weighted regression technique. Environ Plann A. 2008;40:986–1005.

    Article  Google Scholar 

  32. Wheeler DC. Geographically Weighted Regression. In: Fischer MM, Nijkamp P, editors. Handbook of regional science. Heidelberg: Springer; 2014. p. 1435–59.

    Chapter  Google Scholar 

  33. Wu CO, Chiang CT. Kernel smoothing on varying coefficient models with longitudinal dependent variable. Stat Sinica. 2000;10:433–56.

    Google Scholar 

  34. Wang TC, Yue JC. Spatial clusters in a global-dependence model. Spat Spatio Temporal Epidemiol. 2013;5:39–50.

    Article  CAS  Google Scholar 

  35. Wei Y, Ye X. Urbanization, urban land expansion and environmental change in China. Stoch Env Res Risk Assess. 2014;28:757–65.

    Article  Google Scholar 

  36. Zou B, Pu Q, Muhammad B, Weng Q, Zhai L, Nichol J. High-resolution satellite mapping of fine particulates based on geographically weighted regression. IEEE Geosci Remote Sens Lett. 2016;13:495–9. doi:10.1109/LGRS.2016.2520480.

    Article  Google Scholar 

Download references

Authors’ contributions

Leong developed the estimation algorithm, conducted simulation studies, as well as empirical data analysis, and drafted the manuscript. Yue proposed the general methodology, designed the simulation, interpreted the results and helped to draft the manuscript. Both authors read and approved the final manuscript.


This research was supported in part by a grant from the National Science Council in Taiwan, NSC 104-2410-H-004 -023 -MY2. We greatly appreciate the insightful comments from the editor and two anonymous reviewers, which helped us to clarify the context of our work.

Competing interests

The authors declare that they have no competing interests.

Availability of data and materials

The Columbus Ohio crime data is prepared by Luc Anselin, Spatial Analysis Laboratory, Department of Agricultural and Consumer Economics, University of Illinois, Urbana-Champaign. The dataset can be downloaded at following link:, or available at the ‘spgwr’ package in R statistical software.

Author information

Authors and Affiliations


Corresponding author

Correspondence to Jack C. Yue.

Additional file

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated.

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Leong, YY., Yue, J.C. A modification to geographically weighted regression. Int J Health Geogr 16, 11 (2017).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI:


  • Geographically weighted regression
  • Modifiable areal unit problem (MAUP)
  • Generalized additive model
  • Computer simulation
  • Cross validation