### Methodology

The Poisson model applied in this paper is a particular application of hierarchical Bayes spatial generalized linear models for the exponential family of likelihoods [20]. Particular models are specified by the likelihood that is assumed to give rise to the observations, the structure of the prior, and the hyperprior distributions of variance components, which are typically vague to allow learning from the data. An aspect of these models that will influence outcomes is the neighborhood weights, as in Equation (4); however, defining these weights remains an open area of research.

Most applications to date use the first order binary weighting scheme where *w*_{
ij
}= 1 if a mapping unit *j* shares a common border with unit *i*, and *w*_{
ij
}= 0 otherwise. This weighting scheme actually has its roots in image analysis, for which this type of modelling was developed [14], and it makes sense when the spatial units are equal size and shape pixels and the response variable has a constant variance for each pixel. However, this is not the case when smoothing disease maps where the mapping units are of irregular size and shape, and stability of the response variable estimates varies with changing population size. This problem is especially relevant for mapping units like ZIP codes.

A proper approach may be to define weights as a decay function of geographic distance between population-weighted centroids of the mapping units. This function may be obtained by fitting a model correlogram to residuals that are obtained from a model that does not include a random effect to account for spatial autocorrelation. Cressie and Chan [21] used the empirical variogram of the response variable to determine the range of spatial autocorrelation. For neighborhood distances within this range, weights were defined as a function of Euclidean distance. Meanwhile, Griffith [22] provides some "rules of thumb" for defining geographic weights, but they are very general. Ferrandiz, et al [23] used a weight of *n*_{
i
}*n*_{
j
}/*d*_{
ij
}for neighboring mapping units separated by geographic distance *d*_{
ij
}and of population sizes *n*_{
i
}and *n*_{
j
}. These authors applied such a weight to prostate cancer mortality mapping; however, this gravity-type weighting may be better suited for infectious disease, not chronic disease.

If a decay function is fit from the data, the varying stability of disease rates among the mapping units presents a challenge. Since the Bayesian model is designed to adjust for varying stability, perhaps a hierarchical model like the one applied in this paper can be extended so that the weights in Equation (4) are defined as a decay function whose unknown parameters can be assigned "hyperprior" distributions.

### Application to prostate cancer mapping

Maps like in Figure 1 present a compromise between the need to protect personal privacy and public demand for fine geographic resolution. Such small mapping units are necessary for discerning among communities that can vary drastically across a region with respect to possible risk factors and both population density and demographics. However, this comes with the cost of unstable risk estimates for many mapping units that have small populations. Smoothing is therefore applied to help visualize spatial pattern that is inherent in the data of Figure 1. It is demonstrated how hierarchical Bayes spatial modelling has the appealing feature of providing a whole distribution of possible outcomes that can be used for not only smoothing, but also to explore other aspects of spatial pattern.

Viewing Figures 1 through 3 indicate that the Bayesian model is behaving as expected since the smoothed estimates are increasingly dependent on the prior model as uncertainty increases due to decreasing population, whereas for ZIP codes with large populations, like in New York City, the smoothed estimates are similar to the raw SIRs.

We also see that many edge ZIP codes in less populated areas tend to have greater uncertainty relative to their non-edge neighbors because there are fewer neighbors to borrow strength from. For the heterogeneous Poisson model applied in this paper, Lawson et al [24] suggest treating edge mapping units as a guard and not as part of the actual study area. However, presenting the width of Bayesian posterior distributions provides a way to retain the edge units while also showing the relative uncertainty associated with their smoothed values.

The smoothed pattern in Figure 2 is highlighted for areas of high and low incidence in Figures 4 and 5, respectively. Geographic patterns seen in these maps are potentially influenced by many factors, including differences between regions of the state in terms of racial, ethnic and socio-demographic composition. Yet it is well recognized that interpreting any possible relationships with risk factors is confounded by differences in screening and diagnostic practices across the state. Prostate-specific antigen (PSA) testing remains high among US males over 40 [25] and there is evidence of a steady increase in testing rates in New York State during the years corresponding to the data analyzed in this paper [26]. Along these lines, we note that a relatively large proportion of the four most populated counties of New York City reveal a high probability of less than expected incidence (see Figure 5 inset). This may be partially explained by the large immigrant population in these four counties, as indicated by much higher proportions of people who are foreign-born and/or do not speak English at home [27], which may translate to lower screening rates.

Although the patterns seen in Figures 2, 4 and 5 may be partially explained by geographic variations in PSA testing and diagnostic follow up, such variation is not actually known, therefore we cannot adjust for this potential confounder. In the neighboring state of Connecticut, geographic variation of invasive prostate cancer incidence was large and revealed some consistency before and after the introduction of PSA testing, while the pattern was completely different and variation was much smaller during the years of PSA introduction [28]. These authors suggest that such a space-time pattern reflects the impact of introducing PSA testing, although this cannot be confirmed in the absence of data on geographic differences in PSA use.

Some areas appear elevated that are in popular vacation spots, such as the eastern forks of Long Island and the "north country" of New York State including the Thousand Islands area along the Saint Lawrence River (near Watertown in Figure 1) and the Adirondack region. This may possibly be due to a seasonal residence effect whereby vacation areas tend to have artificially inflated chronic disease rates [29]. This occurs when seasonal residents provide a health care provider with a local address of a vacation home, while their primary residence is where they are counted by the decennial US census. Consequently, if their residence at time of diagnoses is in the vacation area, this record inflates the SIR numerator for that area, while they are counted in the denominator for the area of their primary residence as captured by the census. This effect is enhanced since the population spending extended periods in vacation areas tends to be over age 55, which is the age cohort at highest risk of chronic disease. Boscoe and Mclaughlin [29] have presented evidence of increased overall cancer rates in areas with seasonally resident populations in New York State, especially the Thousand Islands area. This uncertainty is reflected in Figure 3 where the width of the posterior distribution of SIRs for these areas is relatively large.

While the smoothed results in Figure 2 present an advantage over mapping raw data, a limitation of smoothing is that the pattern we decipher is subject to confounding by spatially varying population sizes [30]. In other words, smoothed maps like in Figure 2 reveal high and low relative risk in areas with larger populations, while areas with small populations tend to be smoothed towards the null value. While this means that areas with small populations that actually have abnormally high or low disease rates may be obscured, it is still well recognized that many extreme values associated with small populations may simply reflect random noise.

Other methods like the spatial scan statistic can be used in conjunction with Bayes smoothing to strengthen overall spatial analysis. Statistically significant scan statistic circles like those in Figure 1 can vary in size, potentially encompassing many ZIP codes, so are not restricted to only pre-defined neighborhoods like the conditional autoregressive model used by the Bayes smoothing algorithm. In this regard, the spatial scan statistic is similar to smoothing by spatial filtering with variable-radius circles [10]. General regions of statistically elevated relative risk may be identified by the scan statistic, with supporting evidence from Bayesian posterior distributions to help identify the mapping units that contribute strongly to a scan statistic circle. Indeed, each spatial scan statistic circle reported in Figure 1 contains at least one elevated ZIP code identified in Figure 4 by the Bayesian model.

There is ample flexibility for exploratory analysis by varying the display parameters for results from these two methods. For example, the results in Figure 4 can be either more generalized or further focused by identifying ZIP codes where, say, 90% or 99%, respectively, of the posterior probability mass exceeds the null value of one. At the same time, we can display the set of non-overlapping scan statistic circles that correspond to lower levels of relative risk than are shown in Figure 1, thus capturing more geographic area. In fact, when this is done for scan statistic circles corresponding to 15–49% relative risk (not shown), all of the elevated ZIP codes in Figure 4 are spatially associated with significant scan statistic circles.

There is extensive literature on hierarchical Bayes spatial modelling for disease mapping; however, most papers are theoretical in nature and use illustrative examples, often with the same data sets. One exception was recently published by Short et al [31], who applied Bayes modelling to produce maps of cancer control variables. Specifically, they smoothed maps of different outcomes (mortality, incidence, staging and screening) for each of breast, colorectal and lung cancer in Minnesota (USA) counties. Cancer control maps were created for each cancer site by obtaining a weighted sum of each smoothed outcome, and an overall cancer control map was obtained by a weighted sum of the individual cancer control map values. These results can help guide resource allocation for state cancer prevention and control efforts.

While there are open areas for improvement in the methodology of hierarchical Bayes spatial modelling, it is a valuable tool for geo-spatial assessment of disease patterns that can help identify differences among communities. This may in turn indicate patterns of health care access, screening and diagnostic follow up and possibly indicate etiologic clues about causal relationships.