Comparison of tests for spatial heterogeneity on data with global clustering patterns and outliers
 Monica C Jackson†^{1}Email author,
 Lan Huang†^{2},
 Jun Luo†^{3},
 Mark Hachey†^{3} and
 Eric Feuer†^{2}
DOI: 10.1186/1476072X855
© Jackson et al; licensee BioMed Central Ltd. 2009
Received: 5 June 2009
Accepted: 12 October 2009
Published: 12 October 2009
Abstract
Background
The ability to evaluate geographic heterogeneity of cancer incidence and mortality is important in cancer surveillance. Many statistical methods for evaluating global clustering and local cluster patterns are developed and have been examined by many simulation studies. However, the performance of these methods on two extreme cases (global clustering evaluation and local anomaly (outlier) detection) has not been thoroughly investigated.
Methods
We compare methods for global clustering evaluation including Tango's Index, Moran's I, and Oden's I*_{ pop }; and cluster detection methods such as local Moran's I and SaTScan elliptic version on simulated count data that mimic global clustering patterns and outliers for cancer cases in the continental United States. We examine the power and precision of the selected methods in the purely spatial analysis. We illustrate Tango's MEET and SaTScan elliptic version on a 19872004 HIV and a 19501969 lung cancer mortality data in the United States.
Results
For simulated data with outlier patterns, Tango's MEET, Moran's I and I*_{ pop }had powers less than 0.2, and SaTScan had powers around 0.97. For simulated data with global clustering patterns, Tango's MEET and I*_{ pop }(with 50% of total population as the maximum search window) had powers close to 1. SaTScan had powers around 0.70.8 and Moran's I has powers around 0.20.3. In the real data example, Tango's MEET indicated the existence of global clustering patterns in both the HIV and lung cancer mortality data. SaTScan found a large cluster for HIV mortality rates, which is consistent with the finding from Tango's MEET. SaTScan also found clusters and outliers in the lung cancer mortality data.
Conclusion
SaTScan elliptic version is more efficient for outlier detection compared with the other methods evaluated in this article. Tango's MEET and Oden's I*_{ pop }perform best in global clustering scenarios among the selected methods. The use of SaTScan for data with global clustering patterns should be used with caution since SatScan may reveal an incorrect spatial pattern even though it has enough power to reject a null hypothesis of homogeneous relative risk. Tango's method should be used for global clustering evaluation instead of SaTScan.
Background
In medical research and epidemiological studies, it is important to understand the spatial heterogeneity (clustering) of disease cases across the study regions. If global or local clustering patterns among the responses (e.g. cancer cases) exist, it is essential to consider the spatial correlation of the individuals in a statistical model to evaluate the association between the response (e.g., cancer death and incidence) and the risk factors. Some statistical methods that are used in spatial analysis involve aggregated summaries (count data) and Poisson regression models that assume independence of neighboring locations without overdispersion [1]. These assumptions are violated if spatial dependence exists such as in Kulldorff et al. [2] which examined unusually high breast cancer rates in the northeastern part of the United States and research done by Pickle et al. 2001 to detect a local cluster of an extreme lung cancer mortality rate in a single county in Montana caused by arsenic exposure [3]. It is important to identify models with possible spatial correlation when performing statistical analysis. For example, a risk factor (such as the distance a region is from a contaminated water source when measuring incidence of cholera) may turn out to be significant in a model assuming independent responses. However, this distanceoriented risk factor may not be identified when spatial correlation in residuals is considered in the model.
Many statistical methods for testing spatial clustering and heterogeneity were developed and can be classified into two groups: methods for global clustering evaluation and for local cluster detection [4, 5]. Global clustering measures assess spatial trends (the tendency of spatial clustering) across an entire study region. Cluster detection methods identify specific local clusters. Both clustering patterns (global and local) occur in many cancer incidence and mortality data sets. Simulation studies have been conducted to evaluate methods for global clustering evaluation and local cluster detection [6–10], but none were designed to study and compare the performance of the methods on data with two extreme situations, such as global clustering and local outlier patterns. The purpose of this paper is to investigate the performance of the methods for testing spatial heterogeneity on data with the two extreme situations. We simulate disease data with homogenous populations to allow for a stronger power study [5]. We provide guidance for the proper use of the statistical methods selected in the two situations of global clustering and cluster detection.
Among methods for global clustering evaluation, we selected Tango's MEET [11], because it has been shown to be the most powerful test for testing spatial heterogeneity [5–7, 12]; Moran's I [13] is selected as a reference for comparison since it has been used widely in these areas; as an alternative version of Moran's I adjusting for heterogeneous population density, Oden's I*_{ pop }[14] is selected because it has not been compared with Tango's MEET in earlier works. Among cluster detection methods, SaTScan elliptic version [15] is selected because it has proven to be very powerful in detecting local clusters (with either regular shapes or irregular shapes) with good precision, and reasonable computation time [16, 17]. We also selected a local version of Moran's I among the cluster detection methods because of the ability of the statistic to perform well on outlier detection even though it is not a good method for detecting large clusters [18, 19].
The rest of the paper is organized as follows. First, we briefly present selected global indices of spatial autocorrelation, local indices of spatial association (LISA) and SaTScan elliptic version. Second, we discuss the steps and parameters utilized in the simulation study. Next, the results of the simulation study are included, and the application of the selected methods on HIV and lung cancer mortality data in the United States are given. The paper ends with a discussion.
Methods
2.1 Global indices of spatial autocorrelation
Global indices of spatial autocorrelation as defined by Waller and Gotway [19] provide a summary over the entire study area of the level of spatial similarity observed among neighboring observations. In this section we briefly describe the three common global indices of spatial autocorrelation we intend to evaluate. In the following sections, i and j denote geographic units (e.g. counties), y_{ i }is the number of cases at geographic unit i, n_{ i }is the population at risk at geographic unit i, , , N is the total number of geographic units, w_{ ij }is a weight assigned to the pair of geographic units i and j.
Tango's Index
For this paper we refer to Tango's EET with the adjacent neighbor weight function given in equation [2] as Tango_ADJ.
where d_{ ij }is the distance between geographic units i and j, and m_{ i }= max{r:u_{r(i)}≤ λ}. The population size in geographic units i and its r nearest neighbors is defined as u_{r(i)}The parameter λ is chosen by the user and allows the user to view the population density as a measure of spatial clustering. Song and Kulldorff [21] note that for a given λ, the weight function will decrease slower in rural areas than urban areas, as in a rural area k_{ i }is large. Thus, a large λ is more sensitive to larger clustering pattern and smaller λ is more sensitive to smaller clusters.
Since the weight function (such as the one in equation [3]) depends on a user defined parameter λ, Tango developed Tango's Maximized excess events test (MEET) in order to detect clustering patterns irrespective of the geographic scale. As discussed in [11], Tango's MEET is defined as . Here, we refer to Tango's MEET in our paper as Tango_PDM since we define EET using weight as in equation [3]. We set V to be 50% of the total population (an upper limit on λ). Where λ has values as 0.1%, 0.5%, 1%, 2%, 5%, 10%, 20%, 30%, 40%, and 50% in our study.
Moran's I
where , and .
We use the adjacent neighbors weight function as defined in equation [2]. I is between 1 and 1; Positive values of I are associated with strong geographic patterns of spatial clustering, negative values of I indicate negative spatial correlation (i.e. a clustering of dissimilar values), and a value close to zero represents complete spatial randomness.
Oden's I*_{ pop }
Where , v_{ i }= n_{ i }/n_{+}, v_{ j }= n_{ j }/n_{+}, e_{ i }= y_{ i }/y_{+}, e_{ j }= y_{ j }/y+, and . Oden notes that symmetry is not required for I*_{ pop }and w_{ ii }≠ 0 (but can be fixed at any specified value).
For this study we refer to I*_{ pop }as I*_{ pop }_ADJ with the adjacent neighbor weight function and I*_{ pop }_PD with the population density weight function. Note that I*_{ pop }_PD should be more sensitive to the global clustering patterns if a larger λ is assigned, and more sensitive to local clusters with a smaller λ value.
Cluster detection methods
While global indices of spatial association evaluate the tendency of global spatial clustering across an entire region, local indices of spatial association (LISA) detect patterns in geographic units that deviate extremely from neighboring units (local outliers). SaTScan is another tool that has been widely used for local cluster detection, which is good for detecting large clusters. It may also evaluate outliers (local clusters with a small geographic or population size) when the outlier pattern is very strong or a small maximum search window is used.
Local Moran's I
where is the overall rate.
SaTScan
which is independent of the Z. The Z that maximizes the λ over all the Z's in G is the most likely cluster. When searching for large clusters, we use 50% of total population as the maximum size of Z, and for local small cluster or outlier detection, we use 5% of the total population as the maximum size of Z in the simulation.
Simulating geographic data
We simulate count data with fixed total number of cases (or sample size) for i = 1,...,3109 representing data from the 3109 counties of the continental United States (US) (multinomial data.) All distance measures are calculated using the latitude and longitude coordinates of the county centroids. We use the actual configuration of U.S. counties but not the real U.S. populations for the counties, because the U.S. has a complex configuration at the county level with a varying number of neighbors for each county and it can provide a real and complicated structure of the adjacent neighbors and nearest neighbors. Waller et al. [5] discussed the issue with power analysis of the tests of clusters and clustering in heterogeneous populations. They found that power depends on the local population at risk. Here, in order to evaluate the performance of the methods on data with varying relative risk patterns without confounding from heterogeneous population, we simulate data assuming homogeneous county population in the US. We set n_{ i }(the population at risk) equal to 5,000 for all simulations, thus n_{+} = 15,545,000. Note that the performance of the tests will be poorer when data are generated from a heterogeneous population, because more spatial variation will be introduced through the population variation. We experimented with simulations that had 5,00050,000 fixed cases (results not shown) to obtain a relevant range of power. Larger and smaller number of cases yielded results which did not adequately force separation of the methods with respect to the power in different scenarios (i.e., for small number of cases all methods had a very poor power for data with outliers; and for a larger number of cases, all the methods had a power approximately 1 for data with a global clustering pattern.) Therefore, we used a smaller y_{+} (sample size of 5,000) to simulate data with global clustering pattern and a larger y_{+} (sample size of 30,000) to simulate data with a local cluster (outlier) as described in the following sections.
where (y_{1}, y_{2},...,y_{3109}) are regional counts generated from the multinomial distribution. The total number of cases y_{+} here is always the same as the total number of cases in the corresponding data simulated under the alternative hypothesis.
where , r_{ i }is the relative risk at geographic unit i, which is not the same for all geographic units.
Calculating power and chance of a county being detected as inside clusters

Step 1: Simulate 10,000 data sets using the multinomial distribution (equation 6) under the null hypothesis (no outlier or global clustering pattern).

Step 2: Calculate Tango_PDM, Tango_ADJ, Moran's I, I*_{ pop }_PD, I*_{ pop }_ADJ, and SaTScan elliptic version (SaTScanE) statistics for each data set from step 1. Calculate Local Moran's I for each county i in each data from step 1.

Step 3: Find the 95th percentile for Tango_ADJ, Moran's I, I*_{ pop }_PD, I*_{ pop }_ADJ, and SaTScanE. The 95^{th} percentile is the critical point for the empirical distribution of each of the statistics. Find the 5th percentile for Tango_PDM, which serves as the critical point for Tango's I with the PDM weight function. Find the 95th percentile for the local Moran's I at each county i. There are then 3109 critical points for Local Moran's I (one for each county.)

Step 4: Simulate 1000 alternative data sets using the multinomial distribution (equation 7) under each of the alternative hypotheses described in the previous section. The five settings mentioned above are:
a) Global pattern with exponentially increasing relative risk rates (equation 9) b)
Global pattern with linearly increasing relative risk rates (equation 8)
 c)
Local outlier with two adjacent neighbors (Manassas City VA)
 d)
Local outlier with six adjacent neighbors (Jackson County KA)
 e)
Local outlier with ten adjacent neighbors (Fulton County GA)
 b)

Step 5: Calculate Tango_PDM, Tango_ADJ, Moran's I, I*_{ pop }_PD, I*_{ pop }_ADJ, and SaTScanE for each of the 1000 data sets from Step 4. Calculate local Moran's I for each county in each data set from Step 4.

Step 6: Report the power depending on the statistics.
a. For each of the statistics Tango_ADJ, Moran's I, I*_{ pop }_PD, I*_{ pop }_ ADJ, and SaTScanE, We calculate the power as the percentage of values out of the 1000 replicates that are above the critical point obtained in step 3. For Tango_PDM, the power is the percentage of values smaller than the critical point obtained in step 3 b.
For the cluster detection methods SaTScanE and local Moran's I, we also report the chance of a county being detected inside clusters, which is defined to be the number of times that a county is counted inside a detected cluster or as an outlier out of the 1000 replicates. If that ratio for a county is close to 1, then there is a large chance that a particular county is inside a cluster or an outlier as determined by this method. If the ratios for the counties inside the true cluster or a true outlier are all high, we claim that the method has a high chance to find the correct cluster location, which indicates good precision of cluster detection.
 b.
Notes on simulation methods
Statistics and methods used to detect global clustering and outlier detection along with weight functions used.
Weight function  

Method/Statistic  Adjacent neighbor  Population density  
Global  Tango  ✓  ✓ 
I* _{ pop }  ✓  ✓  
Moran's I  ✓  
Local  Local Moran's I  ✓  
SatScanE  na  na 
Simulation results
Performance of the methods on data with global spatial clustering patterns
Power of selected statistics for detecting data with local and global cluster types.
Global clustering methods  Local cluster detection methods  

Tango_PDM  Tango_ADJ  Moran's I ADJ  I *_{pop _}PD ( λ = 5%)  I *_{pop _}PD ( λ = 50%)  I *_{pop _}ADJ  Local Moran's I  SaTScan^{a}  
Local clusters relative risk = 4 sample size = 30,000  Two adjacent neighbors (Manassas City VA)  0.107  0.127  0.024  .079  0.056  0.060  +  0.973 
Six adjacent neighbors (Jackson County KS)  0.089  0.124  0.027  .087  0.042  0.047  +  0.962  
Ten adjacent neighbors (Fulton County GA)  0.081  0.125  0.030  .079  0.043  0.055  +  0.979  
Global clustering relative risk = 1.5 sample size = 5.000  Linear  0.997  0.285  0.232  .884  0.996  0.119  +  0.737 
Exp  0.998  0.291  0.235  .886  0.997  0.129  +  0.781 
Designed for cluster detection, SaTScanE with maximum spatial window as 50% of total population has moderate power (around 0.75 (See Table 2)) in identifying the data with spatial heterogeneity successfully.
The type of increasing function in the global trend (exponential or linear) was not a key factor in power for all the statistics for the data with the same maximum relative risk. Power is slightly higher for data with exponential function compared with linear function because the relative risk in data with exponential function increases faster than that in data with linear function, even though the maximum relative risk in both types are the same (maximum relative risk = 1.5). We also note that with maximum relative risk above two, all methods obtained a statistical power above .95.
Performance of methods for data with local clusters (outliers)
Based on our simulation study, the ability of the statistics to identify the spatial heterogeneity in data with only outliers is not very sensitive to the number of adjacent neighbors. SatScanE with maximum search window as 5% of total population performed significantly better (with power above 0.95) than all methods for global clustering evaluation (with power lower than 0.2) in terms of power (Table 2). This result was not surprising since SaTScanE is designed for local cluster detection and with a small search window it searches for clusters with a smaller size (outliers). Methods for global clustering evaluation do not perform well because there is no strong global clustering tendency when the data have a single outlier including one county.
We also compared the chance of the true outlier counties being detected as inside clusters from the two methods for local cluster detection (SaTScanE vs. LISA) in Figure (3B vs. 3C). SaTScanE has a very good chances (above 0.9) to detect the true outlier counties. The maximum chance of being detected as inside clusters for local Moran's I are lower than those from SaTScanE (around 0.40.5). However, we note that local Moran's I did detect the location of the true simulated outlier with a higher chance (0.40.5) than all other counties (with chance < 0.1) as shown in Figure 3C. The precision for SaTScanE is very good because only the true outlier counties have a very high chance (red color) to be detected from the map, and the precision of local Moran's I is not very good because the chance of a true outlier being detected is about 0.40.5 and the counties around the true outliers also have moderate chance (0.40.5) to be detected as inside cluster.
Examples
We provide two real data examples with the two extreme clustering situations (global clustering pattern and outlier) to demonstrate the performance of SaTScanE and Tango_PDM, which have the best performance among the others methods evaluated in our simulation study.
Clustering pattern of HIV mortality in US during recent decades
We then used the selected methods on the raw HIV mortality data for testing of the spatial heterogeneity. Tango_PDM successfully detected the global clustering pattern (pvalue < .001). This result from Tango_PDM indicates that it is necessary to consider global spatial correlation of mortality rates when constructing classical spatial statistical model to analysis mortality data. A spatial model with a variancecovariance matrix incorporating spatial patterns will help to eliminate a possible correlation in the residuals compared with a regression model without considering spatial correlation.
As shown in Figure 4C, SaTScanE identified five statistically significant clusters (pvalue < .05) with relative risks ranging from (1.11 to 10.37) when we only searched for clusters of high rates using 50% of total population at risk as the maximum spatial search window. However, it does not capture the global clustering pattern observed in Figure 4A and Figure 4B. We also conduct a search for clusters with low rates only. As shown in Figure 4D, we have a large cluster (No. 1) covering the upper part of US with low relative risk (0.31) compared with the regions outside. The other significant clusters (No. 2, 3, and 4) have relative risks as 0.52, 0.53 and 0.67 separately. This does indicate a global clustering pattern in HIV mortality in US, which is consistent with the pattern observed in Figure 4A and Figure 4B.
Clustering pattern of lung cancer mortality in US during 50s60s
Lung cancer has been the leading cancer in the United States and experienced a decreasing mortality trend over years. The mortality is very high for lung cancer patients in earlier years (1950s60s). Here, our interest is to identify possible clusters/outliers with high lung cancer mortality or global clustering pattern using the two selected methods. We obtained the lung cancer mortality data during the years 19501969 for white males from the National Cancer Institute at http://www3.cancer.gov/atlasplus/. The total number of deaths observed in these data are 570,521 (a large sample size). When Tango_PDM was used, we obtained a pvalue <.001, which indicates the existence of a global clustering pattern in this male lung cancer data.
Conclusion and discussion
In this article, we explored the performance of several global indices of spatial autocorrelation, local Moran's I and SaTScanE to detect global clustering and identify outliers. Tango_PDM had the highest statistical power for identifying global clustering patterns among all methods considered. The power for I*_{ pop }with proper λ are very close to that for Tango_PDM, but the user has to choose the value of λ, which is subjective and it is very hard to find the optimal λ. However, Tango_PDM evaluates regions with varying total population and provides a single statistic and pvalue, so it is easier for users without much knowledge of the geographic features of the study region to identify global spatial clustering. SaTScan with a large search window (50% of total population) may have moderate power to detect spatial heterogeneity but it may not reveal the correct global clustering pattern.
SaTScanE performed well in detecting the outliers in terms of power, which is much better than local Moran's I and the methods for global clustering evaluation. From our simulation study we also found that for a large relative risk difference (greater than 2), SaTScanE as well as all the other methods considered were able to detect spatial heterogeneity with a power above .98. For local cluster detections (outlier), the power is sensitive to the change in sample size with increased sensitivity. When the sample size was less than 25,000 we had very low power in detecting outliers. Even though SaTScan and local Moran's I did obtain a higher chance to locate the true location of outliers compared with other locations, the overall chance of detecting the true outlier is low.
The weight function does affect the power of the methods for evaluating global clustering pattern. For the methods using adjacent neighbor weight function, the order of the power is Tango's Index, I*_{ pop }, and Moran's I, which implies that Tango's index is the best in detecting spatial heterogeneity among the selected methods for global clustering evaluation even with the same weight function. When comparing the power for I*_{ pop }with the weight functions PD (with λ = 50% of total population) and ADJ, we notice that the power for the data with global clustering is much higher with the PD weight function compared to the ADJ weight function.
A limitation of our simulation is that we simulated homogeneous county populations across an entire region. This assumption likely does not hold in real data. Heterogeneous population densities complicate comparisons of statistical power between hypothesis test evaluating spatial clusters or global clustering [5]. Our intent was to investigate statistical power while controlling for as many variations and confounding factors as possible, in order to effectively evaluate and compare each statistics. However, we do provide an example based on real mortality data with heterogeneous population across the U.S.
Our example based on heterogeneous population revealed that Tango_PDM method detected the global clustering pattern which validates the simulation results based on a similar global clustering pattern. Although SaTScan was designed for cluster detection, it can still be used to access global clustering patterns if larger clusters exist. However, SaTScan can be misleading in detecting clusters as shown in Figure 4c. Therefore one should be careful when applying SaTScan to evaluate global clustering patterns. SaTScan did find the outlier (Silver County) in Montana with real heterogeneous population and unusually high lung cancer mortality rate among white males using a 5% and 50% window. Usually, it is difficult to detect outliers using a 50% window. In our case a 50% window was sufficient to detect an outlier. This result is partially due to the fact that we have a large relative risk of 1.8 and a large sample size which made outlier detection more feasible.
There are many directions for future research. First there are several alternative autocorrelation and spatial association patterns (not only global clustering and outlier patterns) that could be considered. For example, Song and Kulldorff [7] simulate various cluster sizes for their power study. In addition, there is much work to be done in defining appropriate spatial weight functions. We considered common spatial weights: adjacent neighbors and population density. However alternative weight functions (such as the weight functions included in Song and Kulldorf [21]) may influence the performance of each test. Other outlier detection methods (e.g. Multiitem Gamma Poisson Shrinker [24] (MGPS)) may be evaluated and compared with SaTScanE and Local Moran's I in the future. Comparison with heterogeneous population data is also possible if one is interested in the performance of the tests in the presence of real heterogeneous population from the US.
Notes
Declarations
Acknowledgements
The authors would like to thank Dr. Dave Stinchcomb and Dr. Ruth Pfeiffer of the National Cancer Institute for their many useful comments which greatly improved this paper. Mr. Yongwu Shao of Information Management Science for his programming help. We are grateful to Dr. Linda Pickle of Statnet, LLC for providing statistical consultation on locating data sets. Financial support was provided to Dr. Monica Jackson by the Statistical Research and Applications Branch of the National Cancer Institute as part of contract number 263MQ706620 and the Intergovernmental Personnel Act (IPA).
Authors’ Affiliations
References
 McCullagh P, Nelder JA: Generalized Linear Models. 1989, London: Chapman and HallView Article
 Kulldorff M, Feuer EJ, Miller BA, Freedman LS: Breast cancer clusters in the northeast United States: a geographic analysis. Am J Epidemiol. 1997, 146 (2): 16170.PubMedView Article
 Pickle LW, Heineman EF, Ward MH, Nuckols JR, Gumpertz ML, Bell BS, editors: Applications of GIS to cancer research at the National Cancer Institute. ESRI International Health geographics conference; 2001; Washington DC. 2001
 Wheeler D: A comparison of spatial clustering and cluster detection techniques for childhood leukemia incidence in Ohio, 19962003. International Journal of Health Geographics. 2007, 6 (13):
 Waller LA, Hill EG, Rudd RA: The geography of power: statistical performance of tests of clusters and clustering in heterogeneous populations. Stat Med. 2006, 25 (5): 85365. 10.1002/sim.2418.PubMedView Article
 Song C, Kulldorff M: Power evaluation of disease clustering tests. Int J Health Geogr. 2003, 2 (1): 910.1186/1476072X29.PubMedPubMed CentralView Article
 Kulldorff M, Tango T, Park PJ: Power comparisons for disease clustering tests. Computational Statistics & Data Analysis. 2003, 42 (4): 66584. 10.1016/S01679473(02)001603.View Article
 Tango T: Comparison of general tests for spatial clustering. Disease Mapping and Risk Assessment of Public Health. Edited by: Lawson. 1999, London: Wiley
 Hanson C, Wieczorek W: Alcohol mortality: a comparison of spatial clustering methods. Social Science and Medicine. 2002, 55 (5): 791802. 10.1016/S02779536(01)002039.PubMedView Article
 Fotheringham AS, Zhan FB: A comparison of three exploratory methods for cluster detection in spatial point patterns. Geographical analysis. 1996, 28 (3): 20018.View Article
 Tango T: A test for spatial disease clustering adjusted for multiple testing. Stat Med. 2000, 19 (2): 191204. 10.1002/(SICI)10970258(20000130)19:2<191::AIDSIM281>3.0.CO;2Q.PubMedView Article
 Kulldorff M, Song C, Gregorio D, Samociuk H, DeChello L: Cancer map patterns: are they random or not?. Am J Prev Med. 2006, 30 (2 Suppl): S3749. 10.1016/j.amepre.2005.09.009.PubMedPubMed CentralView Article
 Moran PAP: Notes on continuous stochastic phenomena Biometrika. 1950, 37: 1723.
 Oden N: Adjusting Moran's I for population density. Stat Med. 1995, 14 (1): 1726. 10.1002/sim.4780140104.PubMedView Article
 Kulldorff M: A spatial scan statistic. Communications in Statistics. Theory and Methods. 1997, 25 (22): 392943.
 Huang L, Pickle L, Das B: Evaluating spatial methods for investigating global and cluster detection for cancer cases. Statistics in Medicine. 2008, 27 (25): 511142. 10.1002/sim.3342.PubMedPubMed CentralView Article
 Duczmal L, Kulldorff M, Lan H: Evaluation of Spatial Scan Statistics for Irregularly Shaped Clusters. Journal of Computational & Graphical Statistics. 2006, 15 (2): 1510.1198/106186006X112396.View Article
 Kulldorff M: Tests of spatial randomness adjusted for an inhomogeneity: a general framework. Journal of the American Statistical Association. 2006, 101 (475): 1289305. 10.1198/016214506000000618.View Article
 Waller LA, Gotway CA: Applied Statistics for Public Health Data. 2004, New York: WileyView Article
 Tango T: A class of tests for detecting 'general' and 'focused' clustering of rare diseases. Stat Med. 1995, 14 (2122): 232334. 10.1002/sim.4780142105.PubMedView Article
 Song C, Kulldorff M: Tango's maximized excess events test with different weights. Int J Health Geogr. 2005, 4: 3210.1186/1476072X432.PubMedPubMed CentralView Article
 Griffith DA: Some guidelines for specifying the geographic weight matrix contained in spatial statistical methods. Practical Handbook of Spatial Statistics. Edited by: Arlinghaus SL. 1996, Boca Raton: CRC Press
 Simonson H: Headbang software. Statistical Research Applications Branch, NCI: Version 3. 2003
 Szarfman A, Machado SG, O'Neill RT: Use of screening algorithms and computer systems to efficiently signal higherthanexpected combinations of drugs and events in the US FDA's spontaneous reports database. Drug Saf. 2002, 25 (6): 38192. 10.2165/0000201820022506000001.PubMedView Article
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.