### Introduction

Geographic cluster detection and evaluation are important in disease surveillance. One frequently used method for cluster detection is the spatial scan statistic [1–3] and the related space-time scan statistic [4]. This method has been used to study the geography of infectious diseases such as malaria [5], vector borne diseases such as West Nile Virus [6], many different forms of cancer [7–11], low birth weight [12], syndromic surveillance [13–17], and bovine spongiform encephalopathy [18], among many other diseases.

The spatial scan statistic is found by moving a scanning window across the geographical region of interest, generating a large collection of window locations and sizes that meet pre-defined criteria. A likelihood ratio is calculated for the data corresponding to each window location and size and the spatial scan statistic is the maximum of these likelihood ratios. The window location and size with the maximum likelihood ratio is the most likely cluster; that is, the cluster that is least likely to have occurred by chance [1, 2]. Except for the simplest scenarios, there is no known closed-form theoretical distribution for the spatial scan statistic. Therefore, p-values for scan statistics are usually obtained using Monte Carlo hypothesis testing [19].

In Monte Carlo hypothesis testing, a large number of random replicates of the observed data are generated under the null hypothesis. Monte Carlo p-values are asymptotically equivalent to p-values from exact permutation tests as the number of random replicates increases, but the key property of Monte Carlo hypothesis testing p-values is that they maintain the correct alpha level, exactly, as long as the number of replicates plus one is a multiple of 1/α [19–21]. Monte Carlo hypothesis testing can therefore be useful when theoretical distributions are unknown and the number of permutations prohibits a full enumeration. One major drawback to the approach is that small p-values can only be obtained through a very large number of Monte Carlo replicates, which may be computer intensive and time consuming. For the spatial and space-time scan statistics, Monte Carlo hypothesis testing requires the calculation of the likelihood ratio for each location and size of the scanning window, for each replicated data set. Thus, the approach can be computer intensive for very large data sets.

In disease surveillance, the space-time scan statistic is sometimes calculated on a daily basis, to continuously monitor a disease in near real-time [13, 22]. These clusters may then be reported to local, state, or federal public health officials for potential investigation. Using a conventional 0.05 α-level would on average result in one false rejection of the null hypothesis every 20 days. Because of limited resources, health officials are not able to investigate a lot of false alarms [13, 22]. To control the number of false rejections at a more tenable level, one might instead use an α-level of 1/365 = 0.00274 or 1/3650 = 0.000274, corresponding to one expected false positive every year or every ten years, respectively, for daily analyses. So, instead of 999 replicates for an alpha level of 0.05, we may want to use 99,999 replicates or more for an alpha level of 0.000274, keeping approximately the same ratio. If multiple diseases are under surveillance, this may require a high computational burden with millions of random replicates to be simulated each day when Monte Carlo hypothesis testing is used.

In this article, we propose a way to do hypothesis testing for very small alpha levels with fewer calculations. The approach we take is to find a distribution which closely approximates the distribution of the test statistics that were generated under the null hypothesis, which themselves reflect the distribution of the scan statistic under the null. To do this we generate a relatively small number of random simulated replicates under the null hypothesis. We then use them to estimate parameters for a distribution with a well-characterized functional form. If this distribution fits the sample distribution well, we can use it as an estimate of the distribution of the spatial or space-time scan statistic under the null and use it to generate arbitrarily small p-values. Because we are interested in small p-values, it is particularly important that the estimate is good in the tail of the distribution.

We note that although this paper is focused on the spatial and space-time scan statistics, the general methodology that we propose in this article can easily be applied to other test statistics that rely on Monte Carlo hypothesis testing.

### Scan statistics

The spatial scan statistic is used to identify potentially unusual clustering of events on a map. Events may, for example, be cases of disease incidence, prevalence or mortality. Suppose that there are *p* geographical coordinate pairs marked on a map, each representing a region. The analysis is conditioned on the total number of events, and under the null hypothesis, each event is independently and randomly located in a region with probability proportional to the population in the region, or to some covariate-adjusted population based denominator. How best to adjust for covariates is a critical issue which we do not consider in this paper.

We look at all unique subsets of events that lie within a collection of scanning windows to detect clusters. Although any shape scanning window may be used, we use circles throughout this paper. Consider all circles, *C*_{
i,r
}, where *i* = 1,..., *p* indicates the coordinates around which a circle may be centered, and *r* indicates its radius, which ranges from 0 to some pre-specified maximum. Based on the observed and expected number of events inside and outside the circle, calculate the likelihood ratio for each distinct circle [1, 2]. The circle with the maximum likelihood ratio is the most likely cluster, that is, the cluster that is least likely to have occurred by chance. For computational simplicity, the logarithm of the likelihood ratio is typically used instead of the ratio itself, and the log-likelihood ratio associated with this circle is defined as the *scan statistic*. Likelihoods can be calculated under different probability models, such as binomial or Poisson.

The space-time scan statistic is analogously used to identify clusters in regions of space and time. Envisioning each discrete moment of time as a separate map, and the set of times as a stack of maps, the circles mentioned above can extend through the maps, making cylinders that are the potential clusters. The cylinder with the maximum likelihood ratio is the most likely cluster, and its log-likelihood ratio is the space-time scan statistic. In space-time models, we consider Poisson as well as space-time permutation-based probability models [4, 23].

For this study, we used the SaTScan™ [24] statistical software program, which calculates the scan statistic and implements Monte Carlo hypothesis testing to calculate a p-value. SaTScan™ allows the user to vary many parameters including the maximum cluster size, the probability model, and the number of Monte Carlo replicates.

### Monte Carlo hypothesis testing

When the underlying distribution for the test statistic is unknown it is not possible to calculate a standard analytical p-value. When it is still possible, however, to generate data under the null hypothesis, then Monte Carlo hypothesis testing can be used to calculate Monte Carlo based p-values, as proposed by Dwass [19]. To do this, one first calculates the test statistic from the real data. Then, a large number of random data sets are generated according to the null hypothesis, and the test statistic is calculated for each of these data sets. If one creates *R* random replicates of the data and *r*-1 of those replicates have a test statistic which is greater than or equal to the test statistic from the real data, so that *r* is the rank of the test statistics among the real data, then the Monte Carlo based p-value of the observed test statistic is *r/(1+R)*. If the test statistic from the real data set is among the highest 5 percent from the random data sets, then we can reject the null hypothesis at the α = 0.05 level of statistical significance.

As pointed out by several statisticians [19–21, 25], a nice feature of Monte Carlo hypothesis testing is that the correct α-level can be maintained exactly. This is simply done by choosing *R* so that *α(1+R)* is an integer. For example, if *α* = 0.05, then the probability to reject the null hypothesis is exactly 0.05 when R = 19, 99, 999, or 9999 random replicates. Following Bernard [19], suppose R = 19. Under the null hypothesis, the one real and 19 random data sets are generated in exactly the same way, so they are all generated from the same probability distribution. This, in turn, means that the ordering of the 20 test statistics is completely random, so that any single one is equally likely to be the highest, 2^{nd} highest, 3^{rd} highest, and so on, as well as equally likely to be the lowest. Hence, under the null hypothesis, the probability that the test statistic from the real data set has the highest value is 1/20 = 0.05, exactly. If it does have the highest test statistic, the Monte Carlo based p-value is p = r/(1 + R) = 1/(1 + 19) = 1/20 = 0.05, and since p ≤ *α* = 0.05, the null hypothesis is rejected.

Since the correct alpha level is maintained exactly whether R is small or large, one may think that the choice does not matter, but that is not the case, as fewer replicates means lower statistical power [25–27]. Hence, more replicates are always better. For *α* = 0.05, 999 replicates gives very good power, but for smaller alpha levels, an increasingly higher number is needed [21, 25].

One drawback with Monte Carlo hypothesis testing is that the p-value can never be smaller than 1/(1 + R). For example, with R = 999, the p-value is never less than 0.001. In most applications, with a 0.01 or 0.05 α-level, that is not a problem, as it is not necessary to differentiate between p-values of, say, 0.001 and 0.00001. A relatively small number of replicates will be sufficient. However, in the context of daily analyses in real-time disease surveillance, a cluster with p ≤ 0.05 will by chance happen once every 20 days, on average. That is too often, and the goal is to detect clusters of disease that are very unusual, and only the most unusual clusters will be investigated further. P-values on the order of 0.0001 or even smaller may be required before an investigation is launched. These p-values require at least 9999 Monte Carlo replicates and even more are needed to ensure good statistical power [21, 25]. The number of Monte Carlo replicates required is determined by the desired precision of the p-value, and each additional decimal place requires 10 times the number of Monte Carlo replicates and hence about 10 times the computing time.

### The Gumbel distribution

The spatial scan statistic described above is the maximum value taken over many circle locations and sizes, so the collection of these statistics generated from the Monte Carlo replicates is a distribution of maximum values. Since our technique involves finding a distribution which closely matches the distribution of the replicated statistics, it is natural to consider one of the extreme value distributions as one possible candidate to approximate the desired distribution. The Gumbel distribution is a distribution of extreme values, either maxima or minima. Here, we limit ourselves to distributions of maxima since the scan statistic is a maximum. The cdf for the Gumbel distribution of maxima is G(x)={e}^{-{e}^{-\frac{x-\mu}{\beta}}} and the pdf is g(x)=\frac{1}{\beta}{e}^{-\frac{x-\mu}{\beta}}{e}^{-{e}^{-\frac{x-\mu}{\beta}}}. Both *μ* and *β* can be estimated from a sample of observations using method of moments estimators as follows:

\stackrel{\wedge}{\beta}=\frac{s\sqrt{6}}{\pi}

\stackrel{\wedge}{\mu}=\overline{X}-0.5772\stackrel{\wedge}{\beta}

where \overline{X} is the sample mean and *s* is the sample standard deviation [28, 29].