Simulated data were used to compare the power and sensitivity of the CPT and FMSPTs, performed with GAMs, with those of the spatial scan statistic under three simple alternative hypotheses. Theoretical power was also computed for each alternative hypothesis so that the spatial hypothesis tests could be compared with simpler methods.

In Case 1, a circular cluster was centered in the study region. The spatial scan statistic identifies clusters by placing circular zones across the region of interest and comparing the likelihood of disease inside each zone with that outside it. Because these circular zones match the shape of the disease risk pattern in this case, it is unsurprising that the scan statistic had the highest estimated power, nearing the theoretical power calculated for a Pearson chi-square test. The CPT had slightly lower power than the scan statistic, and FMSPT-3 and FMSPT-5 had lower power estimates.
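The zone comparison described above can be sketched in a few lines. This is a minimal illustration only, assuming a Bernoulli (case/control) likelihood for point data and scanning for high-risk zones; the function names `bernoulli_llr` and `most_likely_cluster`, the 50% maximum zone size, and the choice to center zones on observed points are assumptions made for the example, not details taken from this study or from SaTScan. Tied distances are not handled specially, and the Monte Carlo step that yields a p-value is omitted.

```python
import math

def xlogy(x, y):
    """Return x * log(y), treating 0 * log(0) as 0."""
    return 0.0 if x == 0 else x * math.log(y)

def bernoulli_llr(c, n, C, N):
    """Log likelihood ratio for a zone holding c cases among n points,
    given C cases among N points overall (high-risk zones only)."""
    if n == 0 or n == N:
        return 0.0
    p_in = c / n
    p_out = (C - c) / (N - n)
    if p_in <= p_out:
        return 0.0  # zone does not have elevated risk
    p_all = C / N
    alt = (xlogy(c, p_in) + xlogy(n - c, 1 - p_in)
           + xlogy(C - c, p_out) + xlogy(N - n - (C - c), 1 - p_out))
    null = xlogy(C, p_all) + xlogy(N - C, 1 - p_all)
    return alt - null

def most_likely_cluster(points, max_fraction=0.5):
    """points: list of (x, y, is_case). Returns (llr, center, radius)
    for the circular zone with the largest log likelihood ratio."""
    N = len(points)
    C = sum(case for _, _, case in points)
    best = (0.0, None, 0.0)
    for cx, cy, _ in points:
        # Grow a circle around this center one nearest point at a time.
        ordered = sorted(points, key=lambda p: math.hypot(p[0] - cx, p[1] - cy))
        c = 0
        for n, (x, y, case) in enumerate(ordered, start=1):
            if n > max_fraction * N:
                break
            c += case
            llr = bernoulli_llr(c, n, C, N)
            if llr > best[0]:
                best = (llr, (cx, cy), math.hypot(x - cx, y - cy))
    return best

# Toy data (hypothetical): three cases close together, other points scattered.
toy = [(0, 0, 1), (0.1, 0, 1), (0, 0.1, 1),
       (1, 1, 0), (1.1, 1, 0), (1, 1.1, 0), (2, 2, 0), (2, 2.1, 1)]
llr, center, radius = most_likely_cluster(toy)
```

Because every candidate zone is a circle, a statistic of this form is best suited to a risk pattern that is itself circular, which is the situation in Case 1.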

In Case 2, there was a linear association between Euclidean distance from the center of the circular study region and the log odds of disease. Case 3 was a square study region with a linear association between proximity to the center of the horizontal axis and the log odds of disease. As these cases would be appropriately analyzed by logistic regression methods, the GAM permutation tests had an advantage over the scan statistic in their flexibility to detect different patterns in disease risk. In both Cases 2 and 3, the CPT had the highest estimated power, though its estimates were at least 10% smaller than the theoretical power of a logistic regression. The scan statistic had the lowest power for Cases 2 and 3. For all tests, power estimates for Case 1 exceeded those for Cases 2 and 3. For the GAM permutation tests, power estimates for Case 3 were greater than those for Case 2 under similar conditions, while for the scan statistic the estimates were comparable between the two cases.
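To make the Case 2 risk pattern concrete, the following sketch generates point data whose log odds of disease change linearly with Euclidean distance from the center of a circular region. The parameterization is an assumption made for illustration: risk is taken to be highest at the center, `p_edge` is a hypothetical probability of disease at the boundary, and `odds_ratio` compares the center with the boundary. The simulations reported here may have been parameterized differently.

```python
import math
import random

def disease_probability(d, p_edge, odds_ratio, radius=1.0):
    """Probability of disease at distance d from the center, with log odds
    linear in d (assumed: highest risk at the center, p_edge at the edge)."""
    b0 = math.log(p_edge / (1 - p_edge))   # log odds at the boundary
    b1 = math.log(odds_ratio)              # log odds ratio, center vs. boundary
    logit = b0 + b1 * (1 - d / radius)
    return 1 / (1 + math.exp(-logit))

def simulate_case2(n, p_edge, odds_ratio, radius=1.0, seed=0):
    """Return n points (x, y, is_case) drawn uniformly within a circle."""
    rng = random.Random(seed)
    data = []
    while len(data) < n:
        x = rng.uniform(-radius, radius)
        y = rng.uniform(-radius, radius)
        d = math.hypot(x, y)
        if d > radius:
            continue  # rejection sampling keeps only points inside the circle
        p = disease_probability(d, p_edge, odds_ratio, radius)
        data.append((x, y, int(rng.random() < p)))
    return data

data = simulate_case2(n=2000, p_edge=0.20, odds_ratio=3.0)
prevalence = sum(case for _, _, case in data) / len(data)
```

Under this pattern no single circle separates high-risk from low-risk points, which is consistent with a regression-type method having an advantage over a zone-based one.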

The sizes of the most likely cluster and of the hot- and coldspots identified by the scan statistic and GAM methods varied greatly across datasets, as observed in Figures 6 and 7. For Case 1 with an odds ratio of 3.0 and a lower prevalence (a probability of disease outside the cluster of 0.05), the spatial scan statistic showed larger variation in the most likely cluster radius, and a greater probability of the most likely cluster having a radius smaller than the true cluster, than it did at higher prevalence (Table 6, Figure 5a). The scan statistic showed a tendency to detect small clusters, while the CPT tended to smooth over small variations in disease risk because a large span (span > 0.80) was more likely to be selected when analyzing diseases of lower prevalence (Figure 5b).

In terms of sensitivity, in Case 1 the FMSPT-5 consistently detected the highest proportion of the true cluster as a hot- or coldspot, followed by FMSPT-3 and the CPT, with the scan statistic having the lowest mean proportion detected. It is not surprising that FMSPT-3 and FMSPT-5 had the highest sensitivity estimates, as the definition of sensitivity for these tests counted a point as detected if it was identified as a hot- or coldspot in at least one of the 3 or 5 models. Sensitivity for the CPT required the points to be detected at a single span size.
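The effect of the two sensitivity definitions can be illustrated with a small sketch. The detection flags below are invented for the example, and the function names are hypothetical; each inner list marks which points of the true cluster were flagged as a hot- or coldspot by the model fitted at one span.

```python
def single_span_sensitivity(flags):
    """Proportion of true-cluster points detected at one span (CPT-style)."""
    return sum(flags) / len(flags)

def any_span_sensitivity(flags_by_span):
    """Proportion of true-cluster points detected at one or more of the
    fixed spans (FMSPT-style): a point counts if any model flags it."""
    detected = [any(point_flags) for point_flags in zip(*flags_by_span)]
    return sum(detected) / len(detected)

# Hypothetical flags for four true-cluster points under three spans.
flags_by_span = [
    [True, False, False, False],
    [False, True, False, False],
    [True, False, True, False],
]
per_span = [single_span_sensitivity(f) for f in flags_by_span]
combined = any_span_sensitivity(flags_by_span)
```

Because the combined measure takes a union over spans, it can never fall below the sensitivity at any one of those spans, and adding more spans can only keep it equal or raise it.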

Of interest, the spatial scan statistic had the highest power estimates for Case 1 even though it did not detect the highest proportion of the true cluster. As for its sensitivity, the scan statistic detected a most likely cluster of the correct size (a radius within ±0.01 of the true cluster radius) in 19.4% of datasets with an odds ratio of 3.0 and a probability of disease outside the cluster of 0.20. Of these most likely clusters, 12.4% were centered in the correct location, and only one dataset was observed to have a correct cluster radius and location with a p-value of less than 0.05.

In Case 2, sensitivity was measured by the probability of detecting the exposure source, given that the global null hypothesis was rejected. In practice, after variation in disease risk has been detected, public health resources may be sent to specific locations identified as hot- or coldspots to determine the source of exposure. If the exposure point source is not included in the most likely cluster or hot-/coldspot detected, it is unlikely that public health officials will be able to identify the true exposure that is increasing disease risk. A minimum sensitivity of 80% may be considered a reasonable requirement for tests used in practice. The FMSPTs had sensitivity estimates of at least 80% for odds ratios over 2.0, while the CPT reached 80% sensitivity for odds ratios of at least 3.0, for both probabilities of disease. The sensitivity of the scan statistic did not reach 80% for any odds ratio and was much lower than that of the permutation testing methods. Among the datasets where the scan statistic detected a most likely cluster with a p-value of less than 0.05, it rarely identified the correct exposure point source.
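The conditional nature of this sensitivity measure can be shown with a short sketch. The per-dataset results and the cutoff below are hypothetical; in particular, the permutation tests in this study used adjusted significance cutoffs, which would be passed in place of the illustrative value used here.

```python
def conditional_sensitivity(results, cutoff):
    """results: list of (p_value, source_detected), one pair per simulated
    dataset. Returns the proportion of datasets in which the exposure source
    was detected, among those where the global null was rejected
    (p_value < cutoff). Returns None if no dataset was rejected."""
    rejected = [detected for p_value, detected in results if p_value < cutoff]
    if not rejected:
        return None
    return sum(rejected) / len(rejected)

# Hypothetical results for four simulated datasets.
results = [(0.01, True), (0.03, False), (0.20, True), (0.04, True)]
sensitivity = conditional_sensitivity(results, cutoff=0.05)
```

In this toy example the third dataset contributes nothing, because the global test was not rejected there; only the three rejected datasets enter the denominator.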

Sensitivity for Case 3 was measured as the proportion of the vertical exposure source identified as high or low risk, given that the global null hypothesis was rejected. Again, the spatial scan statistic had much lower sensitivity than the permutation testing methods. For odds ratios of at least 3.0 and a probability of disease for unexposed subjects of 0.20, the FMSPTs had sensitivity estimates of at least 70%, slightly below the desired 80%. FMSPT-5 had the highest sensitivity, followed by FMSPT-3 and the CPT.

For the CPT, we selected the span size by minimizing the AIC statistic. Many other methods of span selection are available. We believe similar results would be observed for any data-driven span selection procedure, but further research is needed to confirm this. For the FMSPTs, we selected spans *a priori* to cover the range of possible span sizes. Other span sizes could be selected, and power estimates may change accordingly. For the CPT and FMSPTs, we applied significance level adjustments based on empirical evidence from previous research and a nominal α level of 0.05 [27]. There is no guarantee that similar results will be observed in future studies, as the significance cutoffs used here were selected and evaluated through a single set of simulations. For different nominal α levels, appropriate significance cutoffs must be determined. A number of extensions to the scan statistic are available, including elliptical [28] and flexibly shaped [29] zones; however, for this research, our interest was in evaluating the original, and widely used, circular spatial scan statistic as applied using the software SaTScan. Applying other versions of the scan statistic may change the statistical power and sensitivity of the test. Evaluation of the extended methods is left for future research. In this research, we applied the methods to point data. Both the scan statistic and GAM methods are applicable to aggregate data, and if applied to such data, the resulting distribution of power estimates would likely change.