Different approaches to detecting space-time clusters have been proposed and implemented. The most widely cited approach relies on the space-time scan statistic [27], which identifies the most significant cluster of a given shape in space and time, that is, the zone showing the strongest evidence of being a high-density cluster. The scan statistic is based on the likelihood ratio computed for each potential cluster. This ratio expresses how much more likely the observed density is under the hypothesis of clustering than under the hypothesis of uniformity, and the window with the maximum ratio is the most likely cluster. Because the exact distribution of the test statistic cannot be derived analytically, Monte Carlo simulation is used to perform the hypothesis test.
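The Monte Carlo test described above can be sketched as follows. This is a minimal illustration, assuming Kulldorff's discrete Poisson model (the text does not state which probability model is used), candidate windows given as precomputed lists of unit indices, strictly positive expected counts, and null replicates generated by redistributing the total number of cases over units in proportion to their expected counts. The function names are hypothetical.

```python
import numpy as np

def poisson_llr(c, e, total):
    """Kulldorff-style Poisson log-likelihood ratio for one window.

    c: observed cases inside the window, e: expected cases inside it,
    total: total cases in the study area. Only high-rate windows score.
    """
    if c <= e:
        return 0.0
    llr = c * np.log(c / e)
    if total - c > 0:
        llr += (total - c) * np.log((total - c) / (total - e))
    return float(llr)

def max_llr(cases, expected, windows):
    """Largest likelihood ratio over all candidate windows."""
    total = cases.sum()
    return max(poisson_llr(cases[w].sum(), expected[w].sum(), total) for w in windows)

def monte_carlo_p_value(cases, expected, windows, n_rep=999, seed=0):
    """Compare the observed maximum LLR with its distribution under uniformity."""
    rng = np.random.default_rng(seed)
    observed = max_llr(cases, expected, windows)
    total = int(cases.sum())
    probs = expected / expected.sum()
    exceed = 0
    for _ in range(n_rep):
        sim = rng.multinomial(total, probs)   # one null replicate
        if max_llr(sim, expected, windows) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_rep + 1)
```

For example, with `cases = np.array([2, 3, 15, 12, 4, 3])`, uniform expected counts and the window `[2, 3]` among the candidates, that window yields the largest ratio (about 10.46) and a small Monte Carlo p-value. The `(exceed + 1) / (n_rep + 1)` form counts the observed data as one of the replicates, which is the usual convention for Monte Carlo tests.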

One important aspect of this cluster detection method is the choice of cluster shapes and dimensions, which directly influences the final results. Kulldorff [27] noted that the best choice of window depends on the application and suggested several possibilities: circular windows centred at any of several foci on a fixed grid, with an optional upper limit on circle size or with a fixed circle size; rectangles of a fixed size and shape; and, when looking for space-time clusters, cylinders, obtained by scanning circular geographical areas over variable time intervals.
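To make the cylindrical option concrete, the sketch below enumerates cylinders as a circle of sub-districts combined with a time interval. It assumes that circles are centred on sub-district centroids and grow through every distinct centroid distance up to a maximum radius, and that time intervals are limited to a maximum number of consecutive periods; `cylinders`, `max_radius` and `max_duration` are hypothetical names, not parameters of any particular software.

```python
import numpy as np

def cylinders(coords, n_times, max_radius, max_duration):
    """Enumerate cylindrical space-time windows as (site_indices, t_start, t_end).

    Each circle is centred on a site centroid and grows through every distinct
    distance to the other centroids, up to max_radius; each circle is then
    paired with every interval of at most max_duration consecutive periods.
    """
    coords = np.asarray(coords, dtype=float)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    circles = set()
    for centre in range(len(coords)):
        for radius in np.unique(dist[centre]):   # sorted distinct radii
            if radius > max_radius:
                break
            # all sites within the radius, so equidistant sites enter together
            circles.add(tuple(np.flatnonzero(dist[centre] <= radius).tolist()))
    windows = []
    for sites in sorted(circles):
        for start in range(n_times):
            for end in range(start, min(start + max_duration, n_times)):
                windows.append((sites, start, end))   # inclusive time bounds
    return windows
```

Even for a handful of sub-districts and years, the number of windows grows as the product of distinct circles and admissible time intervals, which is why the window parameters matter.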

Iyengar [28] analysed the influence of cluster shape and concluded that cylindrical or elliptical search windows can limit the fit of models, proposing analyses with more than one shape, for instance square pyramidal windows.

The first problem addressed here is not cluster shape (circular and elliptical shapes are considered) but cluster dimensions. Even with cylindrical shapes alone, a large number of windows (with different radii) can be computed. When elliptical windows are allowed, the number of candidate windows grows rapidly, since each window is defined by its centre coordinates, its angle (azimuth), and the lengths of its major and minor axes. To cope with this practically unlimited set of possibilities, any software must fix some parameters by default. For instance, when elliptical spatial scan statistics are requested, SaTScan [29], one of the most popular free space-time clustering packages, uses by default the circular window plus five different elliptical shapes.

The idea of this paper is that disease-specific semivariogram parameters can be used to infer these window parameters. In simple terms, a semivariogram is a key function in geostatistics that describes the spatial and/or temporal patterns of the observed phenomenon. It has a long and extensive history in geostatistical studies [30–32].

The semivariogram represents the average continuity behaviour (mean pattern) over the whole study area, whereas a cluster is determined by the behaviour at a specific place (local pattern). Although there is no evidence of a relation between global mean patterns and local behaviour, using mean-pattern information is more sensible than relying on default parameters that are independent of the case study. For instance, SaTScan [29] is parameterized by default (in the advanced options that include elliptical windows) to scan circular windows plus five different elliptical shapes, with ratios of the longest to the shortest axis of the ellipse of 1.5, 2, 3, 4 and 5. For each shape, a different number of ellipse angles is tested: 4, 6, 9, 12 and 15, respectively. The north-south axis is always one of the angles included, and the remaining angles are equally spaced around the circle. For each shape and angle, all possible ellipse sizes are used, up to an upper limit specified by the user in the same way as for the circular window. This raises the question of why these particular defaults should be used, since none of them is related to the specific case study.
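The default shape and angle combinations listed above can be enumerated explicitly to see how many orientations are scanned at each centre. This is a sketch of the description in the text, not SaTScan code: the exact angle convention is an assumption here, namely azimuths measured from the north-south axis and spread evenly over 180 degrees, because an ellipse is unchanged by a half-turn.

```python
# Default elliptic shapes as described in the text: axis ratio -> number of angles.
DEFAULT_SHAPES = {1.5: 4, 2.0: 6, 3.0: 9, 4.0: 12, 5.0: 15}

def orientations(n_angles):
    """Azimuths in degrees, starting at the north-south axis (0).

    Assumption: an ellipse is unchanged by a 180-degree rotation, so the
    angles are spread evenly over [0, 180).
    """
    return [k * 180.0 / n_angles for k in range(n_angles)]

def shape_angle_pairs(shapes=DEFAULT_SHAPES):
    """List (axis_ratio, azimuth) pairs, including the circular window."""
    pairs = [(1.0, 0.0)]  # circle: ratio 1, angle irrelevant
    for ratio, n_angles in shapes.items():
        pairs.extend((ratio, az) for az in orientations(n_angles))
    return pairs

print(len(shape_angle_pairs()))  # 47
```

The defaults therefore test 46 elliptical shape-angle combinations plus the circle at every centre and size. A case-study-specific parameterization would replace this list with the circle and a few combinations derived from the data, such as a hypothetical `[(1.0, 0.0), (2.0, 45.0)]`.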

Semivariograms describing the mean space-time patterns can be useful in this context, even when, as in this case study, only mean spatial semivariograms can be computed [33, 34].

Consider a value $Z(x_i, t_j)$ of variable $Z$, observed in sub-district $i$ (represented by its geometric centroid $x_i$) at time $t_j$. This value can be correlated with the incidences observed in previous time periods in the same area, and with the incidences observed in neighbouring sub-districts during the same or previous time periods.

The spatial continuity over a given period of time can be characterized using a mean spatial semivariogram, $\gamma_s(h)$, computed by averaging the spatial semivariograms of the individual time slices $t$; it represents the mean spatial pattern for that period [34]:

$$\gamma_s(h) = \frac{1}{N_t} \sum_{j=1}^{N_t} \frac{1}{2 N_h} \sum_{i=1}^{N_h} \left[ Z(x_i, t_j) - Z(x_i + h, t_j) \right]^2$$

where $N_t$ is the number of time periods and $N_h$ the number of pairs of sub-districts at distance $h$ from each other (distances measured between the geometric centroids).
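The mean spatial semivariogram can be computed directly from a sites-by-periods table of incidence values. The sketch below assumes that every sub-district has a value in every period (so $N_h$ is the same in each time slice) and, because exact distances rarely repeat between irregular centroids, groups pairs into lag classes defined by hypothetical bin edges.

```python
import numpy as np

def mean_spatial_semivariogram(z, coords, lag_edges):
    """Average of per-period experimental semivariograms.

    z: array (n_sites, n_times) of values Z(x_i, t_j), with no missing data.
    coords: array (n_sites, 2) of sub-district centroids.
    lag_edges: increasing bin edges; class b holds distances in [edge_b, edge_b+1).
    Returns (gamma, pair_counts), one entry per lag class (NaN if empty).
    """
    z = np.asarray(z, dtype=float)
    coords = np.asarray(coords, dtype=float)
    i, j = np.triu_indices(z.shape[0], k=1)          # each unordered pair once
    d = np.linalg.norm(coords[i] - coords[j], axis=1)
    n_bins = len(lag_edges) - 1
    gamma = np.full(n_bins, np.nan)
    counts = np.zeros(n_bins, dtype=int)
    for b in range(n_bins):
        in_bin = (d >= lag_edges[b]) & (d < lag_edges[b + 1])
        counts[b] = in_bin.sum()                     # N_h for this lag class
        if counts[b] == 0:
            continue
        sq = (z[i[in_bin]] - z[j[in_bin]]) ** 2      # shape (N_h, N_t)
        per_period = sq.mean(axis=0) / 2.0           # semivariogram of each slice
        gamma[b] = per_period.mean()                 # average over the N_t slices
    return gamma, counts
```

Averaging per-period semivariograms, rather than pooling all squared differences across time, keeps each time slice's spatial pattern separate before combining them, as in the formula.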

Semivariograms are commonly represented as a graph of the variance behaviour *γ*(*h*) against the (distance or time) lag *h*. When spatial or temporal dependence is present, the semivariogram usually starts at some value on the *y* axis (the nugget effect), rises, and levels off at a plateau (the sill); the lag at which the sill is reached defines the range.
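The nugget, sill and range can be seen in a fitted model. The text does not say which model family was used, so the spherical model below is only a common illustrative choice; here `sill` denotes the total sill (nugget included).

```python
import numpy as np

def spherical(h, nugget, sill, a):
    """Spherical semivariogram model.

    Jumps to `nugget` just above h = 0, rises, and reaches the total `sill`
    at the range `a`, staying flat beyond it. gamma(0) is 0 by definition.
    """
    h = np.asarray(h, dtype=float)
    r = np.minimum(h / a, 1.0)                        # lag scaled by the range
    g = nugget + (sill - nugget) * (1.5 * r - 0.5 * r ** 3)
    return np.where(h == 0, 0.0, g)
```

With `nugget=0.2`, `sill=1.0` and `a=10`, the model equals 0.75 at half the range and 1.0 from the range onwards.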

The ranges of the fitted semivariogram models, together with their angles (azimuths), can provide a first approximation of case-study-specific scan window parameters. Here, the semivariogram parameters are used to infer a possible window shape, under the assumption (or expectation) that local behaviour can be influenced by, or represented through, the parameters of the global behaviour: not their exact values, but their general shape, namely the angle and the ratio of the longest to the shortest axis of the ellipse. To incorporate case-study coherence into the parameter definition, these parameterized elliptical windows were used alongside circular windows. In this way, the number of scan windows tested is reduced and follows a case-study-specific parameterization.
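One way to turn fitted directional ranges into an elliptical window is sketched below. It assumes geometric anisotropy: the direction with the largest range gives the azimuth of the major axis, and the ratio of that range to the range in the perpendicular direction gives the axis ratio. The input format and the function name are hypothetical.

```python
def window_from_directional_ranges(ranges_by_azimuth):
    """Derive (major_axis_azimuth, axis_ratio) from directional semivariogram ranges.

    ranges_by_azimuth: {azimuth_in_degrees: fitted_range}, azimuths in [0, 180).
    Assumes geometric anisotropy, so the minor axis is perpendicular to the
    major one and must be among the fitted directions.
    """
    major_az = max(ranges_by_azimuth, key=ranges_by_azimuth.get)
    minor_az = (major_az + 90.0) % 180.0
    if minor_az not in ranges_by_azimuth:
        raise ValueError(f"no fitted range for perpendicular azimuth {minor_az}")
    ratio = ranges_by_azimuth[major_az] / ranges_by_azimuth[minor_az]
    return major_az, ratio

print(window_from_directional_ranges({0: 30, 45: 50, 90: 20, 135: 25}))  # (45, 2.0)
```

Only the shape (azimuth and axis ratio) is carried over to the scan window, consistent with the assumption that local clusters follow the general shape, not the exact values, of the global pattern.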

The second problem addressed concerns the validation of identified clusters. Goovaerts and Jacquez [25] and Goovaerts [26] presented developments in neutral models, stressing that spatially uncorrelated null models can bias the test towards rejecting the null hypothesis, thereby identifying false clusters. Here a different approach is proposed. After cluster identification (using hypothesis tests based on spatially uncorrelated models, i.e. Monte Carlo), and given the existence of a specific spatial/temporal pattern, geostatistical simulations, namely Sequential Gaussian Simulations (SGS, [25, 33]), were computed. SGS is used to generate realizations for each identified cluster, excluding the incidence rates observed within that cluster and imposing the fitted semivariogram model. "True" clusters (identified using spatially uncorrelated models) should show extreme (high) observed rates relative to their simulated local distributions, which are conditioned to the semivariogram and to the neighbouring incidence rates.
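The core of SGS can be sketched in a few dozen lines: visit the unsampled nodes in random order, krige each node from the conditioning data plus previously simulated nodes, and draw from the resulting Gaussian distribution. This is a simplified illustration, not the implementation used in the study: it assumes the data are already normal scores (zero mean, unit variance), uses simple kriging with a spherical covariance without nugget, a nearest-neighbour search, and no back-transform. All names are hypothetical.

```python
import numpy as np

def spherical_cov(h, sill=1.0, a=1.0):
    """Covariance C(h) = sill - gamma(h) for a spherical model with no nugget."""
    r = np.minimum(np.asarray(h, dtype=float) / a, 1.0)
    return sill * (1.0 - (1.5 * r - 0.5 * r ** 3))

def sgs(data_xyz, data_z, target_xyz, cov, n_real=1, max_neigh=12, seed=0):
    """Minimal sequential Gaussian simulation at target_xyz.

    Assumes data_z are already normal scores (zero mean, unit variance), so
    simple kriging with mean 0 is used; no back-transform is applied.
    Returns an array of shape (n_real, n_targets).
    """
    rng = np.random.default_rng(seed)
    target = np.asarray(target_xyz, dtype=float)
    sims = np.empty((n_real, len(target)))
    for r in range(n_real):
        kx = np.asarray(data_xyz, dtype=float).reshape(-1, target.shape[1])
        kz = np.asarray(data_z, dtype=float)
        for idx in rng.permutation(len(target)):    # random visiting path
            x0 = target[idx]
            if len(kz) > 0:
                d0 = np.linalg.norm(kx - x0, axis=1)
                near = np.argsort(d0)[:max_neigh]    # closest known points
                K = cov(np.linalg.norm(kx[near, None, :] - kx[None, near, :], axis=-1))
                k = cov(d0[near])
                w = np.linalg.solve(K + 1e-10 * np.eye(len(near)), k)
                mean = w @ kz[near]                  # simple kriging estimate
                var = max(float(cov(0.0) - w @ k), 0.0)
            else:
                mean, var = 0.0, float(cov(0.0))
            value = rng.normal(mean, np.sqrt(var))
            sims[r, idx] = value
            # the simulated node conditions all nodes visited later
            kx = np.vstack([kx, x0])
            kz = np.append(kz, value)
    return sims
```

For cluster validation, `data_xyz` would hold the space-time points outside the cluster and `target_xyz` the removed cluster points; a full implementation would also apply a normal-score transform and back-transform so that simulated values reproduce the data histogram.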

For each cluster, the validation process can be summarized in the following steps:

- Temporarily remove the observed incidence values of all spatiotemporal points (sub-districts/years) within the cluster;

- Simulate *k* scenarios for all these points using SGS;

- For each scenario, sum the simulated values over the same spatiotemporal locations observed within the cluster;

- Build a local distribution from the resulting *k* global (cluster-wide) incidence values;

- Compare the global observed incidence value (the sum of the observed incidence values within the cluster) with this local distribution by computing the probability that the simulated notified rates exceed the observed notified rates of the cluster.
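The aggregation and comparison steps above reduce to a few array operations once the *k* simulated scenarios are available. The sketch assumes the simulated values are already on the same scale as the observed incidence values (for example, back-transformed from normal scores); the function name is hypothetical.

```python
import numpy as np

def cluster_exceedance_probability(simulated, observed):
    """Share of simulated scenarios whose cluster total exceeds the observed total.

    simulated: array (k, n_points) of simulated values at the cluster's
               space-time points, one row per SGS scenario.
    observed:  array (n_points,) of observed values at the same points.
    """
    sim_totals = np.asarray(simulated, dtype=float).sum(axis=1)  # k cluster totals
    obs_total = float(np.sum(observed))
    return float(np.mean(sim_totals > obs_total))
```

A small probability means the observed cluster total is extreme relative to what the spatiotemporal continuity of the neighbouring data would produce, which supports a "true" cluster.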

This validation process, which accounts for spatiotemporal continuity only after cluster identification, is a different and simpler way of dealing with the potential tendency to reject the null hypothesis and identify false clusters. Note that the process requires reproducing the mean spatial semivariogram inferred from all data, but only the histogram of the data used in the simulation (i.e. excluding the data belonging to the cluster). For each cluster, each simulated scenario is computed in a space-time domain, treating time as a third spatial axis.
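Treating time as a third spatial axis requires expressing time in distance units so that a single distance can be computed between space-time points. The sketch below assumes a hypothetical scaling factor `time_scale` (distance units per time period); one common choice, not stated in the text, is the ratio of the spatial range to the temporal range.

```python
import numpy as np

def space_time_coords(xy, t, time_scale):
    """Stack planar centroids and time into 3-D points (x, y, time_scale * t).

    time_scale is an assumed conversion from time periods to distance units.
    """
    xy = np.asarray(xy, dtype=float)
    t = np.asarray(t, dtype=float).reshape(-1, 1)
    return np.hstack([xy, time_scale * t])

pts = space_time_coords([[0.0, 0.0], [3.0, 0.0]], [0, 1], time_scale=4.0)
print(np.linalg.norm(pts[0] - pts[1]))  # 5.0
```

The resulting coordinates can be passed to any distance-based simulation or kriging routine; the choice of `time_scale` controls how strongly neighbouring years, relative to neighbouring sub-districts, condition each simulated value.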