Investigation of clusters of giardiasis using GIS and a spatial scan statistic

Background: Giardia lamblia is the most frequently identified human intestinal parasite in Canada, with prevalence estimates of 4–10%. However, infection rates vary by geographical area, and localized 'pockets' of high or low infection rates are thought to exist. Water-borne transmission is one of the major routes of infection. Sources of contamination of drinking water include humans and domestic and wild animals. A previous study in southern Ontario, Canada, indicated a bivariate association between giardiasis rates and livestock density and/or manure use on agricultural land; however, these variables were not significant when the variable 'rural' was added to the model. In that study, urban areas were defined as those with a minimum of 1,000 persons and a population density of at least 400 persons per km²; all other areas were considered rural. This paper investigates the presence of local giardiasis clusters and considers the extent to which livestock density and manure application on agricultural land might explain the 'rural' effect.
Methods: A spatial scan statistic was used to identify spatial clusters, and geographical correlation analysis was used to explore associations of giardiasis rates with manure application on agricultural land and livestock density.
Results: Significant (P < 0.05) high rate spatial clusters were identified in a number of areas. Results also showed significant (P < 0.05) associations between giardiasis rates and both livestock density and manure application on agricultural land. However, the associations were observed in only two regions.
Conclusions: There is evidence that giardiasis clusters in space in southern Ontario. However, there is no strong evidence to suggest that either livestock density or manure application on agricultural land plays an important role in the epidemiology of giardiasis in the study area. Therefore, these factors do not seem to explain the higher rates of giardiasis reported in rural areas. The spatial scan statistic methodology used in this study has an important potential use in disease surveillance for confirming or refuting cluster alarms.


Introduction
Infection with Giardia occurs throughout the world, in humans and over 40 other species of animals, with human prevalence ranging from 1 to over 90% depending on location and age of persons sampled [1][2][3]. In Canada, Giardia lamblia is the most frequently identified human intestinal parasite with prevalence estimates of 4-10% [4,5]. However, even within a country, infection rates vary widely and localized clusters of high or low infection rates exist [1]. In Ontario, Canada, the prevalence of Giardia infection is reported to have seasonal patterns suggesting seasonally active risk factors [6,7].
The main mode of infection is ingestion of Giardia cysts. This may occur through contaminated water or food, or by hand-to-mouth transfer of cysts, but water-borne transmission is thought to be one of the major routes of infection. There are a number of sources of water contamination, including domestic and wild animals as well as human sources [8]. In particular, infected cattle have been suspected to be an important source of human giardiasis [9,10]. Concentrations of Giardia cysts in water have been found to be significantly associated with the prevalence of Giardia in animals [11]. Moreover, significantly higher concentrations of Giardia cysts have been reported in watersheds with cattle activity than in those with no cattle access [12].
A previous study carried out in Ontario using several multi-variable spatial regression techniques found statistically significant bivariate associations between giardiasis rates and both livestock density and manure use on agricultural land [13]. However, these associations were not significant when the variable 'rural' was added to the model. In that study, urban areas were defined as those with a minimum of 1,000 persons and a population density of at least 400 persons per km²; all other areas were considered rural [14]. The findings from the above study seem to suggest that different modes of transmission of giardiasis may be important in different local geographical regions. Therefore, this paper investigates the presence of clusters of giardiasis in southern Ontario and explores the extent to which livestock density and manure application on agricultural land might explain the 'rural' effect observed in the above study. If livestock plays a major role in the transmission of giardiasis to humans, then we would expect clusters of high giardiasis rates in areas with high livestock densities. This would have important implications for public health professionals since control strategies would vary depending on the most important risk factors in a particular location.
In this study we use geographical information systems (GIS) and a spatial scan statistic to investigate geographical clusters of human giardiasis reported to a surveillance system in southern Ontario. The spatial scan statistic implemented in SaTScan software [15] offers several advantages: it corrects for multiple comparisons, adjusts for the heterogeneous population densities among the different areas in the study, detects and identifies the location of the clusters without prior specification of their suspected location or size thereby overcoming pre-selection bias, and the method allows for adjustment for covariates [16,17]. Geographical correlations of giardiasis rates with livestock density and manure application on agricultural land are also explored to assess the extent to which these factors might explain the higher rates of giardiasis reported in rural areas.

Study area and data collection
The study included the area of Ontario to the south of latitude 46.25029 (approximately south of the French River). This area is referred to, throughout this article, as southern Ontario. Data on 22,496 cases of human giardiasis reported in southern Ontario from January 1, 1990 to December 31, 1998 were extracted from the Reportable Disease Information System (RDIS) surveillance database. The extracted data contained date of birth, gender, postal code (PC) of residence, date of disease onset, laboratory date, treatment date, date of death and possible sources of infection.
A Postal Code Conversion File (PCCF) containing all valid postal codes (PC) as well as the names of each of the Census Sub-divisions (CSD) (i.e. municipalities) where the PCs were located was obtained from Statistics Canada [14]. This file contained the latitudes and longitudes of the centroids of each of the PCs. The spatial location of each patient was identified as the centroid of their residential PC; this was later aggregated to the CSD level. The use of PC centroids to identify the spatial location of the patients was appropriate for the purposes of this study because all spatial analyses were performed at higher geographical levels than the PC and therefore no errors were introduced by assigning patients to their residential PCs. However, we note that if individuals seek health care in Ontario, their place of residence (which may not necessarily be the place of likely exposure) is recorded in RDIS.
The 1996 Canadian population census [18] provided the denominators for calculation of spatial empirical Bayesian smoothed giardiasis rates. Land-use data were extracted from the 1996 census of agriculture [19].

Data manipulations
To add geographical co-ordinates to the giardiasis data, the latter were merged with the PCCF using the PC identifier. Internal consistency of the disease data was evaluated by checking fields for implausible values, screening for duplicate dates (birth dates, date of episode, laboratory dates, treatment dates and date of death) and checking for chronological plausibility of events. Duplicate records were identified and removed. Cases for which the risk settings (or probable sources of infection) were hospital (43 cases), local camping (607 cases), local vacation (49 cases) or travel (4,766 cases) were excluded from the analyses.
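The cleaning steps above could be sketched roughly as follows. This is only an illustration: the field names, the rule used to recognise a duplicate, and the handling of unmatched postal codes are all assumptions, since the text does not describe the RDIS or PCCF record layouts.

```python
# Risk settings excluded from the analyses, as listed in the text.
EXCLUDED_SETTINGS = {"hospital", "local camping", "local vacation", "travel"}


def prepare_cases(cases, pccf):
    """Drop duplicates and excluded settings, then attach PCCF centroids.

    cases: list of dicts with hypothetical keys 'postal_code',
           'birth_date', 'onset_date' and 'risk_setting'.
    pccf:  dict mapping postal code -> (latitude, longitude, csd_name).
    """
    seen = set()
    prepared = []
    for case in cases:
        # Assumption: a duplicate shares postal code, birth date and onset date.
        key = (case["postal_code"], case["birth_date"], case["onset_date"])
        if key in seen:
            continue
        seen.add(key)
        if case["risk_setting"] in EXCLUDED_SETTINGS:
            continue
        location = pccf.get(case["postal_code"])
        if location is None:
            continue  # Assumption: unmatched postal codes are set aside.
        lat, lon, csd = location
        prepared.append({**case, "lat": lat, "lon": lon, "csd": csd})
    return prepared
```

Keeping the centroid and the CSD name on each record makes the later aggregation to the CSD level a simple count per `csd` value.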
Cattle density was calculated as: (1) numbers of cattle per hectare of agricultural land and (2) numbers of cattle per hectare of pasture land. The two measures of cattle density were selected so as to explore possible differences in their propensity to contaminate drinking water. Similar computations were performed for livestock density. Livestock was defined as the total number of cattle, pigs, sheep and goats. Intensity of use of animal manure was calculated as the percentage of agricultural land on which manure was applied. The calculation included all types of animal manure applied using a variety of application techniques such as solid spreaders, liquid spreaders and irrigation systems.
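As a minimal sketch of these definitions, the measures for one geographical unit could be computed as below. The dictionary keys are hypothetical names chosen for illustration and do not reflect the layout of the census of agriculture files.

```python
def density_per_hectare(animal_count, hectares):
    """Animals per hectare; None when the land area is zero or missing."""
    if not hectares or hectares <= 0:
        return None
    return animal_count / hectares


def land_use_measures(unit):
    """Compute the land-use measures for one geographical unit.

    unit: dict with hypothetical keys 'cattle', 'pigs', 'sheep', 'goats',
          'agricultural_ha', 'pasture_ha' and 'manured_ha'.
    """
    # Livestock is the total number of cattle, pigs, sheep and goats.
    livestock = unit["cattle"] + unit["pigs"] + unit["sheep"] + unit["goats"]
    agricultural_ha = unit["agricultural_ha"]
    pasture_ha = unit["pasture_ha"]
    manure_pct = None
    if agricultural_ha and agricultural_ha > 0:
        # Percentage of agricultural land on which manure was applied.
        manure_pct = 100.0 * unit["manured_ha"] / agricultural_ha
    return {
        "cattle_per_ha_agricultural": density_per_hectare(unit["cattle"], agricultural_ha),
        "cattle_per_ha_pasture": density_per_hectare(unit["cattle"], pasture_ha),
        "livestock_per_ha_agricultural": density_per_hectare(livestock, agricultural_ha),
        "livestock_per_ha_pasture": density_per_hectare(livestock, pasture_ha),
        "manure_pct": manure_pct,
    }
```

Returning `None` for units with no agricultural or pasture land is a design choice made here so that such units can be left out of the maps and correlations rather than being assigned a density of zero.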

Statistical and geographical analyses
(a) Detection and identification of giardiasis clusters
A spatial scan statistic implemented in a software program, SaTScan, was used to test for the presence of giardiasis spatial clusters and to identify their approximate locations [15][20][21][22][23][24]. The theory behind the spatial scan statistic is a generalization of a test proposed by Turnbull and coworkers [25]. The statistic uses a circular window of variable radius that moves across the map. The radius of the window varies from 0 up to a specified maximum value. As the window of the statistic moves across the map, it defines a set of different neighboring CSDs. If the window contains the centroid of a CSD, the whole CSD is included in the window [24]. The cluster assessment is performed by comparing the number of cases within the window with the number expected if cases are randomly distributed in space. The test of significance of the identified clusters is based on a likelihood ratio test [22] whose p-value is obtained through Monte Carlo testing [26].
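To make the comparison of observed and expected cases concrete, the following sketch evaluates the Poisson log likelihood ratio commonly used with the spatial scan statistic for a single window. It is an illustration under stated assumptions, not SaTScan's code: the function names are hypothetical, and the expected count is assumed to be scaled so that expected counts over the whole map sum to the total number of cases.

```python
import math


def poisson_llr(cases_in, expected_in, total_cases, scan_for="high"):
    """Log likelihood ratio of one window under the Poisson model.

    Requires 0 < expected_in < total_cases. Returns 0.0 when the window
    does not deviate from expectation in the direction being scanned.
    """
    c, e, n = cases_in, expected_in, total_cases
    if scan_for == "high" and c <= e:
        return 0.0
    if scan_for == "low" and c >= e:
        return 0.0

    def xlogy(x, y):
        # Treat 0 * log(0) as 0 so that empty terms do not fail.
        return 0.0 if x == 0 else x * math.log(y)

    return xlogy(c, c / e) + xlogy(n - c, (n - c) / (n - e))


def most_likely_window(windows, total_cases, scan_for="high"):
    """Pick the window with the largest log likelihood ratio.

    windows: list of (label, cases_in, expected_in) tuples.
    """
    scored = [(poisson_llr(c, e, total_cases, scan_for), label)
              for label, c, e in windows]
    return max(scored)
```

The window with the largest value is the most likely (primary) cluster; the same function with `scan_for="low"` scores low rate clusters.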
Identification of spatial high or low rate clusters was done under the Poisson probability model assumption using a maximum spatial cluster size of 5% of the total population. For statistical inference, 999 Monte Carlo replications were performed. The null hypothesis of no clusters was rejected when the simulated p-value was less than or equal to 0.05 for the primary cluster and 0.1 for the secondary clusters since the latter have conservative p-values.
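The Monte Carlo step can be illustrated with a rank-based p-value. In this sketch the simulated values are assumed to be the maximum likelihood ratio statistic from each random replication generated under the null hypothesis; the function names are hypothetical, and the thresholds are those stated above.

```python
def monte_carlo_p_value(observed, simulated):
    """Rank of the observed statistic among observed plus simulated values."""
    rank = 1 + sum(1 for value in simulated if value >= observed)
    return rank / (len(simulated) + 1)


def is_significant(p_value, primary):
    """Apply the 0.05 (primary) or 0.1 (secondary) threshold from the text."""
    threshold = 0.05 if primary else 0.10
    return p_value <= threshold
```

With 999 replications the denominator is 1,000, so the smallest attainable p-value is 0.001.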
To get a better understanding of the disease distribution, the spatial distribution of the clusters was compared to the distribution of spatial empirical Bayesian smoothed giardiasis rates. The latter were chosen for comparison due to the fact that the census sub-divisions are small areas and therefore have unstable rates [7]. For details on how these were computed, please refer to an earlier article [7].

(b) Cartographic and GIS manipulations
All GIS manipulations and cartographic displays were performed in ArcView GIS [27]. The geographical distributions of agricultural and land-use factors, including livestock densities and the percentage of agricultural land on which manure was applied, were also mapped. Jenks' optimization classification method was used to determine the critical intervals for livestock density and manure-use maps [28]. This method identifies the critical intervals using a statistical formula which identifies groupings and patterns inherent in the data. Because it works from patterns inherent in the data, it minimizes the sum of the variance within each of the classes and thereby attains a goodness of variance fit of 0.91; it also maximizes information about both the map area and the parameter being mapped, resulting in a very efficient map [28]. Visual comparisons of the spatial patterns in the distribution of the above potential risk factors and the distribution of giardiasis clusters were then made.
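The goodness of variance fit mentioned above is one minus the ratio of the squared deviations from the class means to the squared deviations from the overall mean. The sketch below computes it for a given set of class breaks; it does not search for the optimal breaks, and the function names are hypothetical.

```python
from bisect import bisect_left


def assign_classes(values, breaks):
    """Split values into len(breaks) + 1 classes.

    breaks: sorted inner break points; a value equal to a break point
            falls in the lower class.
    """
    classes = [[] for _ in range(len(breaks) + 1)]
    for value in values:
        classes[bisect_left(breaks, value)].append(value)
    return classes


def goodness_of_variance_fit(classes):
    """GVF = 1 - (within-class squared deviations / total squared deviations)."""
    values = [v for group in classes for v in group]
    overall_mean = sum(values) / len(values)
    total_dev = sum((v - overall_mean) ** 2 for v in values)
    within_dev = 0.0
    for group in classes:
        if not group:
            continue
        class_mean = sum(group) / len(group)
        within_dev += sum((v - class_mean) ** 2 for v in group)
    if total_dev == 0:
        return 1.0  # all values identical, so any classing fits perfectly
    return 1.0 - within_dev / total_dev
```

A value of 1 means every class contains identical values, whereas a value of 0 means the classes explain none of the variation.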

(c) Geographical correlation analyses
Since data on livestock density and manure application on agricultural land were available at the Census Consolidated Sub-divisions (CCSD) spatial level, this geographical unit was used as the unit of analysis for the assessment of correlations between the land-use variables (livestock density and manure application on agricultural land) and giardiasis rates. As a first step, global Spearman's rank correlation coefficients between giardiasis rates and land-use factors (cattle and livestock densities as well as manure application) were calculated for all areas. For health planning purposes, the Ontario Ministry of Health and Long-Term Care has sub-divided the province into seven health planning regions. In the second step, Spearman's correlation analyses were repeated for each of these health planning regions in order to assess geographical differences in associations between giardiasis rates and the land-use factors.
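The two-step correlation analysis can be sketched as follows: Spearman's coefficient is computed as Pearson's correlation on average ranks, first for all areas and then within each region. The record layout and function names are assumptions made for illustration, and no p-values are computed here.

```python
import math
from collections import defaultdict


def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation; None if it cannot be computed."""
    if len(x) != len(y) or len(x) < 2:
        return None
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        return None  # one variable is constant
    return sxy / math.sqrt(sxx * syy)


def spearman_by_region(records):
    """records: iterable of (region, giardiasis_rate, land_use_value)."""
    grouped = defaultdict(lambda: ([], []))
    for region, rate, factor in records:
        grouped[region][0].append(rate)
        grouped[region][1].append(factor)
    return {region: spearman(rates, factors)
            for region, (rates, factors) in grouped.items()}
```

Calling `spearman` on the pooled data gives the global coefficient, and `spearman_by_region` repeats the calculation for each health planning region.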

Distribution of high rate spatial giardiasis clusters
A map showing the distribution of the counties in southern Ontario is presented in Figure 1, and the distribution of spatial empirical Bayesian smoothed rates of giardiasis is shown in Figure 2. Visual examination of Figures 1 and 2 reveals the presence of potential clusters in areas surrounding Georgian Bay, as well as in Waterloo, Wellington, Oxford, Lanark, Frontenac, and Lennox and Addington counties. Formal cluster analysis identified several clusters with high giardiasis rates (Table 1 and Figure 3). Some of the clusters listed in Table 1 are not visible in the map (Figure 3) due to their geographically small sizes.
Eight clusters with significantly low giardiasis rates were also identified (Table 2 and Figure 4). The primary low rate cluster (cluster 1) was in Middlesex county. Cluster 2 was principally in York Regional Municipality whereas cluster 3 included parts of Halton and Peel counties. Cluster 8 included Stormont, Dundas and Glengarry United Counties, Prescott and Russell United Counties as well as Ottawa-Carleton Regional Municipality.

Distribution of livestock densities and manure use
The median cattle density per Census Consolidated Sub-division (CCSD) was 2.27 (range: 0.21, 19.3) cattle per hectare of pasture land. The highest cattle density values (3.7-19.3 cattle per hectare of pasture land) were observed in Waterloo Regional Municipality and in Wellington, Perth and Oxford counties, extending northwards to the counties bordering western Georgian Bay and Lake Huron (Huron, south Bruce and south Grey counties) (Figures 1 and 5). Similar cattle densities were also observed in Prescott and Russell United Counties. High densities (2.5-3.7 cattle per hectare of pasture land) were also recorded for Peel Regional Municipality, Peterborough, Renfrew, and Stormont, Dundas and Glengarry United Counties. The lowest cattle density values (0.2-0.9 cattle per hectare of pasture land) were recorded in parts of Parry Sound, Muskoka, Haliburton, and a number of central-eastern counties. The spatial distribution of total livestock (cattle, sheep, pigs and goats) followed a pattern similar to that of cattle density, with the highest densities seen in Perth, Waterloo and the surrounding counties stretching northwards to Bruce county (figure not presented). To the south-west of the study area, only Essex county had low livestock densities. The median livestock density was 3.42 (range: 0.17, 87.9) animals per hectare of pasture land. The spatial distribution of the proportion of agricultural land on which manure was applied followed a pattern similar to the distribution of cattle and total livestock densities (figure not presented).

Correlations between giardiasis rates and land-use factors
Spearman correlation coefficients and their associated p-values for relationships between giardiasis rates and land-use factors are shown in Table 3. When analyses included all geographical regions in the study area, significant but relatively low correlation coefficients were observed only between giardiasis rates and cattle density per hectare of agricultural land (r = 0.11; P = 0.007) and between giardiasis rates and proportion of agricultural land under manure application (r = 0.09; P = 0.037). However, when analyses were performed for individual health planning regions (Figure 6) in the study area, more variables were significantly associated with giardiasis rates in some health planning regions. For instance, there was significant correlation between giardiasis rates and cattle density per hectare of pasture land in the Central West Health Planning Region (r = 0.38; P = 0.013), implying that cattle density might explain approximately 14.4% of the variation in giardiasis distribution in that planning region. Significant but low correlation coefficients were also observed between giardiasis rates and the proportion of agricultural land on which manure was applied in two health planning regions (Table 3).
Figure 1 Distribution of counties or census divisions (CDs) of southern Ontario.
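The 14.4% figure quoted for the Central West Health Planning Region follows from squaring the correlation coefficient, as the short calculation below shows. Because Spearman's coefficient is computed on ranks, this is best read as the approximate share of variation in the ranked giardiasis rates associated with ranked cattle density.

```latex
\[
r^2 = (0.38)^2 = 0.1444 \approx 14.4\%
\]
```

By the same arithmetic, the global coefficients of 0.11 and 0.09 correspond to about 1.2% and 0.8% respectively, which is consistent with describing them as relatively low.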

Spatial distribution of giardiasis clusters
The results suggest that there were 'hot-spots' of human giardiasis in a number of areas in southern Ontario. The distribution of the clusters in the Central-west and Southwest areas was consistent with a possible involvement of a common risk factor, namely livestock density. However, the distribution of giardiasis clusters in most of the other high risk areas did not coincide with the distribution of high livestock density and/or manure application, suggesting that other modes of transmission might be more important in these areas. It is worth mentioning that although differential reporting biases across the study area cannot be totally ruled out as a reason for some of the observed clusters, they are unlikely to be a major problem since reporting of giardiasis is mandatory in Ontario. Due to financial and time constraints, this study did not assess reporting biases.
Because relatively low but significant correlation coefficients between giardiasis rates and both livestock densities and manure application were observed in only two health planning regions (Table 3), these two factors may not play an important role in the overall epidemiology of giardiasis in southern Ontario. This does not necessarily contradict reports from other epidemiological studies that showed that cattle and other farm animals play a role in giardiasis transmission to humans [9,10,12]. It is possible that other modes of infection were more important than contamination of drinking water by livestock manure in certain areas of Ontario. For example, parts of cluster 3 were in areas surrounding Georgian Bay, a key holiday destination during the summer, suggesting either possible contamination of drinking water through increased human activities in watersheds or increased contact with water contaminated from other sources. It is also likely that other modes of transmission, such as contamination of water by wild animals and person-to-person transmission as occurs in day care centers, may be important in different local areas. It is worth noting that although other studies have reported epidemiological evidence implicating cattle and other farm animals in the transmission of giardiasis to humans [9,10,12], to our knowledge there has been no experimental evidence to confirm this.
Figure 3 The numerical identification of the clusters is in order of their likelihood ratio; the cluster with the highest likelihood ratio is cluster 1 (most likely cluster) while cluster 2 had the second highest likelihood ratio, etc. For more detailed cluster information, refer to Table 1.

The cluster detection (SaTScan) methodology
Some of the high rate clusters identified in this study were very large and may be unrealistic when compared to the distribution of the empirical Bayesian rates. This is most likely because the borders of the clusters are uncertain and therefore need to be interpreted with caution. The uncertainty of the borders arises because often there are many windows that overlap with the potential cluster. These windows may have only slightly lower disease rates, and therefore it is probable that some of these areas are erroneously included in the cluster [23]. It is also possible that some secondary clusters in the neighborhood of the primary clusters might actually be parts of the primary cluster [24]. A study by Tango (2000) also reported unrealistically large cluster sizes after applying the procedure to several data sets where many clusters were evident [29]. He observed that this occurred more frequently in the presence of more than one cluster in the study region as opposed to when only one cluster was present. This suggests that the results of cluster analysis should be interpreted together with knowledge of the spatial distribution of rates, especially spatial empirical Bayesian rates [29][30][31]. It should also be borne in mind that the p-values of the secondary clusters are conservative [24] and therefore they under-estimate their true significance. For this reason, the significance of the secondary clusters was assessed at the 10% significance level whereas the primary clusters were assessed at the 5% level.
Figure 4 Spatial distribution of significant low rate giardiasis clusters in southern Ontario (1990-98). The numerical identification of the clusters is in order of their likelihood ratio; the cluster with the highest likelihood ratio is cluster 1 (most likely cluster) while cluster 2 had the second highest likelihood ratio, etc. For more detailed cluster information, refer to Table 2.
Quite often public health authorities need to respond to demands to investigate potential clusters of different diseases and confirm or refute, with certainty, that a problem exists [32]. However, due to the complexity and cost of rigorous epidemiologic cluster studies, they are usually not able to thoroughly investigate all potential clusters. Therefore, using surveillance data and cluster investigation statistical methodologies, health officials would be able to identify statistically significant clusters and thereby prioritize the clusters that need thorough epidemiological investigations. Systematic use of cluster investigation techniques as part of regular surveillance activities would provide additional intelligence necessary to improve population health. Although we have used the methodology on retrospective data, it can be applied to prospective data as well and would be a very useful tool for public health epidemiologists in disease surveillance. Moreover, the SaTScan software is free of charge and can be downloaded from http://www.satscan.org. Lastly, as has been pointed out by Ward and Carpenter [33], publication of these studies will assist epidemiologists and statisticians to address a number of current methodological issues.
Figure 6 Geographical distribution of health planning regions in southern Ontario.

Conclusions
This study has shown the presence of 'hot-spots' of giardiasis in southern Ontario. The study has also demonstrated that using existing health data, GIS and spatial scan statistics could provide public health officials with additional tools necessary for disease surveillance. The fact that low correlation coefficients between giardiasis rates and both livestock density and manure application were observed in only two health planning regions implies that livestock density and manure use do not explain most of the higher rates of giardiasis reported in rural areas. More detailed individual level epidemiological investigations need to be carried out in the identified 'hot-spots' to identify the most important determinants of disease distribution and assess the burden of illness due to giardiasis.

Authors' contributions
AO was involved in the conceptualization, research design, execution and write-up of the first draft of the manuscript. SWM and PM collaborated in conceptualization and research design. DM organized and provided the disease data whereas JH and JW contributed to the research design. All authors were involved in the preparation of the manuscript.