Maps of disease risk have a broad spectrum of applications in the health sciences. Disease maps can aid the diagnosis of individual cases by providing information about the likelihood of exposure to specific infectious agents [1]. Disease maps are also frequently used in regional assessments of public health. Spatial patterns of disease risk can be combined with other geographic datasets to identify and evaluate populations at risk [2], and to aid in predicting future disease outbreaks and epidemics [3, 4]. Although disease risk is defined as the probability of an individual contracting a disease within a specific time period [5], direct measurements of risk can be difficult to obtain, and disease maps are often based on presumed correlates of risk such as vector abundance, pathogen prevalence in a sentinel species, or disease frequency in human populations. Another challenge in developing disease maps is that the underlying data may be available at a limited number of isolated locations. This problem can be particularly acute for emerging infectious diseases, which are likely to be misdiagnosed and underreported, and in developing countries where surveillance may be limited or nonexistent. Therefore, it is often necessary to interpolate between isolated sample locations to generate a continuous surface of disease risk predictions.

One solution to this problem is to model disease risk as a function of one or more environmental variables. This approach is based on the assumption that the environment influences development and transmission of pathogens, habitats for disease vectors and hosts, or human exposure to pathogens. To be used in disease mapping, environmental data must be available as complete spatial coverages that allow model calibration at sites where disease data exist, and model-based predictions at other locations where disease data are unavailable. Climate is recognized as a major constraint on the geographic ranges of infectious diseases, and interpolated climate datasets have been used to predict the distributions of tick vectors in the United States [6], Europe [7], and southern Africa [8]. Spatial variability in land cover, soils, and geology also affects habitat suitability for vector species, and these variables have been used to predict the spatial pattern of habitat suitability for *Ixodes scapularis* in the north-central United States [9]. Spectral indices derived from satellite imagery provide information about environmental characteristics such as vegetation cover, moisture, and temperature, and have been used to develop disease risk maps ranging from landscape patterns of tick habitat suitability [10] to the distribution of malaria across Africa [11].

Spatial autocorrelation is an important statistical consideration in the development of predictive models of disease risk. Sites located close to one another tend to have similar disease risk because they share similar environments and are connected via communicable disease spread or vector and host dispersal. Ordinary least squares regression, generalized linear models, and other standard statistical modelling methods assume that any spatial pattern in the response variable can be entirely explained by the set of predictor variables, and that model residuals are independent and identically distributed [5]. Problems with spatial autocorrelation can arise when there are relevant environmental predictors that have not been included in the model, or when disease patterns are affected by dispersal limitations as well as the environment. Failure to fully account for spatial autocorrelation results in biased estimates of the coefficients and their standard errors, which in turn affect model predictions and statistical tests on the coefficients [12].
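A common diagnostic for spatial autocorrelation in model residuals is Moran's I. The sketch below is illustrative only and is not taken from this study: the function name `morans_i`, the binary distance-band weighting scheme, and the `bandwidth` parameter are all assumptions chosen to keep the example short.

```python
import math


def morans_i(values, coords, bandwidth):
    """Moran's I using binary distance-band weights.

    Assumption for this sketch: two distinct sites are neighbours
    (weight 1) when they lie within `bandwidth` of each other, and
    are otherwise unrelated (weight 0).
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]

    cross = 0.0   # sum of w_ij * dev_i * dev_j over neighbouring pairs
    w_sum = 0.0   # sum of all weights
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if math.dist(coords[i], coords[j]) <= bandwidth:
                cross += dev[i] * dev[j]
                w_sum += 1.0

    denom = sum(d * d for d in dev)
    if w_sum == 0.0 or denom == 0.0:
        raise ValueError("no neighbouring pairs, or values have zero variance")
    return (n / w_sum) * (cross / denom)


# Hypothetical residuals at four sites along a transect.
sites = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
clustered = [1.0, 1.0, -1.0, -1.0]    # similar values sit side by side
alternating = [1.0, -1.0, 1.0, -1.0]  # neighbours always differ

print(morans_i(clustered, sites, bandwidth=1.0))
print(morans_i(alternating, sites, bandwidth=1.0))
```

Clearly positive values indicate that neighbouring sites have similar residuals, which would suggest that the independence assumption described above is violated; negative values indicate that neighbours tend to differ.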

Despite these challenges, spatial autocorrelation also presents opportunities for improving model predictions when the association between disease risk and the available environmental data is weak. Put simply, if disease risk exhibits some degree of spatial clustering, a location surrounded by sites with high disease risk would be expected to have a high disease risk, and a location surrounded by sites with low disease risk would be expected to have a low disease risk. Spatial interpolation based on associations with neighbouring sites can be implemented using a variety of statistical techniques. A study of the tick-borne pathogen *Ehrlichia chaffeensis* in the southern U.S. found that spatial interpolation based on indicator kriging outperformed logistic regression models based on environmental variables [13]. Predictive mapping studies of tick distributions have applied methods such as co-kriging [14] and autologistic regression [6] to combine information about environmental relationships with spatial autocorrelation in a predictive framework.
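The intuition behind neighbour-based interpolation can be illustrated with inverse-distance weighting (IDW). This is a deliberately simpler stand-in for the kriging and autologistic methods cited above, not an implementation of them; the function name `idw_predict`, the `power` parameter, and the example data are hypothetical.

```python
import math


def idw_predict(target, sites, values, power=2.0):
    """Inverse-distance-weighted prediction at `target`.

    Each sampled site contributes in proportion to 1 / distance**power,
    so nearby sites dominate the prediction. This is NOT kriging: the
    weights are fixed by distance alone rather than estimated from a
    fitted variogram.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for site, value in zip(sites, values):
        d = math.dist(target, site)
        if d == 0.0:
            return value  # target coincides with a sampled site
        w = 1.0 / d ** power
        weighted_sum += w * value
        weight_total += w
    return weighted_sum / weight_total


# Hypothetical risk scores: one high-risk site and one low-risk site.
sites = [(0.0, 0.0), (2.0, 0.0)]
risk = [1.0, 0.0]

print(idw_predict((1.0, 0.0), sites, risk))  # midway between the two sites
print(idw_predict((0.5, 0.0), sites, risk))  # closer to the high-risk site
```

A location midway between the two sites receives the average of their values, while a location closer to the high-risk site receives a higher prediction, which matches the clustering argument made above.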

Another consideration in developing disease risk models is the phenomenon of spatial heterogeneity [15] (also referred to as spatial non-stationarity [16]), which occurs when the influences of environmental variables on disease risk are not uniform across the region of interest. For example, sub-regional logistic regression models provided evidence of geographically varying environmental constraints on the distribution of *E. chaffeensis* and yielded more accurate predictions of pathogen presence than a single model fitted for the entire region [13]. Similarly, the relationship between climate and the distribution of *Ixodes ricinus* in Europe was found to vary across ecoregions [7]. Statistical techniques such as geographically weighted regression (GWR) [16] have been developed specifically to analyze the spatial variability of regression parameters, but have only recently been applied to analyze spatial patterns of disease risk [17–19]. The implication for spatial modelling is that if there is indeed spatial variability in the relationships between disease risk and environmental variables, models that explicitly account for this heterogeneity are likely to yield more accurate predictions.
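The core idea of GWR can be sketched as a weighted least-squares fit repeated at each location, with nearby observations weighted more heavily than distant ones. The example below is a simplified illustration under stated assumptions: a single predictor, a continuous response, and a Gaussian kernel with a fixed `bandwidth`. The function name `local_slope` and the synthetic data are hypothetical and do not represent the models fitted in this study.

```python
import math


def local_slope(target, coords, x, y, bandwidth):
    """Locally weighted regression slope of y on x at `target`.

    Observations are weighted by a Gaussian kernel of their distance
    to `target` (an assumed kernel choice), then a weighted simple
    linear regression with an intercept is solved in closed form.
    """
    w = [math.exp(-0.5 * (math.dist(target, c) / bandwidth) ** 2)
         for c in coords]
    w_total = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / w_total
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / w_total
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar)
              for wi, xi, yi in zip(w, x, y))
    if sxx == 0.0:
        raise ValueError("predictor has no weighted variance at this location")
    return sxy / sxx


# Synthetic transect of ten sites. The response depends on the
# predictor with slope 1 in the west (sites 0-4) and slope 3 in the
# east (sites 5-9), i.e. a spatially varying relationship.
coords = [(float(i), 0.0) for i in range(10)]
env = [0.0, 1.0, 2.0, 3.0, 4.0, 0.0, 1.0, 2.0, 3.0, 4.0]
resp = [e * (1.0 if i < 5 else 3.0) for i, e in enumerate(env)]

print(local_slope(coords[0], coords, env, resp, bandwidth=1.0))  # west end
print(local_slope(coords[9], coords, env, resp, bandwidth=1.0))  # east end
print(local_slope(coords[0], coords, env, resp, bandwidth=1e6))  # ~global fit
```

With a narrow bandwidth the local slopes recover the distinct western and eastern relationships, whereas a very wide bandwidth weights all sites almost equally and returns a single pooled slope that describes neither sub-region well.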

This study compared alternative methods for developing predictive maps of the geographic distributions of two tick-borne pathogens in the southern United States. *Ehrlichia chaffeensis*, the causative agent of human monocytotropic ehrlichiosis, is transmitted by *Amblyomma americanum* (lone-star tick). *Anaplasma phagocytophilum*, the causative agent of human granulocytotropic anaplasmosis (previously called HGE agent), is transmitted by *Ixodes scapularis* (black-legged tick). *E. chaffeensis* is maintained in a zoonotic cycle that includes white-tailed deer (*Odocoileus virginianus*) as a keystone host for larval, nymph, and adult *A. americanum* [20] and the primary reservoir for *E. chaffeensis* [21]. In contrast, *A. phagocytophilum* is maintained in a zoonotic cycle in which white-tailed deer are a primary host for adult *I. scapularis*, but additional bird and mammal species are required to serve as hosts for the larval and nymph stages [22]. The white-footed mouse, *Peromyscus leucopus*, is a particularly important host for immature *I. scapularis* in the eastern United States and is also a competent reservoir for *A. phagocytophilum* [23]. In general, *A. americanum* is more tolerant of desiccation than *I. scapularis* and can occupy more exposed microsites and remain active at lower humidity [24, 25].

Although a variety of methods have been proposed for improving predictive spatial models by incorporating spatial autocorrelation or spatial heterogeneity into environmental models, there have been no comparative assessments of the accuracy that is gained by applying these more complex approaches in disease risk mapping. The main goal of this research was to determine whether incorporating spatial autocorrelation and spatial heterogeneity would improve environmental predictions of the geographic distributions of *E. chaffeensis* and *A. phagocytophilum*. A further goal was to determine whether the modelling strategies that were most effective for each pathogen reflected differences in the underlying host relationships and vector ecology.