- Methodology
- Open Access

# Application of Poisson kriging to the mapping of cholera and dysentery incidence in an endemic area of Bangladesh

- Mohammad Ali
^{1}Email author, - Pierre Goovaerts
^{2}, - Nushrat Nazia
^{3}, - M Zahirul Haq
^{4}, - Mohammad Yunus
^{4}and - Michael Emch
^{5}

**5**:45

https://doi.org/10.1186/1476-072X-5-45

© Ali et al; licensee BioMed Central Ltd. 2006

**Received:**07 August 2006**Accepted:**13 October 2006**Published:**13 October 2006

## Abstract

### Background

Disease maps can serve to display incidence rates geographically, to inform on public health provision about the success or failure of interventions, and to make hypothesis or to provide evidences concerning disease etiology. Poisson kriging was recently introduced to filter the noise attached to rates recorded over sparsely populated administrative units. Its benefit over simple population-weighted averages and empirical Bayesian smoothers was demonstrated by simulation studies using county-level cancer mortality rates. This paper presents the first application of Poisson kriging to the spatial interpolation of local disease rates, resulting in continuous maps of disease rate estimates and the associated prediction variance. The methodology is illustrated using cholera and dysentery data collected in a cholera endemic area (Matlab) of Bangladesh.

### Results

The spatial analysis was confined to patrilineally-related clusters of households, known as *baris*, located within 9 kilometers from the Matlab hospital to avoid underestimating the risk of disease incidence, since patients far away from the medical facilities are less likely to travel. Semivariogram models reveal a range of autocorrelation of 1.1 km for dysentery and 0.37 km for cholera. This result translates into a cholera risk map that is patchier than the dysentery map that shows a large zone of high incidence in the south-central part of the study area, which is quasi-urban. On both maps, lower risk values are found in the Northern part of the study area, which is also the most distant from the Matlab hospital. The weaker spatial continuity of cholera versus dysentery incidence rates resulted in larger kriging variance across the study area.

### Conclusion

The approach presented in this paper enables researchers to incorporate the pattern of spatial dependence of incidence rates into the mapping of risk values and the quantification of the associated uncertainty. Differences in spatial patterns, in particular the range of spatial autocorrelation, reflect differences in the mode of transmission of cholera and dysentery. Our risk maps for cholera and dysentery incidences should help identifying putative factors of increased disease incidence, leading to more effective prevention and remedial actions in endemic areas.

## Keywords

- Cholera
- Dysentery
- Semivariogram Model
- Acute Watery Diarrhea
- Cholera Incidence

## Background

Disease maps can serve to display incidence rates geographically, to inform on public health provision about the success or failure of interventions, and to make hypothesis or to provide evidences concerning disease etiology [1]. Understanding social/human ecological influence on health may help trigger an effective mechanism to reduce the risk of the disease. However, making a perfect disease risk map is challenging. Recent reviews of disease mapping have been provided by Kulldroff [2], Bithel [3], Diggle [4], Lawson [5], and Lawson and Clark [6]. Most of the proposed methods to create disease maps were tested on data aggregated within large geographical units and applied on non-infectious (cancer-related) diseases. Since ecological process in a large geographical unit would eventually be seen as flat, associating it with the spatial pattern of diseases may yield spurious outcomes [7].

Cholera and other diarrheal diseases remain a problem in many developing countries [8–11]. Endemic cholera has displayed a particular geographical pattern in many nations of the world [11–16]. Although several geographic studies were conducted for studying disease risk, only a few studies were concerned with risk in endemic areas that creates a constant public health problem. Identifying zones of elevated risk in endemic areas and studying socioecological conditions of these zones may provide useful clues for controlling the disease burden and improving the endemic situation. However, assessing risk for diarrheal diseases requires knowledge not only of environment but also of individual's lifestyle and health behavior that may influence the disease risk. Emch [17] illustrated that diarrheal disease risk is a dynamic interaction of biological, socioeconomic, behavioral, cultural and environmental factors over time and space. Heterogeneity in these factors among individuals may influence normal process of the disease phenomena.

Since cholera and other diarrheal diseases are primarily driven by environmental factors [18], and since environmental processes are spatially continuous in nature [19], these disease rates should display some spatial pattern. In other words, rates measured in households that are close geographically should be more similar than the rates recorded over larger separation distances. Geostatistics provides tools to analyze spatially distributed data, capitalizing on the correlation between observations to interpolate the attribute of interest and to delineate areas with high disease risk [1, 19, 20]. Traditional geostatistical methods are, however, not suited to the analysis of disease rates since they ignore the fact that rate data consist of a denominator and a numerator, which means that observations have varying degree of reliability that should be incorporated into the analysis. Goovaerts [21, 22] recently introduced a methodology that is based on Poisson kriging and semivariogram estimators developed by Monestiez et al. [23] for mapping the relative abundance of species in the presence of spatially heterogeneous observation efforts and sparse animal sightings. Simulation studies indicated that this approach outperforms simple population-weighted averages and empirical Bayesian smoothers to estimate underlying risk in cancer mortality maps. The analysis was however confined to rate data aggregated to the level of large geographical entities (i.e. counties) that covered the entire study area, thereby eliminating the need for predicting disease risks at unmonitored locations. The creation of continuous risk maps was addressed in Monestiez et al. [23] with an application to the mapping of the expected number of whale sightings per hour.

This paper presents the first application of Poisson kriging to the spatial interpolation of local disease rates. Cholera and dysentery data were collected at the household level in an endemic area of Bangladesh. Both the disease risk and associated measure of reliability are mapped using geostatistics. These results allow then the mapping of the probability that locally the incidence rate differs from the average rate recorded over the region.

## Methods

### The study area

The study was carried out in Matlab, a field research station of the ICDDR,B (International Centre for Diarrhoeal Disease Research, Bangladesh): Centre for Health and Population Research, Bangladesh. Matlab is situated near the lower Meghna River, one of the main rivers of Bangladesh, and is approximately 50 km southeast of Dhaka (the capital of Bangladesh). The field research station has been involved with cholera studies since 1963. Approximately 200,000 people live in the 184 square kilometers study area. Currently, there are 142 villages; 128 are predominantly Muslim, the rest are predominantly Hindu. The study area is almost entirely rural and most people's occupations are in agriculture or fisheries. The monsoon climate of the study area is characterized by high temperatures, heavy rainfall and marked seasonal variation [24, 25].

### Population and disease data

In 1994, a GIS database for the study area was created in order to facilitate spatial analysis for health and population studies. The database includes *baris*, flood-control embankments, rivers, village boundaries, and health facilities. *Baris* are patrilineally-related clusters of households. An average of five households lived in a *bari*. The details on how the geographic database was created can be found elsewhere [26]. This database facilitates the linking of population and disease surveillance data to *bari* locations. The ICDDR,B's Health and Demographic Surveillance Systems (HDSS) databases are maintained at the individual level, thus, the data are aggregated to the *baris*-level for spatial analysis. There were 7,181 points representing *baris* and 207,576 individuals in the study area in 1994. The number of individuals living in a *bari* ranges from 1 to 469, with a median number of 19.

The population data were obtained from the HDSS database. Initiated in 1966, the HDSS maintains records of all vital demographic events [27–29]. The residents in the study area are identified by a registration number given in the surveillance system. To identify the household locations of the individuals, the HDSS maintains a *bari* identification, which is unique and a permanent number in the system.

The disease data were obtained from the Matlab hospital surveillance system, which has in- and out-patient services as well as a laboratory for the identification of pathogens. The Matlab hospital treats approximately 7,000 to 8,000 diarrhea cases per year at no cost to patients. The hospital maintains motorized boats, which function as a free ambulance service for diarrhea patients; access to the hospital is excellent throughout the study area. Stool samples are collected from all patients who are admitted to the hospital who live in the ICDDR,B area. All stool samples are screened for enteric pathogens in the laboratory.

*V. cholerea*isolated from subject's stool. A dysentery case was defined as the presence of blood in stool and sought for medical treatment at the Matlab hospital. The spatial distribution of cholera and dysentery cases is shown in Figure 1.

### Disease mapping

Disease mapping requires the interpolation of bari-level rate data to the nodes of a grid covering the study area, in this case a grid with 100 meters spacing. Although spatial interpolation methods, such as kriging, are now fairly common, they cannot be applied directly to disease data since they ignore the population size underlying the computation of rate data. In other words, rates computed from small population sizes (e.g. half of the baris include no more than 19 individuals) tend to be less reliable than rates computed in densely populated areas, and this effect should be incorporated into the estimation algorithm. Poisson kriging allows one to account for population size in the analysis of disease data. The risk of contracting a disease at a given location **u**_{
α
}, denoted r(**u**_{
α
}), is estimated as a linear combination of K neighboring data:

where z(**u**_{i}) is the rate observed at location **u**_{i}. The kriging weights are found by solving the following system of (K+1) linear equations:

where *δ*_{ij} = 1 if **u**_{i} = **u**_{j} and 0 otherwise, n(**u**_{i}) is the population size at **u**_{i}, and *m** is the population-weighted mean of the set of N rates computed as:

The quantity *m**/*n*(**u**_{i}) is an error variance term that represents the variability arising from population size and is derived directly under the Poisson model for the counts [21]. The incorporation of this term for a zero distance (i.e. **u**_{i} = **u**_{j}) leads one to assign smaller kriging weights to rates that are computed from smaller populations and deemed less reliable. The term *μ*(**u**_{
α
}) is a Lagrange parameter that results from the minimization of the estimation variance subject to the unbiasedness constraint on the estimator.

To solve the kriging system (Equation 2), one needs to have a model of the spatial covariance of the unknown risk, *C*_{
R
}(**h**), or equivalently its semivariogram *γ*_{
R
}(**h**) = *C*_{
R
}(0)-*C*_{
R
}(**h**). The experimental semivariogram of the risk is computed using the following estimator developed by Monestiez et al. [23]:

where N(**h**) is the number of pairs of baris separated by a vector **h**. The different spatial increments [*z*(**u**_{
α
})-*z*(**u**_{
α
}+ **h**)]^{2} are weighted by a function of their respective population sizes, *n*(**u**_{
α
})*n*(**u**_{
α
}+ **h**)/(*n*(**u**_{
α
}) + *n*(**u**_{
α
}+ **h**)), a term which is inversely proportional to their standard deviation. More importance is thus given to the more reliable data pairs (i.e. smaller standard deviations). A permissible model, *γ* _{
R
}(**h**), is then fitted to the experimental semivariogram in order to derive the semivariogram, or covariance value, for any possible distance h. In this paper, the modeling was conducted using the weighted least-square regression procedure implemented in the public-domain software **poisson_kriging.exe** [21].

A probabilistic model of the uncertainty about the risk of incidence of the disease at a given location **u**_{
α
}, denoted r(**u**_{
α
}), is provided by the random variable *R*(**u**_{
α
}). This variable is fully characterized by its conditional cumulative distribution function (ccdf) defined as:

where *G*(.) is the cumulative distribution function of the standard normal random variable. The notation "|(*K*)" expresses conditioning to the local information, say, *K* neighboring observed rates. The ccdf is modeled as a Gaussian distribution [22] with the mean and variance corresponding to the Poisson kriging estimate (Equation 1) and the kriging variance that is computed as:

The ccdf (Equation (5) allows the computation and mapping of the probability that the local incidence rate exceeds any particular threshold. In absence of regulatory threshold in the present case study, this function is here used to assess whether the local incidence rate at **u**_{
α
}is significantly different from the regional rate m* (Equation 3). The *p*-value for the test is calculated as:

Because the test is repeated over a large number of grid node **u**_{
α
}, some locations are expected to be tested significant even if there is no significant differences present (i.e. multiple testing). Rather than using the very conservative Bonferroni correction, *p*-values were here simply used to rank locations according to their likelihood of having much smaller or larger incidence rates without any formal classification as significant locations or not.

## Results and discussion

**h**

_{l}) = /

*γ*(

**h**

_{l}), in the program

**poisson_kriging.exe**).

The cholera semivariogram model is exponential with a practical range of 370 meters, while a spherical model with a range of 1,100 meters was fitted for dysentery. The larger range of autocorrelation for the second model indicates a better spatial structure of dysentery incidence rates, which can be explained by the differences in the mode of transmission for the two diseases. Cholera is usually transmitted to a person through oral-fecal-oral process, while dysentery can be spread through hand-to-hand contacts. In the past there have been a number of interventions carried out in the study area to reduce the risk of cholera, while dysentery received much less attention. Therefore, local differences in the socioeconomic condition and health behavior among neighboring households due to health interventions may explain the shorter range of spatial autocorrelation displayed by cholera incidence rates relatively to dysentery.

Descriptive statistics of the distribution of incidence rates and kriged risks for cholera and dysentery in Matlab, 1994.

Statistics | Cholera | Dysentery | ||
---|---|---|---|---|

Kriged risk | Observed rate | Kriged risk | Observed rate | |

N | 13689 | 6017 | 13689 | 6017 |

Mean | 3.7238 | 3.8512 | 2.6841 | 2.7732 |

Std. Deviation | 3.5083 | 20.134 | 2.4847 | 14.776 |

Minimum | 0.00 | 0.00 | 0.00 | 0.00 |

Maximum | 40.93 | 500.00 | 15.92 | 250 |

## Conclusion

In Matlab, disease maps have never been used to isolate areas of high cholera or dysentery incidence and to guide for control activities. The health authority of the area could however greatly benefit from the availability of such maps for their disease control programs. Spatial interpolation of incidence rates can be performed using a variety of simple techniques, including inverse square distance methods, spline or moving average. The main limitation of these deterministic methods is that they fail to provide a measure of the reliability of the prediction. Kriging is increasingly preferred because it accounts for spatial patterns and data configuration (i.e. clustering of observations) in the prediction, while providing both an estimate and a measure of the error variance. The kriging variance can be combined with the kriging estimate to compute confidence intervals or the probability of exceeding specific thresholds, under the assumption of Gaussianity.

Interpolation of bari-level data cannot be performed using conventional geostatistical tools that ignore the population size underlying the computation of incidence rates. Because the number of individuals living in bari greatly fluctuates across the study area, local incidence rates have various degrees of reliability that need to be accounted for in the variography and spatial interpolation. Poisson kriging, which was introduced in the health science literature for filtering the noise in cancer mortality maps, is here used to create continuous maps of incidence risks from punctual bari-level data.

Unlike traditional semivariogram estimators, the version used in this paper directly accounts for population size in the computation of the semivariograms, attenuating the influence of less reliable rates recorded in sparsely populated areas. Cholera and dysentery display semivariograms with very different ranges of autocorrelation. The much larger range (1.1 km versus 0.370 km) observed for dysentery indicates that its incidence rates vary more continuously in space than cholera rates across the study area.

The maps of disease risk illustrated the better spatial continuity of rates recorded for dysentery relatively to cholera. Combined with the different location of high-incidence areas, these contrasted spatial patterns suggest that different mechanisms are responsible for the occurrence of these two diseases. Within the field of spatial epidemiology, knowledge about host, agent and environment is needed to get insight into the etiology of the disease and to develop and promote disease control measures [30]. This means that spatial epidemiological studies have to characterize spatial patterns of disease risk. Spatial variations of disease risk results in clustering of disease, and that may reflect the existence of specific subpopulations that are under higher or lower exposure to the disease. Our risk maps for cholera and dysentery may be useful for identifying putative factors of increased disease risk and for helping health officials take remedial actions.

The high incidence rates for dysentery in the urban part of the study area can be attributed to food habit pattern given that habitants of the study area use their hands to eat. Those who live close to urban areas have easy access to restaurants which tend to ignore basic hygienic rules. Thus, eating meals that are not served hygienically at those places increases the risk of acquiring dysentery. Our dysentery risk map suggests the need of interventions for combating the threats of dysentery in the urban part of the study area. On the other hand, the isolated clusters of high cholera incidence may be attributed to the poor health behavior (e.g. hands are not washed systematically after bathroom break and before eating, drinking water is not boiled or filtered, cooking utensils are washed with surface water, etc.) of the community. We can not also ignore the possibility of environmental niche of endemic cholera in those areas. Further investigations focusing on the areas of high incidence risk for cholera could shed some light on the factors that cause these local spikes in cholera incidence.

## Declarations

### Acknowledgements

The data used in this paper were collected with the support of ICDDR,B: Centre for Health and Population Research and its donors who provide unrestricted support to the Centre for its operations and research. Current donors providing unrestricted support include: Australian International Development Agency (AusAID), Government of Bangladesh, Canadian International Development Agency (CIDA), The Kingdom of Saudi Arabia (KSA), Government of the Netherlands, Government of Sri Lanka, Swedish International Development Cooperative Agency (SIDA), Swiss Development Cooperation (SDC) and Department for International Development, UK (DFID). We gratefully acknowledge these donors for their support and commitment to the Centre's research efforts.

## Authors’ Affiliations

## References

- Kelsall J, Wakefield J: Modeling spatial variation in disease risk: a geostatistical approach. Journal of the American Statistical Association. 2002, 97 (459): 692-701. 10.1198/016214502388618438.View ArticleGoogle Scholar
- Kulldroff M: A spatial scan statistics. Communications in Statistics: Theory and Methods. 1997, 26: 1481-1496.View ArticleGoogle Scholar
- Bithell JF: A classification of disease mapping methods. Stat Med. 2000, 19 (17–18): 2203-2215. 10.1002/1097-0258(20000915/30)19:17/18<2203::AID-SIM564>3.0.CO;2-U.PubMedView ArticleGoogle Scholar
- Diggle PJ: Overview of statistical methods for disease mapping and its relationship to cluster detection. In Spatial Epidemiology: Methods and Applications. Edited by: Elliot P, Wakefield JC, Best NG, Briggs DJ. 2000, Oxford: Oxford University PressGoogle Scholar
- Lawson AB: Disease map reconstruction. Stat Med. 2001, 20 (14): 2183-204. 10.1002/sim.933.PubMedView ArticleGoogle Scholar
- Lawson AB, Clark A: Spatial mixture relative risk models applied to disease mapping. Stat Med. 2002, 21 (3): 359-70. 10.1002/sim.1022.PubMedView ArticleGoogle Scholar
- Wakefield J, Salway R: A statistical framework for ecological and aggregate studies. Journal of Royal Statistical Society, Series A. 2001, 164: 119-137.View ArticleGoogle Scholar
- Gunnlaugsson G, Angulo FJ, Einarsdottir J, Passa A, Tauxe RV: Epidemic cholera in Guinea-Bissau: the challenge of preventing deaths in rural West Africa. Int J Infect Dis. 2000, 4 (1): 8-13. 10.1016/S1201-9712(00)90059-6.PubMedView ArticleGoogle Scholar
- La Raja M: Cholera-like diarrhoeal disease and rivers in rural Mozambique. Trop Doct. 2000, 30 (1): 53-4.PubMedGoogle Scholar
- Germani Y, Quilici ML, Glaziou P, Mattera D, Morvan J, Fournier JM: Emergence of cholera in the Central African Republic. Eur J Clin Microbiol Infect Dis. 1998, 17 (12): 888-890. 10.1007/s100960050217.PubMedView ArticleGoogle Scholar
- Glass RI, Becker S, Huq MI, Stoll BJ, Khan MU, Merson MH, Lee JV, Black RE: Endemic cholera in rural Bangladesh, 1966–1980. Am J Epidemiol. 1982, 116 (6): 959-70.PubMedGoogle Scholar
- Burrows W: Textbook of Microbiology. 1979, Philadelphia: SaundersGoogle Scholar
- Glass RI, Black RE: The epidemiology of cholera. Cholera. Edited by: Barua D, Greenough III W. 1992, New York: Plenum Medical Company, 129-54.View ArticleGoogle Scholar
- World Health Organization: Cholera today. A new look at an old disease. WHO Features. 1987, 107: 1-3.Google Scholar
- Blake PA: Cholera: a possible endemic focus in the United States. N Engl J Med. 1980, 302: 305-09.PubMedView ArticleGoogle Scholar
- Salmaso S, Greco D, Bonfiglio B, Castellani-Pastoris M, De Felip G, Bracciotti A, Sitzia G, Congiu A, Piu G, Angioni G, Barra L, Zampieri A, Baine WB: Recurrence of Pelecypod-associated cholera in Sardinia. Lancet. 1980, 2 (8204): 1124-1127. 10.1016/S0140-6736(80)92553-2.PubMedView ArticleGoogle Scholar
- Emch M: Diarrheal disease risk in Matlab, Bangladesh. Social Science & Medicine. 1999, 49: 519-530. 10.1016/S0277-9536(99)00146-X.View ArticleGoogle Scholar
- Huq A, Sack RB, Nizam A, Longini IM, Nair GB, Ali A, Morris GJ, Khan MNH, Siddique AK, Yunus M, Albert MJ, Sack DA, Colwell RR: Critical Factors Influencing the Occurrence of Vibrio cholerae in the Environment of Bangladesh. Appl Envir Microbiol. 2005, 71 (8): 4645-4654. 10.1128/AEM.71.8.4645-4654.2005.View ArticleGoogle Scholar
- Webster R, Oliver MA, Munir KR, Mann JR: Kriging the local risk of a rare disease from a register of diagnoses. Geographical Analysis. 1994, 26 (2): 168-185.View ArticleGoogle Scholar
- Carrat F, Valleron AJ: Epidemiologic mapping using the "kriging" method: application to an influenza-like illness epidemic in France. Am J Epidemiol. 1992, 135 (11): 1293-1300.PubMedGoogle Scholar
- Goovaerts P: Geostatistical analysis of disease data: estimation of cancer mortality risk from empirical frequencies using Poisson kriging. Int J Health Geogr. 2005, 4: 31-10.1186/1476-072X-4-31.PubMedPubMed CentralView ArticleGoogle Scholar
- Goovaerts P: Geostatistical analysis of disease data: visualization and propagation of spatial uncertainty in cancer mortality risk using Poisson kriging and p-field simulation. Int J Health Geogr. 2006, 5: 7-10.1186/1476-072X-5-7.PubMedPubMed CentralView ArticleGoogle Scholar
- Monestiez P, Dubroca L, Bonnin E, Durbec JP, Guinet C: Geostatistical modelling of spatial distribution of Balenoptera physalus in the northwestern Mediterranean Sea from sparse count data and heterogeneous observation efforts. Ecological Modelling. 2006, 193: 615-628. 10.1016/j.ecolmodel.2005.08.042.View ArticleGoogle Scholar
- Rashid ER: Geography of Bangladesh. 1991, University Press Limited, Dhaka, BangladeshGoogle Scholar
- Hall AJ: An evaluation of the effectiveness and impact of Bangladesh's Chandpur Flood Control and Irrigation Project. Doctoral Thesis. 1988, Department of Agriculture, University College of North Wales, Bangor, United KingdomGoogle Scholar
- Ali M, Emch M, Ashley C, Streatfield PK: Implementation of a medical geographic information system: concepts and uses. J Health Popul Nutr. 2001, 19 (2): 100-110.PubMedGoogle Scholar
- Koenig MA, Fauveau V, Chowdhury AI, Chakraborty J, Khan MA: Maternal mortality in Matlab, Bangladesh: 1976–85. Studies in Family Planning. 1988, 19 (2): 69-80. 10.2307/1966492.PubMedView ArticleGoogle Scholar
- D'Souza S: A population laboratory for studying disease processes and mortality-the demographic surveillance system, Matlab, Bangladesh. Special Publication No. 13, ICDDR,B, Dhaka, Bangladesh. 1981Google Scholar
- Phillips JF, Simmons R, Koneig MA, Chakraborty J: Determinants of reproductive change in a traditional society: evidence from Matlab, Bangladesh. Studies in Family Planning. 1988, 19 (6): 313-334. 10.2307/1966627.PubMedView ArticleGoogle Scholar
- Berke O: Exploratory spatial relative risk mapping. Preventive Veterinary Medicine. 2005, 71: 173-182. 10.1016/j.prevetmed.2005.07.003.PubMedView ArticleGoogle Scholar

## Copyright

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.