Diarrheal disease risk in rural Bangladesh decreases as tubewell density increases: a zero-inflated and geographically weighted analysis
© Carrel et al; licensee BioMed Central Ltd. 2011
Received: 6 January 2011
Accepted: 15 June 2011
Published: 15 June 2011
This study investigates the impact of tubewell user density on cholera and shigellosis events in Matlab, Bangladesh between 2002 and 2004. Household-level demographic, health, and water infrastructure data were incorporated into a local geographic information systems (GIS) database. Geographically-weighted regression (GWR) models were constructed to identify spatial variation of relationships across the study area. Zero-inflated negative binomial regression models were run to simultaneously measure the likelihood of increased magnitude of disease events and the likelihood of zero cholera or shigellosis events. The aim of this study was to examine the effect of tubewell density on both the occurrence of diarrheal disease and the magnitude of diarrheal disease incidence.
In Matlab, households with greater tubewell density were more likely to report zero cholera or shigellosis events. Results for both cholera and shigellosis GWR models suggest that tubewell density effects are spatially stationary and the use of non-spatial statistical methods is appropriate.
Increasing the amount of drinking water available to households through increased density of tubewells contributed to lower reports of cholera and shigellosis events in rural Bangladesh. Our findings demonstrate the importance of tubewell installation and access to groundwater in reducing diarrheal disease events in the developing world.
Matlab, Bangladesh is a rural area located approximately 50 kilometers southeast of the capital city, Dhaka. Diarrheal diseases are endemic in Matlab and across Bangladesh. This is due to a number of factors, including natural aquatic environments of diarrheal causative agents, high population density, low socioeconomic status, and limited access to clean water. The installation of tubewells is a primary method for decreasing diarrheal disease incidence, giving Bangladeshis an alternative to drinking contaminated surface water. These tubewells draw water several hundred feet from the underlying aquifer to the surface by hand pumps. A sanitary seal, typically of concrete, ideally prevents seepage of ground water into the tubewell. Despite their sealed nature, tubewells may still harbor microorganisms, such as fecal bacteria, as a result of their proximity to latrines or contaminated surface water bodies.
Despite such possible contamination of tubewell water, previous studies of diarrheal disease and drinking water interactions have suggested that it is not simply the use of alternative groundwater sources that provides protection against infection, but also the quantity and accessibility [2–7]. Put simply, the density of a population sharing a tubewell can lower the protective effect conferred by a water supply. Our study examines the potential impacts of tubewell access on two distinct types of diarrheal disease events among Matlab's residents between 2002 and 2004. In doing so, we address two problems commonly encountered and often overlooked in disease analysis: spatial variation in relationships and a small number of reported events. Geographically weighted regression is used to assess potential spatial variability in relationships across the study area and zero-inflated models to control for low disease counts. By utilizing both methods, we are able to ascertain the impact of tubewell access on cholera and shigellosis patterns without the confounding effects of spatial non-stationarity or rare event reporting.
Cholera and shigellosis are two of the most commonly experienced diarrheal diseases in Matlab. Cholera is a disease caused by ingesting a large number of Vibrio cholerae bacteria; the infective dose is 10,000 or greater. Shigellosis is caused by ingesting a much smaller dose of Shigella bacteria; sometimes as few as ten bacteria are necessary to cause an infection. The primary symptom of cholera is watery diarrhea, while the primary symptom of shigellosis is bloody diarrhea (dysentery). Cholera bacteria naturally occur in the brackish waters of Bangladesh, while Shigella bacteria are linked to human waste contamination of water supplies. We hypothesized that, despite these described differences between the two diseases, tubewell user density would be predictive of diarrheal events for both.
ICDDR,B operates a hospital in southwest Matlab that specializes in high-quality treatment of diarrheal diseases, respiratory infections, and maternal and child health. Treatment at the ICDDR,B is free for residents of Matlab, and free transportation to the hospital is available for residents who cannot afford travel. The laboratory diagnoses and treatment outcomes of patients are linked to their demographic and health records via the resident's identification number.
The health and demographic surveillance data collected by ICDDR,B are linked to the spatial locations of residence via a Geographic Information System (GIS). The latitude and longitude coordinates collected at the center of each bari are geocoded in the GIS, as are the coordinates of other geographic features, such as the ICDDR,B hospital. The locations of features in the Matlab GIS are accurate to within 10 meters. The spatially joined environmental and demographic surveillance data allow for the calculation of a variety of measures, which are described in further detail below.
Annual incidence rates (per 1000 people) for cholera and shigellosis in Matlab, Bangladesh, 2002-2004
The ICDDR,B Matlab hospital is not located in the geographic center of the study area, and varying distance decay effects on patients seeking treatment have been recorded [9–13]. Information regarding residents seeking treatment outside of the study area is currently unavailable. To account for this influence of distance from household to hospital on reporting of cholera and shigellosis, we calculated in the GIS the distance from each bari to the ICDDR,B hospital and controlled for it in our models. Straight-line Euclidean distance between baris and the hospital was calculated, given that residents of Matlab make use of both road and river networks to travel to the hospital. Because no specific route was known for individual baris, a universal distance metric was chosen.
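The straight-line distance calculation described above can be sketched as follows. This is a minimal illustration, assuming the bari and hospital coordinates have already been projected to a planar coordinate system in meters; the function name and the sample coordinates are hypothetical and are not taken from the Matlab GIS.

```python
import math


def euclidean_distance_m(bari_xy, hospital_xy):
    """Straight-line distance in meters between two projected (x, y) points."""
    dx = bari_xy[0] - hospital_xy[0]
    dy = bari_xy[1] - hospital_xy[1]
    return math.hypot(dx, dy)


# Hypothetical projected coordinates (meters), for illustration only.
hospital = (0.0, 0.0)
bari = (3000.0, 4000.0)
distance = euclidean_distance_m(bari, hospital)
```

A network distance along roads or rivers would need route information that, as noted above, was not available for individual baris.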
Traditional regression models assume that the process accounting for disease incidence is spatially stationary across the study area. This assumption, however, contradicts the goals of spatially-explicit studies whose aim is to understand where and why an outcome differs across a study area. Geographically weighted regression (GWR) models can be used to generate hypotheses about non-stationary relationships as well as pinpoint locations that should be subject to more intensive research [14, 15].
GWR produces two types of models: a global regression model that treats relationships as spatially stationary, and local regression models estimated at each observation point. Similar to traditional regression models, GWR estimates a model of global associations with parameter estimates, standard errors and t-values. Local regression models are estimated by constructing a spatial weighting matrix and estimating a local weighted regression for each bari and its defined neighbors. Observations are weighted based on proximity to each location i such that nearer points are assigned greater weight than observations further away, based upon the geographic assumption of spatial autocorrelation. A regression estimate is calculated at each observation, including R² and other goodness-of-fit measures. A Monte Carlo simulation approach is used to measure significance of local parameters. The variance of observed model parameters is compared against 100 random calibrations of the same model, providing t-statistics of significance. The local parameter estimates are specific to each location in both logistic and Gaussian GWR and can be displayed graphically using GIS software, indicating potential spatially-heterogeneous patterns in the relationships between diarrheal disease incidence and variables of interest.
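The proximity weighting used in each local regression can be sketched with an adaptive bisquare kernel, a common choice in GWR. The text does not state which kernel shape was used, so the bisquare form, the function name, and the sample points below are assumptions made for illustration only.

```python
import math


def adaptive_bisquare_weights(target, points, n_neighbors):
    """Weights for one local regression centered at `target`.

    The bandwidth is the distance to the n_neighbors-th nearest point, so
    it adapts to local point density. Points at or beyond the bandwidth
    get weight 0; nearer points get weights closer to 1.
    """
    distances = [math.dist(target, p) for p in points]
    bandwidth = sorted(distances)[n_neighbors - 1]
    if bandwidth == 0:
        # The nearest points all coincide with the target location.
        return [1.0 if d == 0 else 0.0 for d in distances]
    return [(1 - (d / bandwidth) ** 2) ** 2 if d < bandwidth else 0.0
            for d in distances]


# Hypothetical locations along a line, for illustration only.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
weights = adaptive_bisquare_weights((0.0, 0.0), points, n_neighbors=3)
```

With an adaptive kernel the number of neighbors is fixed and the bandwidth distance varies, which suits study areas where observation density is uneven.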
Prior to constructing GWR models, the 8732 baris were aggregated into 666 clusters using a 25 × 25 meter grid in ArcGIS software. The grid was constructed to create a smooth, continuous surface where each cell has unique coordinates that can be analyzed using GWR. While each bari in Matlab is associated with a specific set of geographic coordinates collected at the center of the bari, the coordinates represent points rather than polygons, and their dense spatial configuration made conducting GWR without construction of the 25 × 25 meter grid impossible. Model specifications include a logistic distribution in the absence of a negative binomial option to account for low incidence values and overdispersion, and an adaptive spatial kernel function to assign weights for each local regression estimate. The kernel bandwidth was determined by corrected Akaike Information Criterion (AICc) minimization using all data. Adaptive kernel sample size limits were between 166 and 666 bari clusters. We selected a Monte Carlo significance testing procedure for the local parameter estimates. The four variables calculated in the GIS were then analyzed against the aggregated counts of cholera and shigellosis cases reported to ICDDR,B: number of tubewells per bari, total bari population, distance to the Matlab hospital, and the mean tubewell depth per bari. Specifically, the aim of constructing geographically weighted regression models was to determine whether tubewell density effects on diarrheal events vary across space, when controlling for other covariates. Local variation would suggest that tubewell access intervention strategies are effective in specific areas within Matlab, while a lack of spatial variation would indicate equal effects of tubewell access throughout Matlab.
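The aggregation of bari points into grid cells can be sketched as below. The authors performed this step in ArcGIS; the function name, the tuple layout, and the sample records here are hypothetical and only illustrate the idea of summing counts by cell.

```python
from collections import defaultdict


def aggregate_to_grid(baris, cell_size):
    """Sum case counts and population for the baris falling in each cell.

    `baris` is an iterable of (x, y, cases, population) tuples with
    projected coordinates; cells are indexed by (column, row).
    """
    cells = defaultdict(lambda: {"cases": 0, "population": 0, "baris": 0})
    for x, y, cases, population in baris:
        key = (int(x // cell_size), int(y // cell_size))
        cells[key]["cases"] += cases
        cells[key]["population"] += population
        cells[key]["baris"] += 1
    return dict(cells)


# Hypothetical bari records, for illustration only.
sample_baris = [
    (10.0, 10.0, 0, 40),
    (20.0, 5.0, 1, 55),
    (30.0, 10.0, 0, 25),
]
grid = aggregate_to_grid(sample_baris, cell_size=25)
```

Each resulting cell then carries one set of coordinates and one aggregated count, which is the form of input the local regressions operate on.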
Many disease datasets contain a large number of zeroes, violating standard regression distribution assumptions and making Poisson or negative binomial regression problematic [16–19]. Zero inflated models (including both zero inflated Poisson (ZIP) and zero inflated negative binomial (ZINB)) account for this overabundance of zero counts and eliminate bias in parameter estimates, and are increasingly used in health studies [11, 20, 21]. Of the 8732 baris in Matlab from 2002 to 2004, 8278 (94.8%) reported zero cholera events and 8308 (95.1%) reported zero shigellosis events. A zero inflated model assumes that these zero counts are made up of both true and false zeroes. In the case of the Matlab dataset, false zeroes would arise from diseased individuals not seeking treatment at the central Matlab ICDDR,B hospital, either because of self-treatment with oral rehydration or other methods, or because medical attention was sought elsewhere in Matlab or the surrounding region. We assume that these false zeroes are greatly outnumbered by true zero counts, however, because of the Matlab ICDDR,B hospital's excellent record in treating diarrheal diseases, with free treatment options and provision of transport to sick individuals.
Zero inflated models consist of two components, which are analyzed separately. The first component is a count model, containing all positive count observations and a proportion of zero counts that can be normally expected under either a Poisson or negative binomial distribution. The second is a binary model, where zero counts are compared to observations with any positive counts. Zero inflated models thus allow us to estimate the effects of multiple variables on both the likelihood of a bari experiencing increased magnitude of disease events (the count model), and the likelihood of a bari experiencing no cholera or shigellosis events (the binary model) [18, 22, 23].
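How the two components combine into a single probability for each count can be sketched as follows. This is a minimal illustration of the standard zero-inflated negative binomial mass function, not the estimation code used in the study; the function names and the parameter values in the usage lines are hypothetical.

```python
import math


def nb_pmf(y, mu, theta):
    """Negative binomial probability of count y, with mean mu > 0 and
    dispersion theta > 0 (smaller theta means more overdispersion)."""
    log_p = (math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
             + theta * math.log(theta / (theta + mu))
             + y * math.log(mu / (theta + mu)))
    return math.exp(log_p)


def zinb_pmf(y, mu, theta, pi_zero):
    """Zero-inflated negative binomial probability of count y.

    pi_zero is the probability of an excess zero from the binary
    component; mu and theta describe the count component.
    """
    count_part = (1 - pi_zero) * nb_pmf(y, mu, theta)
    return pi_zero + count_part if y == 0 else count_part


# Hypothetical parameter values, for illustration only.
p_zero = zinb_pmf(0, mu=0.5, theta=1.0, pi_zero=0.8)
p_one = zinb_pmf(1, mu=0.5, theta=1.0, pi_zero=0.8)
```

A zero can therefore arise from either component, while any positive count can arise only from the count component.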
The pscl package in R was used for parameter estimation [22, 24, 25]. A ZINB model was selected due to the overdispersion of aggregated cholera and shigellosis counts with respect to the Poisson distribution. Using the Akaike Information Criterion (AIC) and Log-Likelihood Ratio (LLR) as indications of goodness of fit, the best set of parameters was retained.
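The two goodness-of-fit quantities named above follow standard formulas, sketched here for reference. The log-likelihood values and parameter counts in the usage lines are hypothetical and are not results from this study.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: lower values indicate a better
    trade-off between fit and number of estimated parameters."""
    return 2 * n_params - 2 * log_likelihood


def likelihood_ratio_statistic(loglik_reduced, loglik_full):
    """Likelihood ratio statistic comparing a reduced model nested
    within a fuller model."""
    return 2 * (loglik_full - loglik_reduced)


# Hypothetical fits, for illustration only.
aic_reduced = aic(log_likelihood=-1250.0, n_params=8)
aic_full = aic(log_likelihood=-1240.0, n_params=10)
lr_stat = likelihood_ratio_statistic(-1250.0, -1240.0)
```

In this made-up comparison the fuller model has the lower AIC, so it would be retained despite its two additional parameters.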
[Table: Parameter estimates of the cholera geographically weighted global regression model. Rows include distance to hospital and mean tubewell depth.]
[Table: Comparison of cholera global and local logistic model diagnostics. Columns: global cholera model, local cholera model. Rows include the Akaike Information Criterion.]
[Table: Parameter estimates of the shigellosis geographically weighted global regression model. Rows include distance to hospital and mean tubewell depth.]
[Table: Comparison of shigellosis global and local logistic model diagnostics. Columns: global shigellosis model, local shigellosis model. Rows include the Akaike Information Criterion.]
[Table: Parameter estimates of the cholera zero inflated negative binomial regression model. Rows include distance to hospital and mean tubewell depth.]
[Table: Parameter estimates of the shigellosis zero inflated negative binomial regression model. Rows include distance to hospital and mean tubewell depth.]
For baris experiencing cholera in 2002 to 2004, there was a strongly significant (p < .001) relationship to population and hospital distance. Baris with larger populations during the time period experienced greater numbers of cholera cases. Baris that were far away from the Matlab ICDDR,B hospital experienced lower cholera counts. The number of tubewells that a bari had access to was not a statistically significant predictor of cholera counts nor was the mean tubewell depth. The binary model for cholera shows only two statistically significant relationships in the prediction of zero cholera counts: number of tubewells per bari and the background population. Increased access to tubewells, as measured by the number of tubewells per bari, significantly increases the probability of a bari reporting zero cholera events. The background population of a bari shows a statistically significant negative relationship with zero counts, such that higher bari population decreases the chances of a bari reporting no cholera. Mean tubewell depth, as in the count model, showed no statistically significant association in the binary model.
As seen in Table 7, the same predictor variables were statistically significant in the shigellosis count and binary models as in the cholera models. In the count model, background bari population from 2002-2004 is a significant (p < .001) positive predictor of positive shigellosis counts: as population increases so does the number of shigellosis cases experienced by a bari. Distance to the ICDDR,B hospital has the opposite effect on shigellosis case counts: baris located further from the hospital are less likely to have high numbers of reported shigellosis cases (p < .001). Neither the number of tubewells in a bari nor a bari's mean tubewell depth is a significant predictor of shigellosis counts.
Only two variables were significant in the shigellosis binary model: the number of tubewells that a bari was able to access and the population of a bari from 2002-2004. As the number of tubewells in a bari increases, the likelihood of reporting zero shigellosis events increases. As population increases, the reporting of zero shigellosis events decreases at the bari level. Distance to the hospital and mean tubewell depth were not significant predictors of whether a bari experienced shigellosis during 2002 to 2004.
Geographically weighted regression models were constructed to explore spatial heterogeneity of relationships and to determine whether a stationary statistical model was appropriate for the data. Results from both the cholera and shigellosis models suggest that relationships are stationary. The influence of tubewell density on both cholera and shigellosis counts does not vary across the study area, suggesting that water access interventions should apply to the whole of Matlab rather than specific target areas.
Results from both the cholera and shigellosis zero inflated models indicate that, when controlling for other population and environment covariates, the number of tubewells that a bari is able to access is an important predictor of diarrheal disease events. The models control for background bari population size, so regardless of the number of people sharing tubewells, as access to tubewell water increases at the bari level the chances of those baris reporting no cholera or shigellosis events also increase. This relationship holds true despite the presence of a mean tubewell depth variable in the model, which would control for the effect of varying tubewell depths in some baris, or the absence of tubewells completely in one third of the baris measured. It is commonly assumed that deeper tubewells draw less polluted water to the surface than shallow tubewells, which may become contaminated as surface water or other material seeps downwards towards the water table [27–29]. While we did not specifically investigate the relationship between shallow tubewells and diarrheal disease events, it was important to account for this potential interaction with the number of tubewells available to a population. Our findings indicate that when the potential for poor water quality in baris without tubewells or in baris with shallower tubewells is accounted for, the total access to tubewells in a bari is a positive predictor for the absence of diarrheal disease.
The distance between a bari and the Matlab hospital is a significant variable for both cholera and shigellosis in the count models, wherein baris further from the hospital report fewer cases of disease. We interpret these results to mean that there is a distance decay effect in Matlab, such that even in baris with disease events, fewer are reported to the hospital. This could be due to the long travel times between the hospital and baris located in the further reaches of the study area, or due to residents seeking medical treatment outside the bounds of the study area. Distance to the hospital is not, however, a significant variable in either of the binary models, suggesting that it is not strongly predictive of whether a bari does or does not experience disease. The global GWR models support this assertion.
Our findings demonstrate the importance of access to clean water in driving diarrheal disease events in the developing world: higher densities of tubewells contribute to lower reports of cholera and shigellosis events. Increasing the amount and quality of clean water available to residents of Matlab and other regions of Bangladesh could lower not only the risk of cholera and shigellosis, but also other diarrheal diseases. This study was limited to the reporting of disease events from 2002 to 2004. Since that time, a large number of very deep tubewells (> 700 feet) have been installed in Matlab, and elsewhere in Bangladesh, by a variety of NGOs to mitigate the impacts of arsenic poisoning from shallow tubewells. Such installation could have dramatic effects on both the distribution and intensity of diarrheal disease events. As part of arsenic mitigation efforts, tubewells were labeled as safe or unsafe for drinking. Residents of baris with no safe tubewells have shown a variety of strategies, ranging from travelling to neighboring baris or public tubewells to access safe water, to switching usage to only one tubewell in a bari, to ignoring the warnings and still using arsenic-contaminated tubewells. Our future work will examine the durability of these results in the face of changing water access in Matlab, driven both by individual arsenic mitigation efforts (switching tubewells) and by the large-scale installation of even deeper tubewells by the Bangladeshi government and NGOs.
This study was conducted with the support of ICDDR,B and its donors which provide unrestricted support to ICDDR,B for its operations and research. Current donors providing unrestricted support include: the Government of the People's Republic of Bangladesh (GoB), the Canadian International Development Agency (CIDA), Embassy of the Kingdom of the Netherlands (EKN), the Swedish International Development Cooperative Agency (SIDA), and the Department for International Development, UK (DFID). We gratefully acknowledge these donors for their support and commitment to ICDDR,B's research efforts. This study was also supported by the National Oceanic and Atmospheric Administration Oceans and Human Health Program, the National Science Foundation (BCS-0924479), and the IGERT program at the Carolina Population Center.
- Islam MS, Siddika A, Khan MNH, Goldar MM, Sadique MA, Kabir A, Huq A, Colwell RR: Microbiological analysis of tube-well water in a rural area of Bangladesh. Appl Environ Microbiol 2001,67(7):3328.
- Hoque B, Juncker T, Sack R, Ali M, Aziz K: Sustainability of a water, sanitation and hygiene education project in rural Bangladesh: a 5-year follow-up. Bull World Health Organ 1996,74(4):431.
- Tumwine JK, Thompson J, Katua-Katua M, Mujwajuzi M, Johnstone N, Porras I: Diarrhoea and effects of different water sources, sanitation and hygiene behaviour in East Africa. Trop Med Int Health 2002,7(9):750–756.
- Esrey SA: Water, waste, and well-being: a multicountry study. Am J Epidemiol 1996,143(6):608.
- Aziz K, Hoque BA, Hasan KZ, Patwary M, Huttly SRA, Rahaman MM, Feachem R: Reduction in diarrhoeal diseases in children in rural Bangladesh by environmental and behavioural modifications. Trans R Soc Trop Med Hyg 1990,84(3):433–438.
- Montgomery MA, Elimelech M: Water and sanitation in developing countries: including health in the equation. Environ Sci Technol 2007,41(1):17–24.
- Hoque BA, Hallman K, Levy J, Bouis H, Ali N, Khan F, Khanam S, Kabir M, Hossain S, Shah Alam M: Rural drinking water at supply and household levels: Quality and management. Int J Hyg Environ Health 2006,209(5):451–460.
- Ali M, Emch M, Ashley C, Streatfield PK: Implementation of a medical geographic information system: concepts and uses. J Health Popul Nutr 2001,19(2):100–110.
- Ali M, Emch M, Donnay JP, Yunus M, Sack RB: The spatial epidemiology of cholera in an endemic area of Bangladesh. Soc Sci Med 2002,55(6):1015–1024.
- Ali M, Emch M, Donnay JP, Yunus M, Sack RB: Identifying environmental risk factors for endemic cholera: a raster GIS approach. Health Place 2002,8(3):201–210.
- Carrel M, Voss P, Streatfield PK, Yunus M, Emch M: Protection from annual flooding is correlated with increased cholera prevalence in Bangladesh: a zero-inflated regression analysis. Environ Health 2010, 9:13.
- Emch M: Diarrheal disease risk in Matlab, Bangladesh. Soc Sci Med 1999,49(4):519–530.
- Emch M, Ali M, Yunus M: Risk areas and neighborhood-level risk factors for Shigella dysenteriae 1 and Shigella flexneri. Health Place 2008,14(1):96–105.
- Brunsdon C, Fotheringham S, Charlton M: Geographically Weighted Regression-Modelling Spatial Non-Stationarity. The Statistician 1998,47(3):431–443.
- Fotheringham AS, Charlton ME, Brunsdon C: Geographically weighted regression: a natural evolution of the expansion method for spatial data analysis. Environ Plann A 1998, 30:1905–1927.
- Cameron AC, Trivedi PK: Econometric models based on count data: comparisons and applications of some estimators and tests. J Appl Econometrics 1986, 1:29–53.
- Heilbron DC: Zero-altered and other regression models for count data with added zeros. Biometrical Journal 1994,36(5):531–547.
- Lambert D: Zero-inflated Poisson regression, with an application to defects in manufacturing. Technometrics 1992, 34:1–14.
- Ridout M, Demetrio CGB, Hinde J: Models for count data with many zeros. Proceedings of the XIXth International Biometric Conference 1998, 19:179–192.
- Böhning D, Dietz E, Schlattmann P, Mendonça L, Kirchner U: The zero-inflated Poisson model and the decayed, missing and filled teeth index in dental epidemiology. Journal of the Royal Statistical Society: Series A (Statistics in Society) 1999,162(2):195–209.
- Cheung YB: Zero-inflated models for regression analysis of count data: a study of growth and development. Stat Med 2002,21(10):1461–1469.
- Zeileis A, Kleiber C, Jackman S: Regression Models for Count Data in R. Journal of Statistical Software 2008,27(8):1–25.
- Cameron AC, Trivedi PK: Regression analysis of count data. Cambridge; New York: Cambridge University Press; 1998.
- R Development Core Team: R: A language and environment for statistical computing. 2011.
- Jackman S: pscl: Classes and methods for R developed in the Political Science Computational Laboratory, Stanford University. 2008.
- Fotheringham AS, Brunsdon C, Charlton M: Geographically weighted regression: the analysis of spatially varying relationships. Chichester: Wiley; 2002.
- Howard G, Pedley S, Barrett M, Nalubega M, Johal K: Risk factors contributing to microbiological contamination of shallow groundwater in Kampala, Uganda. Water Res 2003,37(14):3421–3429.
- Morris BL, Lawrence ARL, Chilton P, Adams B, Calow RC, Klinck BA: Groundwater and its susceptibility to degradation: a global assessment of the problem and options for management. 2003, 03–3.
- Zingoni E, Love D, Magadza C, Moyce W, Musiwa K: Effects of a semi-formal urban settlement on groundwater quality: Epworth (Zimbabwe): Case study and groundwater quality zoning. Physics and Chemistry of the Earth, Parts A/B/C 2005,30(11–16):680–688.
- Madajewicz M, Pfaff A, Van Geen A, Graziano J, Hussein I, Momotaj H, Sylvi R, Ahsan H: Can information alone change behavior? Response to arsenic contamination of groundwater in Bangladesh. J Dev Econ 2007,84(2):731–754.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.