Automated surveillance of 911 call data for detection of possible water contamination incidents
© Haas et al; licensee BioMed Central Ltd. 2011
Received: 23 November 2010
Accepted: 30 March 2011
Published: 30 March 2011
Drinking water contamination, with the capability to affect large populations, poses a significant risk to public health. In recent water contamination events, the impact of contamination on public health appeared in data streams monitoring health-seeking behavior. While public health surveillance has traditionally focused on the detection of pathogens, developing methods for detection of illness from fast-acting chemicals has not been an emphasis.
An automated surveillance system was implemented as part of Cincinnati's drinking water contamination warning system to monitor health-related 911 calls in the city of Cincinnati. Calls with incident codes indicative of possible water contamination were filtered from all 911 calls for analysis. The 911 surveillance system uses a space-time scan statistic to detect potential water contamination incidents. The frequency and characteristics of the 911 alarms over a 2.5-year period were studied.
During the evaluation, 85 alarms occurred, most of them before an additional alerting constraint was implemented in May 2009. Data were available for analysis approximately 48 minutes after calls were placed, indicating that alarms may be generated 1-2 hours after a rapid increase in call volume. Most alerts occurred in areas of high population density. The average alarm area was 9.22 square kilometers, and the average number of cases in an alarm was nine calls.
The 911 surveillance system provides timely notification of possible public health events, but it has limitations. While the alarms contained incident codes and the location of the caller, additional information such as medical status was not available to assist in validating the cause of an alarm. Furthermore, users indicated that a better understanding of 911 system functionality is necessary to understand how the system would behave in an actual water contamination event.
Drinking water contamination incidents can pose a significant public health risk when they are not detected in time to enact measures that reduce exposures and mitigate the spread of contaminated water in a utility's distribution system [1, 2]. Analyses of several documented water contamination incidents have concluded that monitoring the health-seeking actions of the general public may have allowed earlier detection of contaminated water. Automated surveillance and astute clinician disease reporting are techniques that can monitor these health-seeking behaviors. The following examples illustrate how automated surveillance might minimize the consequences of a contamination incident.
The Salmonella outbreak in Alamosa County began with a case report to the Alamosa Public Health Nursing Service on March 6, 2008, which led to a preliminary investigation. By March 14, the number of reported cases had increased to 19. The epidemiological investigation did not find a common food source for all cases, although several had eaten at a high-volume restaurant where the index case had worked. The critical piece of information was that all five infected infants had been fed powdered formula mixed with tap water and had no other common exposure. On March 19, 2008, drinking water samples were tested for total coliform bacteria using a rapid screening test, and the results were positive. Although this test was not confirmatory, city officials decided to issue a bottled-water order at this stage of the incident. From the index case through April 29, 436 cases from this outbreak were documented. Based on a multiplier used by the Centers for Disease Control and Prevention (CDC), which indicates that for every case that presents for care another 30 cases are likely to have occurred, an estimated 12,000 people were affected during this incident. Automated surveillance might have allowed sampling to begin sooner than five days after the first documented cases, and the consequences for Alamosa might have been less severe.
Another instance in which automated surveillance may have provided early warning of a water contamination incident was a Cryptosporidium outbreak in Milwaukee, WI. In this case, a nursing hotline began to receive a dramatic increase in calls about diarrhea on April 2, 1993, and the local emergency department saw a peak of patients with similar symptoms on April 4. Five days elapsed before a boil water advisory was issued on April 7, 1993; over 400,000 people were affected. A retrospective study of two waterborne outbreaks in Canada involving Cryptosporidium, E. coli O157:H7, and Campylobacter indicated that syndromic surveillance of over-the-counter medication sales would have provided an early indicator of contamination. Close monitoring of developing public health incidents through a partnership between health agencies and the water utility can shorten the time needed to determine whether contaminated water is the source of an incident and could prevent additional exposures to the contaminated water.
To address the risk of contamination of drinking water systems, the U.S. Environmental Protection Agency (EPA) began conceptual design of a multi-component contamination warning system in 2004, which culminated in the development of the WaterSentinel System Architecture. Cincinnati, Ohio, was chosen as the first city to demonstrate this conceptual design, and deployment of the Cincinnati contamination warning system was substantially complete in December 2007. The public health surveillance component was one of the primary monitoring and surveillance components implemented under the pilot program. A partnership was formed between the Greater Cincinnati Water Works and key representatives of several local public health agencies, including the Cincinnati Health Department, Hamilton County Public Health, the Cincinnati Fire Department, and the Drug and Poison Information Center. Personnel supporting the component were responsible for investigating public health surveillance alarms produced by automated surveillance systems and for participating in drills and exercises that simulated drinking water contamination incidents. Additional details on the design of the Cincinnati contamination warning system can be found in Water Security Initiative: Cincinnati Pilot Post-Implementation System Status.
During a water contamination incident, public health officials may expect the health-seeking behavior of individuals exposed to contaminated water to depend largely on the timing of symptom onset and on the severity and nature of the symptoms. These factors are a function of the type of contaminant introduced into the water supply and are anticipated to have a significant impact on how the public would react to an incident. Expected symptoms following exposure to a contaminant may include respiratory or gastrointestinal distress, skin irritation, or neurological symptoms. Individuals seeking medical treatment for symptoms developed during an incident may pursue a variety of actions, including the following:
• Call 911
• Call a medical help line (e.g., nursing hotline or Poison Control Center)
• Drive to an emergency department
• Schedule an appointment with a primary care physician
• Purchase over-the-counter medication
Automated surveillance systems designed to monitor any of these public health data streams can provide early warning when an established baseline threshold has been exceeded, signaling a possible contamination incident. To provide robust coverage, a public health surveillance component that monitors a variety of public health data streams would be capable of detecting both acute and gradually developing events. Acute events result from fast-acting contaminants, for which symptoms generally appear within ten minutes to three hours after exposure; for gradually developing events, one to several days may elapse between exposure and symptom onset. The term fast-acting contaminant describes any possible water contaminant that produces rapid onset of symptoms, within minutes to hours following exposure to an acutely harmful dose. For the purposes of the contamination warning system, the syndromes monitored are the same for acute and gradually developing events caused by biological, radiological, or chemical contaminants; it is therefore the timeliness of anomaly detection, coupled with subject matter expertise during subsequent alert investigations, that indicates what type of contaminant(s) may be responsible.
Before the public health surveillance component of the Cincinnati pilot was built, a gap analysis was conducted to determine whether existing surveillance capabilities provided sufficient coverage for the spectrum of possible water contaminants. One automated surveillance tool, Real-time Outbreak and Disease Surveillance, was used by the local health departments to monitor emergency department visits. Real-time Outbreak and Disease Surveillance was designed to detect gradually worsening public health conditions, such as the onset of influenza season or a Cryptosporidium outbreak, and was later replaced by EpiCenter. Although emergency department data surveillance is programmed to execute hourly, alerts indicating anomalies in patient case load occur only once per day (when anomalous conditions are present). Data entry specialists typically upload information from paper records of hospital cases for the previous 24 hours once per day; as a result, new electronic records often become available for analysis only once per 24-hour period. For this reason, EpiCenter was not optimal for detecting water contamination (either unintentional or intentional) with a fast-acting contaminant. In water contamination scenarios involving fast-acting contaminants, community exposure would occur quickly and symptoms would present rapidly, in most cases within 24 hours. An automated surveillance system capable of identifying possible contamination incidents caused by fast-acting contaminants more quickly than EpiCenter could was therefore needed. One of the new systems implemented was surveillance of 911 calls within the City of Cincinnati.
911 data offer several advantages over emergency department visit data, which are commonly used by health departments for surveillance of possible public health outbreaks. The data are often more complete because they come from a few dispatch centers, or even one, rather than from many hospitals. 911 data are available quickly, because the focus is to dispatch emergency services rather than to diagnose a patient. Additionally, geographic data provide the exact location of the illness, rather than the location of the hospital to which the patient was transported, which could serve a large geographic area. Besides improving spatial detection accuracy [12, 13], precise geographic location information is critical to the response to suspected or confirmed water contamination. Water utility personnel involved in consequence management during response to an incident can use the location data of exposed individuals to identify the extent of the contaminated area and to trace the source of the contamination. This paper discusses the design, implementation, and application of automated, hourly 911 data surveillance for detection of possible water contamination in the Cincinnati contamination warning system.
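The timeliness argument above comes down to simple arithmetic: a case can only contribute to an alarm once its record has arrived and the next analysis cycle has run. The sketch below computes best- and worst-case delays from a data latency and an analysis interval. The roughly 48-minute latency and hourly cycle for 911 data come from this article; the fixed 24-hour latency and daily alerting used for emergency department data are simplifying assumptions, and the function name is hypothetical.

```python
from datetime import timedelta


def alert_delay_bounds(data_latency: timedelta,
                       analysis_interval: timedelta) -> tuple[timedelta, timedelta]:
    """Return (best, worst) delay from a call until an analysis can include it.

    Best case: the record arrives just before an analysis cycle starts.
    Worst case: it arrives just after a cycle starts and waits a full interval.
    """
    return data_latency, data_latency + analysis_interval


# 911 data: ~48 min until records are available, analysis every hour.
best_911, worst_911 = alert_delay_bounds(timedelta(minutes=48), timedelta(hours=1))

# Emergency department data (assumed): 24 h upload latency, daily alerting.
best_ed, worst_ed = alert_delay_bounds(timedelta(hours=24), timedelta(hours=24))

print(best_911, worst_911)  # 0:48:00 1:48:00
print(best_ed, worst_ed)    # 1 day, 0:00:00 2 days, 0:00:00
```

The 911 bounds of roughly 0.8 to 1.8 hours are of the same order as the 1-2 hour figure given in the abstract, whereas the once-daily path is measured in days.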
Data Flow and Configuration
Cincinnati Police Department and Cincinnati Fire Department emergency dispatchers process 911 calls through Cincinnati's Motorola dispatch system. To improve standardization and electronic tracking of 911 calls, a commercial software package, Priority Dispatch ProQA, was deployed in Cincinnati. This software helps 911 dispatchers triage calls effectively by gathering a variety of health data in a systematic manner. The software directs dispatchers to assign incident codes to incoming calls according to the Medical Priority Dispatch System, a standard approved by the National Academies of Emergency Dispatch and used by over 3,400 dispatch centers in the US. Calls are triaged according to data elements including the chief complaint and the geographic location (latitude/longitude) of the call. The caller initiates the complaint (e.g., breathing problems), and the dispatcher obtains more specific information using prompts provided by the software. In addition, instructions are provided to the caller until medical assistance arrives.
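Because each call arrives with a standardized incident code, the filtering step described in the Methods reduces to a set-membership test on the code. The sketch below assumes a simple record layout and a code format that begins with a numeric protocol number (e.g., "26A1"); the watched protocol numbers are hypothetical placeholders, since this article does not list which incident codes were selected.

```python
from dataclasses import dataclass
from datetime import datetime
from itertools import takewhile


@dataclass(frozen=True)
class CallRecord:
    call_id: str
    received: datetime
    lat: float
    lon: float
    incident_code: str  # assumed format: protocol number + determinant, e.g. "26A1"


# HYPOTHETICAL placeholder protocols; the actual code list is not given in the article.
WATCHED_PROTOCOLS = frozenset({"6", "23", "26"})


def protocol_of(code: str) -> str:
    """Return the leading protocol number of an incident code ("26A1" -> "26")."""
    return "".join(takewhile(str.isdigit, code))


def filter_calls(records, watched=WATCHED_PROTOCOLS):
    """Keep only calls whose incident code falls under a watched protocol."""
    return [r for r in records if protocol_of(r.incident_code) in watched]


calls = [
    CallRecord("a", datetime(2009, 6, 1, 8, 5), 39.10, -84.51, "26A1"),
    CallRecord("b", datetime(2009, 6, 1, 8, 9), 39.14, -84.47, "29B1"),
    CallRecord("c", datetime(2009, 6, 1, 8, 12), 39.12, -84.52, "6D1"),
]
print([r.call_id for r in filter_calls(calls)])  # ['a', 'c']
```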
Data Analysis with the Space-time Permutation Scan
The space-time permutation scan statistic was introduced by Kulldorff and colleagues for early detection of disease outbreaks without requiring knowledge of the population at risk. The statistic relies entirely on case location and time data; essentially, it extends spatial scan statistics to the temporal dimension. The method iteratively moves a cylindrical scan window across the study area, varying the cylinder's spatial radius and temporal length. At each iteration, the Poisson generalized likelihood ratio for the hypothesis that the cylinder contains an outbreak is evaluated. Over all iterations, the cylinder with the highest likelihood ratio constitutes the most likely cluster. The highest likelihood ratio does not by itself mean that an outbreak has occurred; the statistical significance of the most likely cluster is assessed by Monte Carlo hypothesis testing, which accounts for the multiple testing inherent in evaluating many candidate cylinders.
For the Cincinnati 911 surveillance system, the SaTScan™ software package was used to perform space-time permutation scans of filtered 911 calls in the City of Cincinnati. SaTScan™ is a free software package for spatial, temporal, and spatio-temporal analysis of data. Before the 911 surveillance system was implemented, 911 data from the coverage area were analyzed to establish appropriate system parameters, and statistical analysis of these historical data supported the use of a 21-day data set for the space-time permutation model. The space-time permutation model requires historical data against which to evaluate the validity of a detected cluster. Including more data generally yields more confidence (i.e., a lower p-value) in the existence of a cluster. At the same time, too much data can lower confidence in a cluster's existence if earlier dates also experienced clustering at the same location.
To determine the most appropriate amount of historical data to include, SaTScan™ was used to compare data sets containing seven different lengths of historical data (four weeks, three weeks, two weeks, one week, six days, five days, and two days). The analysis focused on three separate dates that had a cluster in the historical analysis and that also generated an alert in a parallel analysis of EMS data using the CDC's Early Aberration Reporting System. Based on the results of the three analyses (each comparing the seven history lengths), the most appropriate amount of historical data for the space-time permutation model was either three weeks or two weeks.
The three-week and two-week data sets both detected the targeted clusters with small p-values and long recurrence intervals. The recurrence interval indicates how often a cluster as strong as the detected one would be expected to occur by chance; for example, a recurrence interval of three months corresponds to one expected false positive every three months. The three-week analyses had slightly lower p-values than the two-week analyses, so three weeks was the recommended length of historical data, balancing statistical significance with the recurrence interval.
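For reference, the quantities evaluated for each cylinder can be written out. The notation below summarizes the formulation of Kulldorff et al. (see the reference list) and is not a description of SaTScan's exact internal computation: $c_{zd}$ is the number of calls at location $z$ on day $d$, $C$ is the total number of calls, $A$ is a candidate cylinder, and $c_A$ is the number of calls observed inside it.

```latex
\begin{aligned}
C &= \sum_{z}\sum_{d} c_{zd}, \qquad
\mu_{zd} = \frac{1}{C}\Big(\sum_{z'} c_{z'd}\Big)\Big(\sum_{d'} c_{zd'}\Big), \qquad
\mu_A = \sum_{(z,d)\in A} \mu_{zd}, \\
\mathrm{GLR}(A) &=
\begin{cases}
\left(\dfrac{c_A}{\mu_A}\right)^{c_A}
\left(\dfrac{C-c_A}{C-\mu_A}\right)^{C-c_A}, & c_A > \mu_A,\\[1ex]
1, & \text{otherwise.}
\end{cases}
\end{aligned}
```

Because $\mu_{zd}$ factorizes into a spatial total times a temporal total, the expected count of a cylinder reduces to (calls inside the circle on any day) times (calls in the time window at any location) divided by $C$. Randomly permuting dates across calls leaves both totals, and therefore every $\mu_A$, unchanged, which is what the Monte Carlo replicates exploit.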
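The window-length comparison can be organized as one data set per candidate history length, each ending on a target date, with each data set then passed to the scan. In this sketch, records are assumed to be dicts with a `day` key, the window is taken to include the target date, the helper names are hypothetical, and running SaTScan™ on each slice is left out.

```python
from datetime import date, timedelta

# The seven history lengths compared in the window-selection study, in days.
HISTORY_LENGTHS_DAYS = {
    "four weeks": 28,
    "three weeks": 21,
    "two weeks": 14,
    "one week": 7,
    "six days": 6,
    "five days": 5,
    "two days": 2,
}


def history_slice(records, target_day: date, days: int):
    """Return records dated within the `days`-day window ending on target_day (inclusive)."""
    first_day = target_day - timedelta(days=days - 1)
    return [r for r in records if first_day <= r["day"] <= target_day]


def candidate_datasets(records, target_day: date):
    """Build one data set per candidate history length for a given target date."""
    return {
        label: history_slice(records, target_day, days)
        for label, days in HISTORY_LENGTHS_DAYS.items()
    }


# Toy records: one call per day from May 1 to June 9, 2008 (hypothetical).
records = [{"day": date(2008, 5, 1) + timedelta(days=i)} for i in range(40)]
sets = candidate_datasets(records, date(2008, 6, 9))
print(len(sets["three weeks"]))  # 21
```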
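The recurrence interval can be computed from a p-value once the analysis frequency is fixed. The sketch below assumes the convention used by Kulldorff et al. for prospective surveillance, in which the recurrence interval equals the time between analyses divided by the p-value (so daily analyses with p = 0.01 give 100 days); it is an illustration with a hypothetical function name, not SaTScan™'s code.

```python
from datetime import timedelta


def recurrence_interval(p_value: float, analysis_interval: timedelta) -> timedelta:
    """Expected time between chance detections of a cluster at least this strong.

    Assumed convention: recurrence interval = time between analyses / p-value.
    """
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must lie in (0, 1]")
    return analysis_interval / p_value


# Daily analyses with p = 0.01: about 100 days between chance detections.
print(round(recurrence_interval(0.01, timedelta(days=1)) / timedelta(days=1)))  # 100
```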
For the public health surveillance component of the Cincinnati pilot, analysis of 911 data is executed hourly and the algorithm executes on a rolling three-week data set of 911 call detail records each analysis cycle. The analysis results provide the location and size of likely event clusters across the entire data set, sorted by the p-value. The p-value is evaluated over 999 Monte Carlo simulations.
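To make one analysis cycle concrete, the following is a deliberately simplified, pure-Python sketch of a space-time permutation scan with a Monte Carlo p-value, in the spirit of Kulldorff et al.; it is not SaTScan™'s implementation, and all names are hypothetical. Assumptions: call locations are already projected to planar kilometre coordinates, days are integer indices within the rolling window, candidate cylinders are centred on call locations with a small fixed set of radii, and only time windows ending on the most recent day are scanned, as in prospective surveillance. The p-value is the rank of the observed maximum log-likelihood ratio among replicates in which call days are shuffled across locations; with 999 replicates, as in the Cincinnati configuration, the smallest attainable p-value is 1/1000 = 0.001.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    center: tuple[float, float]  # (x, y) in planar km, assumed pre-projected
    radius_km: float
    window_days: int             # window ends on the most recent day
    observed: int
    expected: float
    llr: float


def log_glr(c: int, mu: float, total: int) -> float:
    """Poisson log generalized likelihood ratio; zero unless c exceeds mu."""
    if c <= mu:
        return 0.0
    llr = c * math.log(c / mu)
    if total > c:
        llr += (total - c) * math.log((total - c) / (total - mu))
    return llr


def most_likely_cluster(xy, days, radii_km, max_window_days):
    """Scan cylinders centred on call locations with windows ending on the last day."""
    total = len(days)
    last_day = max(days)
    windows = {w: [d > last_day - w for d in days]
               for w in range(1, max_window_days + 1)}
    best = None
    for cx, cy in set(xy):
        for r in radii_km:
            inside = [math.hypot(x - cx, y - cy) <= r for x, y in xy]
            n_circle = sum(inside)
            for w, in_window in windows.items():
                c = sum(a and b for a, b in zip(inside, in_window))
                # Expected count factorizes: circle total x window total / C.
                mu = n_circle * sum(in_window) / total
                llr = log_glr(c, mu, total)
                if best is None or llr > best.llr:
                    best = Cluster((cx, cy), r, w, c, mu, llr)
    return best


def scan_with_pvalue(xy, days, radii_km, max_window_days, n_sims=999, seed=0):
    """Return the most likely cluster and its Monte Carlo p-value.

    Each replicate shuffles the days across calls while keeping locations fixed,
    which preserves both the spatial and the temporal marginal totals.
    """
    rng = random.Random(seed)
    observed = most_likely_cluster(xy, days, radii_km, max_window_days)
    shuffled = list(days)
    at_least_as_extreme = 0
    for _ in range(n_sims):
        rng.shuffle(shuffled)
        sim = most_likely_cluster(xy, shuffled, radii_km, max_window_days)
        if sim.llr >= observed.llr:
            at_least_as_extreme += 1
    return observed, (at_least_as_extreme + 1) / (n_sims + 1)


# Toy data (hypothetical): 60 background calls over a 10 km x 10 km area and
# 21 days (indices 0-20), plus a burst of 15 calls near (2, 2) on day 20.
rng = random.Random(1)
xy = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(60)]
days = [rng.randrange(21) for _ in range(60)]
xy += [(2 + rng.uniform(-0.2, 0.2), 2 + rng.uniform(-0.2, 0.2)) for _ in range(15)]
days += [20] * 15

cluster, p = scan_with_pvalue(xy, days, radii_km=[0.5, 1.0, 2.0],
                              max_window_days=2, n_sims=99)
print(cluster.center, cluster.observed, round(cluster.expected, 2), p)
```

The example uses 99 replicates only to keep the run short, which limits the smallest attainable p-value to 0.01; a scan over tens of thousands of real calls would need the far more efficient search that SaTScan™ provides.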
An alarm notification is generated only when all four of the following alerting logic steps are satisfied:
• The space-time permutation scan identifies a candidate cluster with p-value < 0.0250 for a given day.
• Previous analyses have not already generated an alert for the exact cluster center identifier (the 911 call identifier closest to the cluster center) for the given day.
• The candidate cluster center does not fall within the boundary of any cluster already alerted on the given day; that is, its distance from each previously alerted cluster center is not less than that cluster's radius.
• The event count (number of 911 calls) associated with the candidate cluster is > 16 (this step was added in May 2009).
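The four alerting logic steps can be expressed as a single predicate over a candidate cluster and the alerts already issued that day. In this sketch, the record layout, the planar-coordinate assumption, and the function name are hypothetical; the thresholds (p < 0.025 and more than 16 calls) and the order of the checks follow the steps listed above.

```python
import math
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Candidate:
    day: date
    center_call_id: str  # 911 call identifier closest to the cluster center
    x_km: float          # cluster center in planar km (assumed projection)
    y_km: float
    radius_km: float
    p_value: float
    event_count: int     # number of 911 calls in the cluster


def should_alert(cand: Candidate, previous_alerts: list[Candidate],
                 p_max: float = 0.025, min_calls: int = 17) -> bool:
    """Apply the four alerting logic steps to one candidate cluster."""
    same_day = [a for a in previous_alerts if a.day == cand.day]
    # Step 1: statistically significant candidate cluster.
    if cand.p_value >= p_max:
        return False
    # Step 2: no alert already issued today for the same cluster-center call.
    if any(a.center_call_id == cand.center_call_id for a in same_day):
        return False
    # Step 3: the center must not lie inside any cluster already alerted today.
    if any(math.hypot(cand.x_km - a.x_km, cand.y_km - a.y_km) < a.radius_km
           for a in same_day):
        return False
    # Step 4 (added May 2009): event count > 16, i.e. at least 17 calls.
    return cand.event_count >= min_calls


first = Candidate(date(2009, 6, 1), "C1", 0.0, 0.0, 1.5, 0.004, 20)
inside = Candidate(date(2009, 6, 1), "C2", 0.5, 0.5, 1.0, 0.010, 18)
small = Candidate(date(2009, 6, 1), "C3", 5.0, 5.0, 1.0, 0.010, 12)
print(should_alert(first, []), should_alert(inside, [first]),
      should_alert(small, [first]))  # True False False
```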
When an analysis of 911 call details shows that the data meet the alarm criteria, the 911 surveillance system sends an email to local public health epidemiologists/disease investigators. The email notification contains the cluster details, such as the p-value, radius, centroid, number of cases in the cluster, and number of cases expected in the scan window. Additionally, incident codes, HIPAA-compliant age, gender, and dispatch address (when available) for each case are provided so that investigators can assess similarities among cases. In the event of an alarm, a set procedure outlined in the Water Security Initiative: Interim Guidance on Developing an Operational Strategy for Contamination Warning Systems is followed to review the alarm and verify whether a water contamination incident is possible.
Public Health Surveillance User Interface
Upon receiving the alarm email, the investigator will log onto a Public Health Surveillance User Interface to evaluate the cluster. The User Interface provides Web access to the analysis details for both the 911 and EMS surveillance systems. The 911 analysis details provided in the alarm email are included on the home page, along with a history of 911 alarms. While location information is provided in 911 alarm notification emails, local public health partners requested the ability to visualize the data in a geographic information system (GIS) tool. Therefore, a modification was introduced that allows system users to view alarm details in a GIS display by clicking a link within the User Interface. Currently, the bounding circle of each alarm, along with the locations and selected information for the cases generating the alarm, is exported in KML and displayed in Google Earth, a free GIS platform.
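The KML export can be sketched with the Python standard library alone. KML has no native circle geometry, so this sketch approximates the bounding circle as a closed polygon using a local equirectangular approximation, which is adequate at city scale. The function names, the placemark labels, and the case fields (`lat`, `lon`, `incident_code`) are hypothetical; the article does not describe the actual export code.

```python
import math
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"
EARTH_RADIUS_KM = 6371.0


def _q(tag: str) -> str:
    """Qualify a tag name with the KML namespace."""
    return f"{{{KML_NS}}}{tag}"


def circle_ring(lat: float, lon: float, radius_km: float, n: int = 36):
    """Approximate a circle as a closed ring of (lon, lat) vertices."""
    ring = []
    for i in range(n + 1):  # the last vertex repeats the first to close the ring
        theta = 2 * math.pi * (i % n) / n
        dlat = math.degrees(radius_km * math.sin(theta) / EARTH_RADIUS_KM)
        dlon = math.degrees(radius_km * math.cos(theta)
                            / (EARTH_RADIUS_KM * math.cos(math.radians(lat))))
        ring.append((lon + dlon, lat + dlat))
    return ring


def alarm_to_kml(center, radius_km, cases) -> str:
    """Return a KML document with the alarm boundary and one placemark per case.

    `center` is (lat, lon); each case is a dict with hypothetical keys
    "lat", "lon", and "incident_code".
    """
    ET.register_namespace("", KML_NS)
    kml = ET.Element(_q("kml"))
    doc = ET.SubElement(kml, _q("Document"))

    boundary = ET.SubElement(doc, _q("Placemark"))
    ET.SubElement(boundary, _q("name")).text = "Alarm boundary"
    polygon = ET.SubElement(boundary, _q("Polygon"))
    outer = ET.SubElement(polygon, _q("outerBoundaryIs"))
    ring = ET.SubElement(ET.SubElement(outer, _q("LinearRing")), _q("coordinates"))
    # KML coordinates are "lon,lat,alt" tuples separated by whitespace.
    ring.text = " ".join(f"{x:.6f},{y:.6f},0" for x, y in circle_ring(*center, radius_km))

    for case in cases:
        placemark = ET.SubElement(doc, _q("Placemark"))
        ET.SubElement(placemark, _q("name")).text = case["incident_code"]
        point = ET.SubElement(placemark, _q("Point"))
        ET.SubElement(point, _q("coordinates")).text = f"{case['lon']:.6f},{case['lat']:.6f},0"
    return ET.tostring(kml, encoding="unicode")


kml_text = alarm_to_kml((39.10, -84.51), 1.5,
                        [{"lat": 39.101, "lon": -84.512, "incident_code": "26A1"}])
```

Writing `kml_text` to a file with a `.kml` extension is enough for Google Earth to open it.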
Upon receipt of a 911 alarm, the public health investigator will confirm proper coding of underlying data and check other public health surveillance systems for any corresponding trends. At this point, the investigator may be able to determine that the alarm is not a valid contamination event and close the investigation. If possible contamination is still suspected, the investigator will then contact the Water Utility Emergency Response Manager, and convene a conference call with other public health officials and the Drug and Poison Information Center. This team will then discuss the data associated with the alarm and the water utility will conduct an internal investigation and report to the public health team while following the utility's Consequence Management Plan.
Performance data for the 911 surveillance system were collected from January 2008 through June 2010. The analysis below presents data transmission times as well as alarm characteristics, such as frequency of alarms, number of cases in an alarm, distribution of alarms, and alarm area.
Seven long delays (> 24 hours) in data transmission were caused by network outages, which brought down the interface that transmits call records from the Cincinnati Fire Department server to the utility application server. Until this interface is manually restarted, data transmission cannot occur. One notably long period of interface downtime (~9 days), between November 25, 2008 and December 4, 2008, resulted from network instability. During this period, transmission of all records from the Cincinnati Fire Department server to the utility application server was impeded. This event noticeably increased the average transmission time for the November 16, 2008 reporting period.
During two reporting periods later in the evaluation timeline, longer data transmission times also occurred. In the February 16, 2010 reporting period, the 911 interface experienced a seven-day outage which delayed data transmission. Later, in the April 16, 2010 reporting period, the utility's 911 subscription expired, which caused a five-day delay in data transmissions between May 1, 2010, and May 6, 2010, when the issue was resolved.
Frequency of Alarms (pre/post new alerting logic step implementation)
To date, the 911 surveillance system has generated no true alarms. All alarms generated by the 911 surveillance system during the evaluation period are considered false alarms, meaning that they did not indicate possible water contamination or other public health events.
Initially, the only limiting condition on alarm notifications was a maximum p-value of 0.025. This resulted in detection of many statistically significant anomalies that were of little concern to public health officials because most alarms contained so few cases. The SaTScan™ configuration, combined with the first three alerting logic steps, resulted in an average of over five alarms per month between January 2008 and May 2009. The public health partners considered this alarm rate too high; therefore, the alarm history was examined to evaluate additional alerting logic that could be applied to the 911 alarm notifications. Approximately 14 months of data were reviewed retrospectively to analyze factors such as the p-value of the alert clusters, the cluster radius per 911 alert, and the number of 911 calls per alert. Because a significant number of the alarms contained very few cases (i.e., a low number of 911 calls) and the public health partners were most interested in large events, the fourth alerting logic step was implemented in May 2009. Historical analysis of the alarms received indicated that requiring a minimum of 17 calls for notification would have reduced the number of alarms received before implementation from 85 to 7, which was acceptable to the public health partners.
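The retrospective review that motivated the fourth step amounts to counting, for each candidate minimum, how many past alarms would still have produced a notification. The function names are hypothetical, and the per-alarm call counts below are made-up toy values for illustration, not the Cincinnati alarm history.

```python
def alarms_retained(call_counts, min_calls: int) -> int:
    """Count historical alarms that would still notify under a minimum call count."""
    return sum(1 for n in call_counts if n >= min_calls)


def threshold_table(call_counts, thresholds):
    """Map each candidate minimum call count to the number of alarms retained."""
    return {t: alarms_retained(call_counts, t) for t in thresholds}


toy_counts = [3, 4, 4, 5, 6, 9, 12, 17, 21, 30]  # hypothetical calls per alarm
print(threshold_table(toy_counts, [5, 10, 17]))  # {5: 7, 10: 4, 17: 3}
```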
Number of Calls in an Alarm
Spatial Coverage of Alarms
The geographic area covered by the 911 surveillance system was analyzed to examine the distribution of 911 alerts within the City of Cincinnati. The analysis produced a map of the surveillance area with an overlay of the 911 alarms that occurred during the evaluation period.
Spatial Extent of Alarms
Analysis of 911 alarms produced during the evaluation period demonstrates that most alarms were generated when a few calls co-occurred within close proximity in densely populated areas, or when many calls occurred over larger geographic areas where fewer calls were expected. Alarms with lower call density covered a larger alarm area, and those with higher call density covered a smaller alarm area.
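The inverse relationship between call density and alarm area follows from treating each alarm as a circle of the reported radius. The helpers below are hypothetical and assume a circular alarm region; the 9.22 km² figure is the average alarm area reported in this article.

```python
import math


def alarm_area_km2(radius_km: float) -> float:
    """Area of a circular alarm region."""
    return math.pi * radius_km ** 2


def equivalent_radius_km(area_km2: float) -> float:
    """Radius of the circle with the given area."""
    return math.sqrt(area_km2 / math.pi)


def call_density(calls: int, radius_km: float) -> float:
    """Calls per square kilometre inside a circular alarm region."""
    return calls / alarm_area_km2(radius_km)


# The average alarm area of 9.22 km² corresponds to a radius of about 1.71 km.
print(round(equivalent_radius_km(9.22), 2))  # 1.71
# The same nine calls are far denser in a small circle than in a large one.
print(round(call_density(9, 0.5), 2), round(call_density(9, 3.0), 2))  # 11.46 0.32
```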
Table: Statistical Analysis of Spatial Extent of 911 Alarms (n = 86), reporting Alarm Area (km²) and Number of Calls
In general, the public health partners indicated that the 911 surveillance system provided timely alerts and that the use of standardized software by 911 call operators to enter call data allowed for consistent coding. The speed of information (alarms containing location data) from 911 surveillance may be valuable for detection of fast-acting contaminants, but participants suggested that a better understanding of the functionality of the 911 surveillance system would be necessary to assess how the system may perform during an actual contamination incident. More pre-implementation training was cited as a major factor that would have benefited the initial process of investigating 911 alerts. One weakness identified by system users was the limited amount of medical information available in the 911 alarm details. Although the incident code for each call is available, and location data demonstrate the degree of clustering among the cases, there is limited additional caller information (e.g., medical status) for investigators to consider when determining whether an alarm suggests possible water contamination. However, other surveillance systems used within the public health surveillance component may provide supplemental information that could further inform investigations of possible water contamination incidents. Furthermore, the location information included with the alarm is sufficient to guide the investigation.
During the evaluation, neither this system nor the other surveillance methods employed by the health departments detected water contamination, so there was no event with which to directly assess how well the system met its design objective of detecting fast-acting contaminants. However, participants in this monitoring program indicated that, even though no actual contamination event has yet tested the system's ability to detect an event, the speed of information from 911 surveillance is valuable for detection of fast-acting contaminants.
We acknowledge the Greater Cincinnati Water Works and Cincinnati Fire Department for their role in granting access to the data used for this study, and the Cincinnati Health Department, Hamilton County Public Health, Cincinnati Drug and Poison Information Center, and the Federal Bureau of Investigation for their contributions and support during this study.
- U.S. Government Accountability Office: Drinking Water: Experts' Views on How Federal Funding Can Best Be Spent To Improve Security. Testimony Before the Subcommittee on Environment and Hazardous Materials, Committee on Energy and Commerce, House of Representatives, September 30, 2004.
- Matalas N: Vulnerability of Water Systems to Acts of Terrorism and Acts of Nature. Risk-based Decision Making in Water Resources X: Proceedings of the Tenth Conference, November 3-8, 2002. American Society of Civil Engineers.
- Berg R: The Alamosa Salmonella Outbreak: A Gumshoe Investigation. Journal of Environmental Health. 2008, 71 (2): 54-55.
- Proctor ME, Blair KA, Davis JP: Surveillance data for waterborne illness detection: an assessment following a massive waterborne outbreak of Cryptosporidium infection. Epidemiol Infect. 1998, 120: 43-54. 10.1017/S0950268897008327.
- Griffin RJ: Public Reliance on Risk Communication Channels in the Wake of a Cryptosporidium Outbreak. Risk Analysis. 1998, 18 (4): 367-375. 10.1111/j.1539-6924.1998.tb00350.x.
- Edge VL, Pollari F, Lim G, Aramini J, Sockett P, Martin SW, Wilson J, Ellis A: Syndromic surveillance of gastrointestinal illness using pharmacy over-the-counter sales. A retrospective study of waterborne outbreaks in Saskatchewan and Ontario. Can J Public Health. 2004, 95 (6): 446-450.
- Risebro HL, Hunter PR: Surveillance of waterborne disease in European member states: a qualitative study. Journal of Water and Health. 2007, 5: 19-38. 10.2166/wh.2007.135.
- U.S. Environmental Protection Agency: WaterSentinel System Architecture, Draft for Science Advisory Board Review. EPA 817-D-05-003. 2005.
- U.S. Environmental Protection Agency: Water Security Initiative: Cincinnati Pilot Post-Implementation System Status. EPA 817-R-08-004. 2008.
- Bronstein AC, Spyker DA, Cantilena LR, Green JL, Rumack BH, Giffin SL: 2009 Annual Report of the American Association of Poison Control Centers' National Poison Data System (NPDS): 27th Annual Report. Clinical Toxicology. 2010, 48: 10. 10.3109/15563650.2010.543906.
- Garza A: Real Time EMS Events as Surrogate Events in Syndromic Surveillance. Advances in Disease Surveillance. 2007, 4: 7.
- Savory DJ: Enhancing spatial detection accuracy for syndromic surveillance with street level incidence data. International Journal of Health Geographics. 2010, 9: 1. 10.1186/1476-072X-9-1.
- Olson KL, Grannis SJ, Mandl KD: Privacy protection versus cluster detection in spatial epidemiology. Am J Public Health. 2006, 96 (11): 2002-2008. 10.2105/AJPH.2005.069526.
- Priority Dispatch Corporation. http://www.prioritydispatch.net
- Kulldorff M, Heffernan R, Hartman J, Assunção R, Mostashari F: A Space-Time Permutation Scan Statistic for Disease Outbreak Detection. PLoS Medicine. 2005, 2 (3): e59. 10.1371/journal.pmed.0020059.
- Kleinman K, Lazarus R, Platt R: A generalized linear mixed models approach for detecting incident clusters of disease in small areas, with an application to biological terrorism. American Journal of Epidemiology. 2004, 159: 217-224. 10.1093/aje/kwh029.
- Early Aberration Reporting System. http://www.bt.cdc.gov/surveillance/ears
- U.S. Environmental Protection Agency: Water Security Initiative: Interim Guidance on Developing an Operational Strategy for Contamination Warning Systems. EPA 817-R-08-002. 2008.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.