Initiating informatics and GIS support for a field investigation of bioterrorism: The New Jersey anthrax experience

Background:
The investigation of potential exposure to anthrax spores in a Trenton, New Jersey, mail-processing facility required rapid assessment of informatics needs and adaptation of existing informatics tools to new physical and information-processing environments. Because the affected building and its computers were shut down, the data needed to list potentially exposed persons and to map building floor plans were unavailable from the primary source.

Results:
Controlling the effects of anthrax contamination required identification and follow-up of potentially exposed persons. Risk of exposure had to be estimated from the geographic relationship between work history and environmental sample sites within the contaminated facility. To assist in establishing geographic relationships, floor plan maps of the postal facility were constructed in ArcView Geographic Information System (GIS) software and linked to a database of personnel and visitors using Epi Info and Epi Map 2000. A repository for maintaining the latest versions of various documents was set up using Web page hyperlinks.

Conclusions:
During public health emergencies, such as bioterrorist attacks and disease epidemics, computerized information systems for data management, analysis, and communication may be needed within hours of the start of an investigation. Available data sources and the output requirements of the system may change frequently during the course of the investigation. Integrating data from many sources may require entering or importing data from a variety of digital and paper formats. Spatial representation of data is particularly valuable for assessing environmental exposure. Written documents, guidelines, and memos important to the investigation were frequently revised. In this investigation, a database was operational on the second day and the GIS component during the second week.

Background
In September 2001, anthrax spores were intentionally sent through the US Postal System. At least four letters containing spores of Bacillus anthracis were sent to a U.S. Senator's office and to media centers in different parts of the country. Two cases reported in Florida were followed by two cases in New York. A few days later, four more cases occurred among postal workers at the Brentwood mail processing facility in the Washington area [1]. The appearance of new cases in different locations around the country suggested that several postal facilities were contaminated with B. anthracis. In each location, the potentially exposed population was large. Public health, criminal investigation, and postal authorities at local, state, and federal levels investigated the suspected bioterrorism event. Within hours of the discovery of clusters of cases, the Centers for Disease Control and Prevention (CDC) deployed teams to four different locations to support state and local investigations.
Four letters containing anthrax spores were known to have been postmarked and processed at the Hamilton Mail Processing Center (HMPC) in Hamilton Township, a suburb of Trenton, New Jersey. The HMPC is part of a complex network of stations and buildings designed to move millions of pieces of mail per day. It is not a typical post office but a very large facility housed in a 281,387-square-foot building with many operations areas containing mechanized equipment. It was typically staffed by 250 employees per shift and visited by numerous others in the course of processing 2 million items per day.
On October 19, a field investigation in Trenton was initiated by a CDC team composed of an epidemiologist, environmental experts, occupational physicians, Epidemic Intelligence Service (EIS) officers, a Public Health Informatics Fellow, and support personnel. The team was headquartered at the New Jersey Department of Health and Senior Services (NJDHSS) and had access to the Department's local area network and computing environment. Other CDC investigation teams were sent to Florida, Washington, and New York.
A detailed account of the New Jersey epidemiologic investigation has recently been published [2]. This article describes how the Informatics Fellow and the New Jersey Geographic Information System (GIS) Coordinator assessed information system needs and organized onsite computer support for the New Jersey arm of the investigation during its first two weeks.

Assessment of Information Needs
Early in the investigation, the number of people exposed to anthrax was estimated to be large. The response team planned to sample for anthrax contamination in the facility; identify the exposed population; locate, culture, and give prophylactic antibiotics to those most likely to be exposed; and monitor the exposed and treated persons over time for potential long-term effects. Information on each person and sample was to be obtained, stored, analyzed, and augmented over time. For epidemiologic investigation and management purposes, summary reports had to be generated frequently. Because of the large number of agencies involved and the rapid progress of events, a system of document management for the many drafts of guidelines and other communications was also needed.
Immediate needs for output included lists of employees, clients, visitors, and other potentially exposed persons. EIS officers used the lists for individual follow-up, and the lists were linked to the floor-plan map of the postal facility and to environmental sampling results to help identify persons with the greatest likelihood of exposure. Investigators and managers needed summary information, including the percentage of exposed persons receiving medication and the distribution of exposed persons by county. The team's media relations officer produced reports for the public as often as twice a day. Table 1 lists the information needs.
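Once person records sit in a single table, summaries such as the percentage receiving medication and the distribution by county reduce to simple aggregation. The following is a minimal Python sketch, assuming a hypothetical record layout with `county` and `on_pep` fields; it is not the report program used in the investigation.

```python
from collections import Counter

# Hypothetical person records; the field names are illustrative only.
people = [
    {"id": 1, "county": "Mercer", "on_pep": True},
    {"id": 2, "county": "Mercer", "on_pep": False},
    {"id": 3, "county": "Burlington", "on_pep": True},
    {"id": 4, "county": "Middlesex", "on_pep": True},
]


def summarize(records):
    """Return (percent receiving PEP, Counter of persons by county)."""
    total = len(records)
    on_pep = sum(1 for r in records if r["on_pep"])
    pct = round(100.0 * on_pep / total, 1) if total else 0.0
    by_county = Counter(r["county"] for r in records)
    return pct, by_county


pct, by_county = summarize(people)
print(f"Receiving PEP: {pct}%")
for county, n in by_county.most_common():
    print(f"{county}: {n}")
```

Rerunning the same function on each day's data keeps twice-daily figures consistent with one another.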
Not all potential data sources were available during the investigation. The postal facility was closed until an environmental assessment could be completed. The database system in the building was unavailable because the computers had been turned off, out of concern that their fans might create air currents that would jeopardize safety or alter environmental sampling results. The list of employees stored on the facility computers was therefore unavailable to the investigators and had to be assembled from other sources. Digital AutoCAD drawing files of the facility were also stored on a workstation inside the facility and were likewise unavailable, which complicated the production of accurate and representative floor-plan maps.
Laboratory results included cultures of both human and environmental samples performed by the New Jersey State laboratory and CDC laboratories. Managing reports on samples processed in different laboratories required linking multiple unique identifiers. Later, a tracking system for samples processed in more than one laboratory was developed.
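Because each laboratory assigns its own accession numbers, linking results typically goes through a crosswalk from (laboratory, local identifier) to one master sample identifier. The sketch below assumes hypothetical laboratory codes, identifier formats, and master IDs; it illustrates the linking idea and does not reproduce the tracking system that was actually built.

```python
# Hypothetical crosswalk: (laboratory, local accession number) -> master sample ID.
crosswalk = {
    ("NJ_STATE", "S-0412"): "HMPC-0001",
    ("CDC", "2001-88731"): "HMPC-0001",
    ("NJ_STATE", "S-0413"): "HMPC-0002",
}

lab_results = [
    {"lab": "NJ_STATE", "local_id": "S-0412", "result": "positive"},
    {"lab": "CDC", "local_id": "2001-88731", "result": "positive"},
    {"lab": "NJ_STATE", "local_id": "S-0413", "result": "negative"},
    {"lab": "CDC", "local_id": "2001-99999", "result": "negative"},
]


def link_results(results, xwalk):
    """Group results by master sample ID; keep unmatched rows for follow-up."""
    linked, unmatched = {}, []
    for row in results:
        master = xwalk.get((row["lab"], row["local_id"]))
        if master is None:
            unmatched.append(row)  # no crosswalk entry yet
            continue
        linked.setdefault(master, []).append((row["lab"], row["result"]))
    return linked, unmatched


linked, unmatched = link_results(lab_results, crosswalk)
```

Returning unmatched rows explicitly, rather than dropping them, flags samples whose identifiers still need to be reconciled.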
Electronic documents that were produced or revised during the investigation included survey results, background documents, guidelines, antibiotic stockpile logistics, and press releases. Frequent revisions made it necessary to identify and provide access to the latest documents and to track document revisions.

Table 1: Information needs
• Spatial relationship between cases and exposed population
• Identification of unexposed populations
• Identification of exposed population not included during the initial assessment
• Distribution of cases
• Distribution of deaths
• Geographic distribution patterns
Guidelines
• Guidelines for treatments
• Frequently asked questions (FAQ) for the emergency response center
• FAQ for the press

System Implementation
The information system consisted of three main components: a database of possibly exposed persons, a GIS of the physical plant and equipment, and a document repository. The exposed-person database required merging files in multiple formats, manual data entry, rapid analysis, and frequent production of reports. GIS work used floor plans produced in ArcView GIS 3.2a (Environmental Systems Research Institute, Inc., Redlands, CA, http://www.esri.com) by screen-digitizing scanned, not-to-scale photocopies of AutoCAD drawings of the postal building. Documents were made available to the team through Web pages (HTML) on the New Jersey Department of Health intranet that linked to the latest copies of the files.
Building outlines, walls, work areas, equipment, furniture, the ventilation system and diffusers, and environmental sampling locations were digitized and stored as shapefiles, a format used by both ArcView GIS and Epi Map. Using Epi Map [3], the geographic features were then linked to the environmental sampling data, tying each sample to a specific location in the building. A unique sample identification number served as the key linking the shapefile to the environmental sample database (Figure 1). As the map creation process progressed over the 2-week period, additional, larger paper drawings were provided, allowing the complete floor plan and the heating, ventilation, and air conditioning (HVAC) system components to be added. Another version of the floor plan has been published and is available on the Internet [2].
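In relational terms, the shapefile-to-database link is a join on the sample identification number. A minimal Python sketch, assuming hypothetical field names (`SAMPLE_ID`, `x`, `y`, `result`) and attribute rows that have already been read from the shapefile's attribute table, shows it as a left join, so mapped locations without a laboratory result stay visible as pending:

```python
# Attribute rows as they might come from the sampling-location shapefile
# (hypothetical fields: SAMPLE_ID plus floor-plan coordinates).
locations = [
    {"SAMPLE_ID": "HMPC-0001", "x": 120.0, "y": 45.5},
    {"SAMPLE_ID": "HMPC-0002", "x": 310.2, "y": 88.0},
    {"SAMPLE_ID": "HMPC-0003", "x": 15.0, "y": 200.0},
]

# Rows from the environmental sample database, keyed by the same ID (hypothetical).
samples = {
    "HMPC-0001": {"result": "positive", "lab": "CDC"},
    "HMPC-0002": {"result": "negative", "lab": "NJ_STATE"},
}


def join_on_sample_id(features, sample_db):
    """Left join: every mapped location is kept; missing results become 'pending'."""
    joined = []
    for feat in features:
        info = sample_db.get(feat["SAMPLE_ID"], {"result": "pending", "lab": None})
        joined.append({**feat, **info})
    return joined


joined = join_on_sample_id(locations, samples)
for row in joined:
    print(row["SAMPLE_ID"], row["result"])
```

Keeping every location, rather than only matched ones, makes it easy to see which sampled sites are still awaiting results.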
To support investigations in other facilities where the letters were handled, digitized floor plans of the Carteret and West Trenton Post Offices were produced from rudimentary sketches. Because Epi Map is public domain software, it was installed on a number of computers without the need for licensing, making a GIS tool freely available to the investigators.
The document repository was created using HTML on an intranet site inside the NJDHSS firewall. The Web page included up-to-date information such as press releases and a list of the day's activities. It was updated at least twice a day, and more often when important information had to be communicated to the rest of the investigation team. The repository page also included hyperlinks to the most recent versions of important documents. Each link was checked daily to ensure that the correct document was being accessed. The page also provided a list of databases available to team members.

Discussion
Recent publications on informatics and bioterrorism have focused on detection of bioterrorist events [4,5]. Some have reviewed the informatics contribution to past emergencies and formulated proposals for various kinds of surveillance and/or response [6-10]. The goal of this paper is to present needs, difficulties, and solutions encountered in the first 2 weeks of a single field investigation. Every epidemic investigation is unique, but a comprehensive review of activities carried out during an actual investigation may focus discussion on understanding and improving the informatics response in such events.
At the beginning of this investigation, significant effort was expended to identify the information needed and sources for that information. Although the NJDHSS provided a secure computer environment in which to work, many elements of the system had to be designed and set up during the investigation. Much of the base information did not exist (e.g., GIS of the Hamilton facility) or had been made unavailable by the event under investigation (e.g., Hamilton facility AutoCAD digital drawing files and employee lists). As in most field investigations, both data sources and hypotheses shifted and evolved over the course of the investigation.
During an investigation, information is critical for decision-making processes and for detection of cases, identification of risk factors, and managing prevention and control measures. In the first hours or days, the emergency response team is required to design and implement an information system capable of providing reliable, timely information. Being prepared means that much of this is already in place. Experience in developing such "emergency information systems" and with the software being used plays an important role in determining the time needed to complete the task.
In this event, close collaboration between the epidemiologist and informaticians was supported by software designed for field investigation and for exchange of data with other commercial products. The presence of trained professionals from a GIS center was an important element in being able to map the site of the investigation. Use of industry-standard shapefiles and Microsoft Access databases allowed working copies to be distributed and used in public domain software as needed. A database system was operational within two days of the start of operations, and the GIS elements evolved over a two-week period as floor plans became available.
Potential problems identified initially included lags in entry of clinical data, lack of access to data sources because computer systems were disrupted or inaccessible, and the variety of formats and platforms that were designed for other purposes. At least one source produced data in different formats on different days.
In field investigation, both physical and informatics environments must be dealt with as they are encountered. The informatics environment is defined by the agencies, networks, and individual computers from which data are obtained and with which data are exchanged. Unless the entire investigation is internal to a single agency or company, each of the partners (state, county, health care facility, law enforcement agency, industry, or other entity) has its own informatics environment. The standards most likely to be common to all the partners are those of the computer industry, such as the Internet, Microsoft Windows, and popular GIS standards. Those conducting the investigation must be prepared to adapt to the standards used by the data source partners rather than the reverse.
In this investigation, existing database systems provided useful information, but the data items were available on paper, in Excel files, in text files, in Access files, and in ArcView shapefiles. The informatician is responsible for merging and integrating elements stored in different database formats to make the information available to the emergency response team. The need for a speedy public health response is a challenge for informaticians because the steps that are typically time consuming, such as requirements generation, analysis, design, implementation, and testing of a new system, must be completed in hours rather than in weeks or months. In this experience, some parts of the system were functional within two days after arrival at the investigation site.
Epi Info 2000 (http://www.cdc.gov/epiinfo) was used to set up databases for manual data entry and updating and for importing and merging data provided by collaborating sources in a variety of Microsoft Windows formats. The system developed took advantage of several features available in Epi Info [3] through its use of commercial component software and computer industry standards:
• Rapid development of a relational database, in which identifying information was localized to a single table for security purposes
• Importation of data from several of the 20 different file formats that can be read in Epi Info 2000
• The ability to perform data management interactively but to preserve and replay the steps through automatically generated programs or scripts
• The ability to link and display Microsoft Access data with ArcView shapefiles to create maps
• Rapid development, with minimal coding, simplifying the testing and debugging process
• The ability to add or delete variables from a database merely by revising a form on the screen
• The availability of analytic output and epidemiologic statistics in the same program used for data management
• Interaction with high-end GIS programs and transition of database management to a commercial program through Epi Info's incorporation of GIS and Windows standards
• Public domain status, which allowed downloading the latest version from the Internet and distributing it to any number of partners without licensing difficulties
Geographical and spatial analysis has gained importance in public health during the last few years. Maps reveal spatial relationships and facilitate communication. The availability of maps during the investigation can be attributed to the NJDHSS having made resources available from its GIS center. The GIS professional was able to create the shapefiles while the rest of the emergency response team (ERT) focused on data collection and data management. Once created, maps were widely available to all members of the team, and Epi Map 2000 could be used to link data in Microsoft Access files to the shapefiles of the building floor plans.
The main uses of GIS in this investigation were to create a relatively detailed base map, to locate environmental sampling sites, and to provide spatial analysis and visual support to the investigation. We identified and located all samples taken in the different facilities. Within a matter of days, we connected the database system with the map files to provide up-to-date information on sample processing and results. The maps gave a new perspective on the extent of contamination inside the facility and also identified non-contaminated areas.
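One way to turn located samples into area-level summaries is a point-in-polygon test. The sketch below uses the standard ray-casting algorithm, which the article does not describe (the team worked visually in ArcView and Epi Map); the work-area polygons, coordinates, and status labels are all hypothetical. An area whose samples are all negative is reported as having "no positive samples" rather than as clean, because a negative sample does not prove absence of spores.

```python
def point_in_polygon(x, y, poly):
    """Ray casting: count crossings of a horizontal ray from (x, y) to +infinity."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray, so y2 != y1
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical work areas in floor-plan coordinates (feet).
work_areas = {
    "Sorting area A": [(0, 0), (100, 0), (100, 50), (0, 50)],
    "Loading dock": [(150, 0), (250, 0), (250, 60), (150, 60)],
    "Break room": [(0, 100), (40, 100), (40, 140), (0, 140)],
}
# Hypothetical sample points: (x, y, culture result).
sample_points = [(20.0, 10.0, "positive"), (200.0, 30.0, "negative")]


def classify_areas(areas, points):
    """Label each area by the culture results of samples falling inside it."""
    status = {}
    for name, poly in areas.items():
        hits = [r for (x, y, r) in points if point_in_polygon(x, y, poly)]
        if "positive" in hits:
            status[name] = "positive sample found"
        elif hits:
            status[name] = "no positive samples"
        else:
            status[name] = "not sampled"
    return status


status = classify_areas(work_areas, sample_points)
```

Linking each employee's work area to these labels would be one way to rank persons by likely exposure, under the same assumptions.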
We extended the mapping capabilities to illustrate the area covered by the active and passive surveillance systems implemented in surrounding communities for the detection of new cases (Figure 2). The surveillance system was originally restricted to nine counties and was later extended to others. Changes in the number of centers included in the surveillance system were reflected in successive versions of the same map to document changes over time.
Because of the diversity of the data sources during the investigation, the database system required frequent maintenance. We were able to automate most of the data management routines using Epi Info program (script) files and to produce reports automatically.
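Saved program files make the day's cleaning steps repeatable on every new import. The sketch below is a rough Python analogue, not Epi Info's scripting language: each step is a hypothetical named function, and the runner applies the steps in order and logs row counts so that an unexpected change in the input is noticed.

```python
# Each step maps a list of records to a new list, so the whole sequence can be
# replayed on every fresh import. Step names and fields are hypothetical.
def drop_blank_ids(rows):
    return [r for r in rows if r.get("id")]


def normalize_county(rows):
    return [{**r, "county": r.get("county", "").strip().title()} for r in rows]


def dedupe_by_id(rows):
    seen, out = set(), []
    for r in rows:
        if r["id"] not in seen:  # keep the first occurrence of each ID
            seen.add(r["id"])
            out.append(r)
    return out


PIPELINE = [drop_blank_ids, normalize_county, dedupe_by_id]


def run_pipeline(rows, steps=PIPELINE, log=print):
    """Apply each step in order, logging row counts before and after."""
    for step in steps:
        before = len(rows)
        rows = step(rows)
        log(f"{step.__name__}: {before} -> {len(rows)} rows")
    return rows


daily_import = [
    {"id": "001", "county": " mercer "},
    {"id": "", "county": "Mercer"},
    {"id": "001", "county": "MERCER"},
    {"id": "002", "county": "burlington"},
]
clean = run_pipeline(daily_import)
```

Keeping the steps as data (a list) is what makes them replayable: a new day's file goes through exactly the same sequence.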
Laboratory results were available through the local area network in the New Jersey State Health Department where the team worked. Although we might have been able to connect to the laboratory system to obtain real-time updates, for security reasons the laboratory results for the investigation were exported and made available to the Emergency Response Team at the end of each day. There was therefore a modest but significant delay in the availability of laboratory information.
During emergency field situations, staff rotation may present problems. In this investigation field staff rotated every two weeks. Any system intended for use by more than one group of people should be easy to use, compatible with other software, and well documented. The database was documented using entity relationship (ER) diagrams (Figure 3). Epi Info has a command to display the structure of a table, making it easy to document the system for the replacement team that arrived after two weeks. This second team preferred to work with Microsoft Access rather than Epi Info, but the fact that Epi Info 2000 stores data in generic Access 97 tables facilitated the transition.
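Epi Info's table-structure command is not reproduced here. As an analogous sketch, Python's built-in `sqlite3` module can list every table and its columns through `PRAGMA table_info`, producing a plain-text data dictionary to hand to a replacement team. SQLite stands in for the Access 97 tables used in the field, and the table and column names are hypothetical.

```python
import sqlite3


def describe_tables(conn):
    """Return {table: [(column, declared_type, is_primary_key), ...]}."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    doc = {}
    for t in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk).
        # Table names come from sqlite_master, so interpolation is safe here.
        cols = conn.execute(f"PRAGMA table_info({t})").fetchall()
        doc[t] = [(c[1], c[2], bool(c[5])) for c in cols]
    return doc


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (person_key INTEGER PRIMARY KEY, facility TEXT)")
conn.execute("CREATE TABLE pep (person_key INTEGER, start_date TEXT, "
             "FOREIGN KEY (person_key) REFERENCES person(person_key))")

doc = describe_tables(conn)
for table, cols in doc.items():
    print(table)
    for name, ctype, pk in cols:
        print(f"  {name:<12} {ctype:<8} {'PK' if pk else ''}")
```

Generating the dictionary from the live schema keeps the documentation in step with frequent structural changes.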
Document tracking and version control are important when geographically separated teams are working on the same set of problems. In this case, each field site had a set of experts developing documents and recommendations that applied to the whole event and not only to its own geographic area. It was important to keep the most recent set of recommendations available to all members of the emergency response team for decision-making and information dissemination. During the investigation, we were able to access the most recent documents with minimal effort, using links in a Web page. A more sophisticated version could be developed to track changes in a document over time.

Conclusions
Emergency situations require rapid design and implementation of databases to provide reliable information in the field. The ability to respond rapidly to emergency situations requires planning and preparedness. Key elements of preparation are paying sufficient attention to informatics support and having both staff and software that can adapt quickly to new situations and computing environments. The likelihood of success is greater if the same people and software also provide support for more routine epidemic investigations. In this field investigation, the informatics environment included Excel spreadsheets, files in several other formats, paper questionnaires and computer printouts, and an existing GIS capability using ArcView; it also required frequent revisions in database structure, frequent summary reports, and protection of personal identifiers. Software for field use must be flexible enough to interact with and unite numerous data sources and computing environments. Investigators must be prepared for new situations and be ready to adopt new techniques and software as the needs of an investigation unfold.
The traditional steps in systems development (requirements gathering, analysis, design, implementation, and testing) must be traversed in rapid sequence during the first few hours or days of an investigation and repeated as the system evolves and capabilities are added. More formalized approaches to systems development need to be either extensively modified or compressed to account for the time-critical nature of the situation. The use of commercial software standards such as Windows file formats, Web pages, and mainstream GIS programs facilitates the rapid implementation of systems in the field.
Electronic documents were as important as more structured data for this investigation. The design of systems for emergencies should include methods for storing and maintaining version control of documents and images in addition to more structured data.

Geographic Information System
A critical component was the Geographic Information System. A floor plan of the contaminated Hamilton facility was available only as an incomplete, poor-quality 8 1/2 × 11 inch photocopy of the original 'E'-sized AutoCAD drawing. It was scanned as a TIFF image, imported into ArcView GIS 3.2, and used as a backdrop to create shapefiles of the building outline, walls, operational areas, equipment placements, and NIOSH and FBI sampling locations. Attribute tables were also created for each of the GIS theme features. Shapefiles were developed using a scanned copy of the floor plan of each facility and digitized using ArcView®. The elements of information to be displayed were divided into several themes. After completion, the shapefiles were linked to the data using Epi Map 2000, an ArcView®-compatible component of Epi Info. Elements of information stored in databases created in Epi Info 2000 were sent directly to Epi Map for display (Figure 1). The process was automated by using scripts or programs.

Data Collection and Input
To obtain data on persons possibly exposed, a liaison between the U.S. Postal Service and the epidemiologic team was appointed. He was able to obtain Microsoft Excel files or computer printouts listing personnel for each of the facilities under investigation. The lists had originally been created for payroll or vehicle registration. Additional information was added to those lists to record presence in the facility during the exposure period, sex of the employee, and post-exposure prophylaxis (PEP). Most of the clinical follow-up services, including PEP, were provided by the local hospital. The hospital's informatics department extracted relevant clinical data from its database system and made them available to the investigation team as Excel tables on some days and as comma-delimited text files on other days.
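When the same extract arrives in different shapes on different days, one common tactic is to map every incoming header to a canonical field name before merging. The sketch below reads comma-delimited text with Python's standard `csv` module; the header spellings and canonical names are hypothetical. An Excel file would need a separate reader (or conversion to CSV) that yields the same list-of-dictionaries shape, which is assumed rather than shown.

```python
import csv
import io

# Hypothetical header spellings seen in different extracts -> canonical names.
HEADER_MAP = {
    "mrn": "hospital_id", "medical record #": "hospital_id",
    "last name": "last_name", "lname": "last_name",
    "pep start": "pep_start", "prophylaxis start date": "pep_start",
}


def canonical(header):
    """Map a raw header to its canonical name (fallback: snake_case)."""
    key = header.strip().lower()
    return HEADER_MAP.get(key, key.replace(" ", "_"))


def normalize_rows(rows):
    """Rename columns and trim values; skip DictReader's overflow (None) key."""
    return [
        {canonical(k): (v or "").strip() for k, v in row.items() if k is not None}
        for row in rows
    ]


def read_csv_text(text):
    return normalize_rows(csv.DictReader(io.StringIO(text)))


monday = "MRN,Last Name,PEP Start\n1001, Smith ,10/20/2001\n"
tuesday = "Medical Record #,LName,Prophylaxis Start Date\n1002,Jones,10/21/2001\n"
rows = read_csv_text(monday) + read_csv_text(tuesday)
```

Once both days share one schema, they can be appended and merged with the personnel lists using the same downstream steps.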
Primary data items collected by EIS officers and other staff were entered into Epi Info 2000 or into Excel and then imported into Epi Info 2000 (Microsoft Access 97) format. Potentially exposed populations such as regular visitors and relatives of postal workers were identified and manually entered into the main database as the investigation proceeded.
Demographic characteristics of potentially exposed persons were assembled from various sources, including the local hospital where PEP was administered. When PEP was offered to a large group of people, it became necessary to track the different treatment protocols and their results. Paper forms were designed so that they would not become obsolete if protocols were modified during follow-up, and similar revisions in the database were anticipated.

Database Management
Database management was done in Epi Info 2000, the Windows version of Epi Info [3]. Epi Info 2000 was developed at CDC from commercial component software and Visual Basic and offers compatibility with Microsoft Access files and 20 other standard database formats. It includes a mapping program that uses the ArcView-compatible shapefile format and allows the display of data from Access tables on shapefiles. Most of the basic statistics needed for reporting and analysis are built in, so we did not have to spend time coding or debugging algorithms. The data collection and data management tools in Epi Info do not require expert database managers.
The database system consisted of a central database designed and maintained in Epi Info 2000. The system was structured in 14 tables linked by a unique key. At first the unique key contained embedded information (it was an "intelligent" key), but we designed a mechanism to replace intelligent keys with non-intelligent keys. The intelligent key contained the facility code, the deployment place (NJ), the exposed person's unique identifier, and the postal facility involved. We included the deployment site in the key on the assumption that the information would be linked with data from other sites at a later time.
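Replacing intelligent keys usually means assigning meaningless sequential surrogate keys, keeping a crosswalk from old to new, and moving the information once packed into the key into ordinary columns. The sketch below assumes a hypothetical `SITE-FACILITY-PERSON` key format and hypothetical table contents; the article does not give the actual key layout or mechanism.

```python
def parse_key(old_key):
    """Split a hypothetical 'SITE-FACILITY-PERSON' key into named attributes."""
    site, facility, person = old_key.split("-")
    return {"site": site, "facility": facility, "person_no": person}


def build_crosswalk(old_keys):
    """Map each distinct intelligent key to the next sequential integer."""
    crosswalk = {}
    for old in old_keys:
        crosswalk.setdefault(old, len(crosswalk) + 1)
    return crosswalk


def rekey(table, crosswalk, key_field="old_key"):
    """Replace the intelligent key in every row with its surrogate key."""
    return [
        {**{k: v for k, v in row.items() if k != key_field},
         "key": crosswalk[row[key_field]]}
        for row in table
    ]


persons = [{"old_key": "NJ-HMPC-0042"}, {"old_key": "NJ-WTPO-0007"}]
pep = [{"old_key": "NJ-HMPC-0042", "drug": "ciprofloxacin"}]

xwalk = build_crosswalk(r["old_key"] for r in persons)
# The master table keeps the decoded attributes; child tables only get the new key.
person_table = [{"key": xwalk[r["old_key"]], **parse_key(r["old_key"])} for r in persons]
pep_table = rekey(pep, xwalk)
```

Building one crosswalk and applying it to every linked table keeps the tables joinable after the switch.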

Security
From the beginning of the operation, security was a concern. Because of the complexity of the investigation, we decided that all sensitive information (defined as information that would allow identification of specific individuals) would be handled separately from the rest of the database. We created a single flat table containing all identifiers and related it to the rest of the database system. With that structure, we were able to handle sensitive information as a unit. Identifiers were not sent to CDC Headquarters in Atlanta, for example, and sensitive data were protected by the security features of the NJDHSS computing environment.
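The single-identifier-table design can be sketched as a split of each incoming record into an identifier row and an analytic row that share only a non-intelligent key. The identifying field list, key name, and records below are hypothetical; only the analytic rows would be shared outside the secure environment.

```python
# Hypothetical identifying fields; all other fields are treated as analytic data.
IDENTIFIER_FIELDS = ("name", "address", "phone", "date_of_birth")


def split_identifiers(records, key="key"):
    """Split each record into an identifier row and an analytic row sharing `key`."""
    identifiers, analytic = [], []
    for rec in records:
        identifiers.append(
            {key: rec[key], **{f: rec[f] for f in IDENTIFIER_FIELDS if f in rec}}
        )
        analytic.append({f: v for f, v in rec.items() if f not in IDENTIFIER_FIELDS})
    return identifiers, analytic


records = [
    {"key": 1, "name": "A. Smith", "phone": "555-0100", "facility": "HMPC", "on_pep": True},
    {"key": 2, "name": "B. Jones", "address": "1 Main St", "facility": "HMPC", "on_pep": False},
]
identifiers, analytic = split_identifiers(records)

# Guard before any export: no identifying field may remain in the analytic rows.
assert all(not (set(row) & set(IDENTIFIER_FIELDS)) for row in analytic)
```

Because the shared key carries no meaning, the analytic table cannot be re-identified without the identifier table that stays on site.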

Document Management
A commercial HTML editor (Microsoft FrontPage 2000®) was used to maintain the document repository Web page. The Web pages in the system contained no special scripting requiring advanced Web programming techniques. The HTML coding was basic and mostly inserted automatically by FrontPage.
The document repository allowed coding and storing versions of documents during the progress of the investigation. Documents were organized in folders and assigned code numbers.
Members of the team were instructed to check in documents regularly. The check-in process used a simple computer-based form recording the name of the document, the author, and a time stamp. The first two elements were used to create a catalog, and the time stamp was used for version control.
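The check-in form's three fields are enough to drive both a catalog and a "latest version" link page. The following minimal Python sketch keeps the catalog in memory and renders an HTML list linking the newest version of each document. The catalog structure, file paths, and author labels are hypothetical, and the original repository page was maintained by hand in FrontPage rather than generated.

```python
import html
from datetime import datetime

# Hypothetical in-memory catalog: document name -> list of (timestamp, author, path).
catalog = {}


def check_in(name, author, path, when=None):
    """Record a new version of a document; versions are ordered by timestamp."""
    when = when or datetime.now()
    versions = catalog.setdefault(name, [])
    versions.append((when, author, path))
    versions.sort(key=lambda v: v[0])  # oldest first
    return len(versions)  # number of versions now stored


def latest(name):
    """Path of the most recently time-stamped version."""
    return catalog[name][-1][2]


def repository_page(cat):
    """Render an HTML list linking the latest version of every document."""
    items = []
    for name in sorted(cat):
        when, author, path = cat[name][-1]
        items.append(
            f'<li><a href="{html.escape(path)}">{html.escape(name)}</a> '
            f"({html.escape(author)}, {when:%Y-%m-%d %H:%M})</li>"
        )
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


check_in("PEP guidelines", "Team A", "docs/pep_guidelines_v1.doc", datetime(2001, 10, 20, 9, 0))
check_in("PEP guidelines", "Team B", "docs/pep_guidelines_v2.doc", datetime(2001, 10, 21, 17, 30))
print(repository_page(catalog))
```

Sorting on the timestamp, not on arrival order, means a late check-in of an older draft cannot displace the current version on the link page.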