- Open Access
Integrating open-source technologies to build low-cost information systems for improved access to public health data
International Journal of Health Geographics volume 7, Article number: 29 (2008)
Effective public health practice relies on the availability of public health data sources and assessment tools to convey information to investigators, practitioners, policy makers, and the general public. Emerging communication technologies on the Internet can deliver all components of the "who, what, when, and where" quartet more quickly than ever with a potentially higher level of quality and assurance, using new analysis and visualization tools. Open-source software provides the opportunity to build low-cost information systems allowing health departments with modest resources access to modern data analysis and visualization tools. In this paper, we integrate open-source technologies and public health data to create a web information system which is accessible to a wide audience through the Internet. Our web application, "EpiVue," was tested using two public health datasets from the Washington State Cancer Registry and Washington State Center for Health Statistics. A third dataset shows the extensibility and scalability of EpiVue in displaying gender-based longevity statistics over a twenty-year interval for 3,143 United States counties. In addition to providing an integrated visualization framework, EpiVue's highly interactive web environment empowers users by allowing them to upload their own geospatial public health data in either comma-separated text files or MS Excel™ spreadsheet files and visualize the geospatial datasets with Google Maps™.
Information access is of critical importance for the practice of public health. Timely, accurate and readily available information is essential to monitoring the health of communities and populations. Access to public health data is likewise essential to determining the association of environmental exposures with diseases, and to measuring the progress and efficacy of interventions. This information informs, educates and empowers people to develop a dialogue which results in effective programs and policies. It helps health authorities evaluate the effectiveness, accessibility and quality of various public health services. It facilitates and supports data-driven policy-making. The World Wide Web has enabled public health agencies with sufficient information technology resources to develop web-based information systems to extend their capacity for effectively using the data they collect. EpiQMS, developed by the state health department of Washington and the University of Washington, is currently used by health departments in Washington and Pennsylvania. It is implemented using a combination of proprietary operating systems and statistical packages. Many other information systems for analyzing, visualizing, and delivering public health data are implemented with proprietary software systems often beyond the reach of resource-constrained public health agencies. However, as open-source technologies continue to evolve, web-based information systems for collection, storage and analysis of public health data can be built quickly and efficiently with free open-source software and services. These open-source technologies can be shared with health agencies throughout the world and help resource-constrained public health agencies, including those in developing countries, utilize Internet resources with minimal development and support costs.
In this work, we explore the potential of open-source technologies by creating a web-based application framework for the visualization of public domain population health data. EpiVue, Epidemiologic Visual User Environment, is built exclusively with freely available open-source software. EpiVue components include PostgreSQL, a relational database for data storage; JBoss, a widely used J2EE Java application server; JFreeChart, an open-source Java chart library for use in applications, applets, servlets and Java Server Pages (JSP); the R statistical computing and graphics toolkit; and Google Maps™ for interactive Geographic Information System (GIS) visualization. EpiVue is based on the Java programming language, enabling it to run on diverse computing platforms and operating systems. It is currently deployed in a conventional Linux operating system environment. EpiVue has been tested with a variety of popular web browsers and is specifically designed to require no additional application software or browser plug-ins which might limit its use. Figure 1 shows EpiVue's application architecture and the integration of the above open-source components.
Two reference datasets used in our EpiVue prototype are derived from accumulated Washington State Cancer Registry data and death registry data from the Washington State Center for Health Statistics. A third dataset was downloaded and incorporated into the EpiVue system from a life expectancy study covering a twenty-year interval for 3,143 United States counties. Geographic data for United States state and county boundaries and ZIP code information are derived from the ASCII-formatted cartographic boundary files of Census 2000 from the United States Census Bureau.
EpiVue data are queried through simple web selection controls. Results may be presented in tables, charts and maps. Figure 2 shows a pair of line charts displaying the trend of cancer incidence and mortality age-adjusted rates (per 100,000) over a 12-year period. Figure 3 shows a pair of bar charts and a table comparing cancer incidence and mortality age-adjusted rates (per 100,000) based on race and ethnicity. EpiVue currently uses JFreeChart to generate the charts shown in Figures 2 and 3. However, the data analysis and graphics can also be implemented with R, an open-source language and environment for statistical computing and graphics. R provides a wide variety of statistical analysis and graphic capabilities comparable to commonly used commercial statistical software packages, such as SAS.
For visually analyzing epidemiological data to reveal geospatial trends that may be hidden within charts and tables, EpiVue incorporates two open-source GIS utilities: the R language map utility and the Google Maps™ Application Programming Interface (API). Figure 4 shows side-by-side Washington State county maps coded with cancer incidence and mortality age-adjusted rates (per 100,000) using the R map utility. EpiVue also uses Google Maps™ "mashups" to overlay Google Maps™ cartography data layers with deidentified public health data, providing a second powerful and intuitive method for geospatial data visualization. Mashups are a new breed of web-based data integration applications which combine data from more than one source into a single integrated "virtual" application. Figure 5 displays side-by-side Washington State county colored polygon maps based on cancer incidence and mortality age-adjusted rates (per 100,000) using the Google Maps™ Keyhole Markup Language (KML). In comparison to the R language map utility, the Google Maps API is superior in terms of usability, integration, and coverage; the R map utility currently has very limited built-in geospatial coverage. However, the R map utility can display geospatial data entirely within the confines of a local health agency if deployed on a local server, thereby addressing privacy and security issues with identifiable data, while the Google Maps API requires client applications open to the public.
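The colored-polygon approach described above can be illustrated with a minimal sketch. It is not EpiVue's actual code: the class name, the three-bin color scale, the thresholds and the sample rate are all hypothetical, and only the KML element names and the aabbggrr color order come from the KML format itself. Coordinates are written as longitude,latitude,altitude, as KML expects.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: emit a KML polygon whose fill color encodes a rate.
final class KmlPolygonSketch {

    /** Picks a KML color (hex order aabbggrr) from an assumed three-bin scale. */
    static String colorForRate(double rate, double low, double high) {
        if (rate < low) return "7f00ff00";   // semi-transparent green
        if (rate < high) return "7f00ffff";  // semi-transparent yellow
        return "7f0000ff";                   // semi-transparent red
    }

    /** Builds one Placemark; each ring entry is {longitude, latitude}. The name is assumed XML-safe. */
    static String placemark(String name, double rate, List<double[]> ring, String color) {
        StringBuilder sb = new StringBuilder();
        sb.append("<Placemark><name>").append(name).append("</name>")
          .append("<description>")
          .append(String.format(Locale.ROOT, "%.1f per 100,000", rate))
          .append("</description>")
          .append("<Style><PolyStyle><color>").append(color)
          .append("</color></PolyStyle></Style>")
          .append("<Polygon><outerBoundaryIs><LinearRing><coordinates>");
        for (double[] p : ring) {
            // KML order is longitude,latitude,altitude
            sb.append(String.format(Locale.ROOT, "%.5f,%.5f,0 ", p[0], p[1]));
        }
        sb.append("</coordinates></LinearRing></outerBoundaryIs></Polygon></Placemark>");
        return sb.toString();
    }

    /** Wraps one or more placemarks in a minimal KML document. */
    static String document(String placemarks) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
             + "<kml xmlns=\"http://www.opengis.net/kml/2.2\"><Document>"
             + placemarks + "</Document></kml>";
    }

    public static void main(String[] args) {
        // A closed ring: the last point repeats the first.
        List<double[]> ring = Arrays.asList(
            new double[] {-122.0, 47.0}, new double[] {-121.0, 47.0},
            new double[] {-121.0, 48.0}, new double[] {-122.0, 47.0});
        double rate = 450.0; // placeholder value, not taken from the registry data
        String color = colorForRate(rate, 400.0, 500.0);
        System.out.println(document(placemark("Example County", rate, ring, color)));
    }
}
```

A real deployment would derive the bin thresholds from the distribution of the rates being mapped and would escape county names before embedding them in XML.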
To test the scalability and performance of the EpiVue application, a dataset of life expectancy data covering all fifty U.S. states published by Murray et al. was downloaded and incorporated into EpiVue. This dataset includes life expectancy statistics for 3,143 counties and the District of Columbia from 1980 to 1999. EpiVue currently limits users to viewing all counties in an individual state, the largest being Texas with 254 counties. Figure 6 shows a color-coded map based on the life expectancy for all genders in all Texas counties in 1999. When an individual county is clicked, an information window shows an embedded JFreeChart bar chart displaying the trend of life expectancy from 1980 to 1999 for this county, as shown in Figure 7. The ability to simultaneously display both spatial and temporal data in a highly interactive way represents an important extension of the Google Maps™ technique in its application to public health information.
EpiVue allows users to interactively upload and display their own geospatial data files, as shown in Figure 8. Data files can be supplied in either comma-separated values (CSV) text format or Microsoft Excel™ format, with values keyed to a U.S. ZIP code, county, state, or longitude and latitude. A built-in geocoder utilizing Google Maps geocoding services converts an uploaded file of street addresses to a corresponding longitude and latitude file. EpiVue also includes an interactive geocoder to be used along with users' geospatial data coded on Google Maps™ (see Figure 9). This feature allows the interactive addition of custom data on top of the existing geocoded data layer. Google Maps geocoding services currently cover 39 countries.
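As a rough illustration of the longitude/latitude upload case, the sketch below reads a CSV file with a header row into a list of points. The column names (`longitude`, `latitude`, `value`), the class name and the sample rows are assumptions made for this example; the paper does not specify EpiVue's file layout or parser. The sketch also assumes plain comma-separated fields with no quoted commas.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: parse an uploaded CSV of longitude/latitude observations.
final class UploadCsvSketch {

    /** One uploaded observation located by longitude and latitude. */
    static final class Point {
        final double longitude;
        final double latitude;
        final double value;

        Point(double longitude, double latitude, double value) {
            this.longitude = longitude;
            this.latitude = latitude;
            this.value = value;
        }
    }

    /** Reads a header row, locates the assumed columns, then parses each data row. */
    static List<Point> parse(BufferedReader in) throws IOException {
        String header = in.readLine();
        if (header == null) throw new IllegalArgumentException("empty file");
        List<String> cols = new ArrayList<>();
        for (String c : header.split(",")) cols.add(c.trim().toLowerCase(Locale.ROOT));
        int lon = cols.indexOf("longitude");
        int lat = cols.indexOf("latitude");
        int val = cols.indexOf("value");
        if (lon < 0 || lat < 0 || val < 0) {
            throw new IllegalArgumentException("expected longitude, latitude and value columns");
        }
        List<Point> points = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null) {
            if (line.trim().isEmpty()) continue;   // skip blank lines
            String[] f = line.split(",", -1);      // keep trailing empty fields
            points.add(new Point(Double.parseDouble(f[lon].trim()),
                                 Double.parseDouble(f[lat].trim()),
                                 Double.parseDouble(f[val].trim())));
        }
        return points;
    }

    /** Convenience wrapper for in-memory text. */
    static List<Point> parseText(String text) {
        try {
            return parse(new BufferedReader(new StringReader(text)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample rows for illustration only.
        String csv = "longitude,latitude,value\n-122.33,47.61,12.5\n-117.43,47.66,8\n";
        for (Point p : parseText(csv)) {
            System.out.println(p.longitude + " " + p.latitude + " " + p.value);
        }
    }
}
```

Files keyed by ZIP code, county or state would need an additional lookup step against the stored boundary or centroid data before they could be placed on a map.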
In summary, EpiVue's upload capability lets users without GIS expertise quickly visualize their geospatial data without additional GIS support or software plug-ins. It also provides a complete platform and roadmap for knowledgeable computer users who wish to test, modify and deploy the EpiVue package locally. Example data files for uploading in each format are provided in the Additional files section [see Additional files 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11].
Access to public health data through the Internet has evolved rapidly in the past several years, especially in the United States, and many health agencies from the federal level through state and local health departments offer web access to public health data. Most of these applications are developed using proprietary software systems limited to presenting data in tabular format with no geospatial visualization capabilities. As public health awareness and interventions move beyond local, state and national boundaries towards a global health perspective, an increasing amount of public health data will need to be integrated and publicly accessible. The cost and complexity of implementing traditional proprietary solutions, however, will be a limiting factor for software system deployment in public health, especially for agencies with limited resources. For example, the software cost alone for a web application built with Microsoft Windows server, Microsoft SQL database server, a commercial statistical package like SAS, and a commercial GIS package like ESRI can easily reach tens of thousands of dollars. In contrast, the EpiVue application framework, composed exclusively of publicly available open-source technologies, can serve as a prototype for building low-cost public health visualization and assessment systems.
Challenges and limitations clearly exist in deploying public health data using web-GIS open-source technologies in terms of security and privacy. Google Maps™ in particular requires spatial data coordinates to traverse the Internet in order to utilize web-GIS capabilities. Still, a large body of valuable public health data is not limited by privacy and security constraints, such as the publicly available cumulative datasets [8–11] used in this study. Many researchers are already deploying web-GIS visualization in their applications in diverse public health arenas [16–18, 29, 30]. AEGIS uses the Google Maps™ API to display simulated temporal and spatial alarms related to syndromic surveillance. HealthMap pinpoints world-wide infectious disease outbreak information in a web-GIS interface using Google Maps™. The WhoIsSick website combines an innovative social networking schema with Google Maps™ mashups for voluntary anonymous reporting of infectious disease symptoms.
Another limitation in using the Google Maps API for web-GIS is the size of the geocoded data sent to the Google Maps server for rendering on the client browser system. Based on our experience, the Google Maps server is able to render approximately 100,000 latitude/longitude boundary points per geocoded overlay. Although this is sufficient to display all 254 Texas county boundaries at full resolution in the life expectancy example above, it falls short of the approximately 150,000 points required to display all fifty states in the United States at full resolution using the 2000 U.S. Census data files. We were able to work around this limitation by sampling every third data point without significant compromises in performance or resolution. A larger issue in building low-cost web-GIS systems is the lack of freely available geographic data sets, particularly for use in underserved countries. Currently, the U.S. Census Bureau makes census-related geographic data at the levels of state, county, ZIP code, census block, and census tract available to the public through its web site. Canada also provides some quality geographic data at no cost to the public through GeoBase.
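The every-third-point workaround can be sketched as follows. Keeping one point in three reduces roughly 150,000 boundary points to roughly 50,000, which falls under the approximate 100,000-point limit reported above. The class and method names are hypothetical, and retaining the final point so that a closed boundary ring stays closed is an assumption of this sketch rather than a detail stated in the paper.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: thin a boundary ring by keeping every n-th point.
final class BoundarySamplingSketch {

    /**
     * Keeps the points at indices 0, step, 2*step, ... and always keeps the
     * last point, so a ring whose last point repeats its first remains closed.
     */
    static List<double[]> everyNth(List<double[]> ring, int step) {
        if (step < 1) throw new IllegalArgumentException("step must be at least 1");
        List<double[]> out = new ArrayList<>();
        if (ring.isEmpty()) return out;
        for (int i = 0; i < ring.size(); i += step) {
            out.add(ring.get(i));
        }
        double[] last = ring.get(ring.size() - 1);
        if (out.get(out.size() - 1) != last) {
            out.add(last); // the loop skipped the closing point, so add it back
        }
        return out;
    }

    public static void main(String[] args) {
        List<double[]> ring = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            ring.add(new double[] {-120.0 + i * 0.01, 47.0}); // made-up coordinates
        }
        List<double[]> thinned = everyNth(ring, 3);
        System.out.println(ring.size() + " points reduced to " + thinned.size());
    }
}
```

Fixed-interval sampling is the simplest option; a shape-aware simplification method could preserve more detail at the same point count, at the cost of extra computation.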
Interoperability with existing health data and software systems will be crucial if the use of web-GIS in public health is to gain acceptance among public health practitioners and the general public. EpiVue is specifically designed for simplicity and interoperability at both the web browser level and at the systems level to achieve broad end-user acceptance while at the same time promoting ease of implementation and support for local public health agency installations.
At the browser level, EpiVue runs on all common web browsers without additional plug-ins or local application software. Requiring the use of specific web browsers, or browser plug-ins, introduces complexity, incompatibility and security issues which can discourage widespread adoption of web applications. However, nothing in the EpiVue applications framework precludes the use of plug-ins. For example, open-source web-GIS tools such as Scalable Vector Graphics (SVG) have already made their way into public health visualization applications [1, 20]. SVG and other plug-in technologies such as Adobe Flash could be valuable additions to future versions of the EpiVue suite of open-source tools.
EpiVue's system architecture is designed for future growth and interoperability with other software systems and services. EpiVue's Java-based application framework lends itself to integration with other Java applications such as AEGIS-CCT [16, 21], an open-source software tool for creating simulated outbreak clusters in surveillance systems. The EpiVue JBoss server features built-in web services capabilities which can be used to link services to a larger public health computing grid. The R statistical language and graphics package included in the EpiVue application framework could be further exploited for its extensive capability of performing complex bio-statistical analysis. R is an open-source alternative to proprietary statistical packages unavailable to resource-constrained health agencies.
EpiVue may be particularly well suited to public health disaster preparedness applications. The Google Maps™ API now includes traffic flow information for 30 major United States cities which could be exploited for monitoring traffic conditions in areas proximal to disasters. It also features a terrain data layer which could be used to show areas susceptible to flooding and water/well contamination in flood prone regions such as Washington State.
We believe the EpiVue application framework can serve as a prototype for building low-cost public health visualization and assessment systems for resource-constrained environments. In the future, we envision integrating the EpiVue application framework with our public health knowledge management system, myPublicHealth, to create a comprehensive "dashboard" allowing public health officials to quickly assess the health of their local communities and respond rapidly to adverse public health events.
EpiVue was designed using the conventional 4-tier J2EE architecture consisting of a data source tier using a PostgreSQL database and static files, a business tier utilizing Enterprise Java Beans (EJB), a web tier using servlets and Java Server Pages (JSP), and a generic browser client tier.
The three datasets used in this work were originally obtained in different formats. The Washington State cancer dataset was in tab-delimited ASCII format, the Washington State death dataset was in Xbase data file (.dbf) format, and the life expectancy dataset was in Microsoft Excel spreadsheet format (.xls). Each of these files was parsed using Perl programs with the following Perl utility packages: XBase.pm for .dbf files and Spreadsheet.pm for .xls files. Parsed datasets were loaded into a PostgreSQL relational database using the Perl DBI.pm module. All Perl modules mentioned above are freely available for download from CPAN. The geographic data for state and county boundaries and ZIP codes were downloaded from the Census 2000 files of the U.S. Census Bureau. The ZIP code geographic data were loaded into the PostgreSQL database, with each ZIP code represented by its centroid latitude and longitude. County and state boundary data were stored in both the PostgreSQL database and static tab-delimited ASCII files.
The web tier and the business tier are components of the JBoss J2EE application server. The EJB beans in the business tier interact with the database and perform all business queries. Servlets in the web tier process client requests, pass them to the EJB beans, and obtain the query results from them. Servlets also integrate the various utility tools listed in Figure 1 to generate responses to client requests. Java Server Pages (JSP) in the web tier are used to present dynamic responses to clients.
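The division of labor between the web tier and the business tier can be sketched in plain Java. This is only an outline of the pattern: it deliberately avoids the servlet and EJB APIs so that it runs on its own, and every name in it (`RateQuery`, `InMemoryRateQuery`, `handle`, the `county` parameter) and every number is a hypothetical placeholder rather than part of EpiVue.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the web tier / business tier split, without J2EE APIs.
final class TierSketch {

    /** Business tier contract; in EpiVue this role is played by EJB beans. */
    interface RateQuery {
        Map<Integer, Double> ratesByYear(String county);
    }

    /** Stand-in for a bean that would query PostgreSQL; the values are made up. */
    static final class InMemoryRateQuery implements RateQuery {
        @Override
        public Map<Integer, Double> ratesByYear(String county) {
            Map<Integer, Double> rates = new LinkedHashMap<>();
            if ("Example".equals(county)) {
                rates.put(2000, 1.0);
                rates.put(2001, 2.0);
            }
            return rates;
        }
    }

    /** Web tier role: read request parameters, call the business tier, format a response. */
    static String handle(Map<String, String> params, RateQuery query) {
        String county = params.get("county");
        if (county == null || county.isEmpty()) {
            return "error: missing county";
        }
        StringBuilder sb = new StringBuilder("year\trate\n");
        for (Map.Entry<Integer, Double> e : query.ratesByYear(county).entrySet()) {
            sb.append(e.getKey()).append('\t').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = Collections.singletonMap("county", "Example");
        System.out.print(handle(params, new InMemoryRateQuery()));
    }
}
```

Keeping the query behind an interface is what lets the presentation code stay unchanged when the data source or the output format (table, chart or map) changes.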
EpiVue is accessible through its web site.
Hoskins RE, O'Connor C, Johnson C, O'Carroll P, Fuller S: EpiQMS: An Internet Application for access to Public Health Data for Citizens, Providers, and Public Health Investigators. J Public Health Manag Pract. 2002, 8 (3): 30-36.
PostgreSQL Relational Database. [http://www.postgresql.org]
JBoss Application Server, a J2EE application server. [http://www.jboss.org]
JFreeChart, a free 100% Java chart library. [http://www.jfree.org/jfreechart/]
R, a language and environment for statistical computing and graphics. [http://www.r-project.org/]
Google Maps API. [http://www.google.com/apis/maps/index.html]
Java Programming language. [http://java.sun.com]
Washington Cancer Registry Data. [https://fortress.wa.gov/doh/wscr/]
Washington Death Registry Data. [http://www.doh.wa.gov/EHSPHL/CHS/CHS-data/death/deatmain.htm]
Murray CJL, Kulkarni SC, Michaud C, Tomijima N, Bulzacchelli MT, Iandiorio TJ, Ezzati M: Eight Americas: Investigating Mortality Disparities across Races, Counties, and Race-Counties in the United States. PloS Medicine. 2006, 3 (9): 1513-1524. 10.1371/journal.pmed.0030260.
Census 2000 of US Census Bureau. [http://www.census.gov/main/www/cen2000.html]
R map libraries. [http://alumni.media.mit.edu/~tpminka/software/maps/]
Maclachlan JC, Jerrett M, Abernathy T, Sears M, Bunch MJ: Mapping health on the Internet: A new tool for environmental justice and public health research. Health & Place. 2007, 13: 72-86. 10.1016/j.healthplace.2005.09.012.
ESRI ArcIMS, the solution for delivering dynamic maps and GIS data and services via the Web. [http://www.esri.com/software/arcgis/arcims/index.html]
MapServer, an open-source development environment for building spatially-enabled internet applications. [http://mapserver.gis.umn.edu/]
Reis BY, Kirby C, Hadden LE, Olson K, McMurry AJ, Daniel JB, Mandl KD: AEGIS: a robust and scalable real-time public health surveillance system. J Am Med Inform Assoc. 2007, 14 (5): 581-8. 10.1197/jamia.M2342.
Freifeld CC, Mandl KD, Reis BY, Brownstein JS: HealthMap: Global infectious disease monitoring through automated classification and visualization of Internet media reports. J Am Med Inform Assoc. 2007,
Who is Sick?. [http://www.whoissick.org]
Scalable Vector Graphics (SVG). [http://www.w3.org/Graphics/SVG/]
Kamadjeu R, Tolentino H: Web-based public health geographic information systems for resources-constrained environment using scalable vector graphic technology: a proof of concept applied to the expanded program on immunization data. International Journal of Health Geographics. 2006, 5: 24-31. 10.1186/1476-072X-5-24.
Cassa CA, Iancu K, Olson KL, Mandl KD: A software tool for creating simulated outbreaks to benchmark surveillance systems. BMC Medical Informatics and Decision Making. 2005, 5: 22-28. 10.1186/1472-6947-5-22.
Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M: Bioconductor: Open software development for computational biology and bioinformatics. Genome Biol. 2004, 5: R80. 10.1186/gb-2004-5-10-r80.
Revere D, Bugni PF, Fuller SS: An interactive digital knowledge management system to improve public health practitioners' access to public health resources. In Proc AMIA Annu Fall Symp, Washington DC. 2006
Perl, a cross platform programming language. [http://www.perl.org/]
Keyhole Markup Language (KML), an XML-based language for managing the display of 3D geospatial data in Google Maps™ and Google Earth. [http://code.google.com/apis/kml/documentation/]
Google Spreadsheet Autofilter (API Example). [http://gmaps-samples.googlecode.com/svn/trunk/mapcoverage_filtered.html]
Microsoft Servers. [http://www.microsoft.com/servers/default.mspx]
Boulos MNK: Web GIS in practice III: creating a simple interactive map of England's Strategic Health Authorities using Google Maps API, Google Earth KML, and MSN Virtual Earth Map Control. International Journal of Health Geographics. 2005, 4: 22. 10.1186/1476-072X-4-22.
Boulos MNK, Burden D: Web GIS in practice V: 3-D interactive and real-time mapping in Second Life. International Journal of Health Geographics. 2007, 6: 51. 10.1186/1476-072X-6-51.
Google Maps API tutorial. [http://econym.googlepages.com/index.htm]
The authors greatly appreciate the Google Maps™ API tutorial web site for providing a wonderful introductory tutorial. We also thank Renee Risher and Missie Thurston for their help in reviewing the manuscript. This work was supported by CDC Center of Excellence in Public Health Informatics grant P01 HK 000027.