Integrating open-source technologies to build low-cost information systems for improved access to public health data
© Yi et al; licensee BioMed Central Ltd. 2008
Received: 11 April 2008
Accepted: 09 June 2008
Published: 09 June 2008
Effective public health practice relies on the availability of public health data sources and assessment tools to convey information to investigators, practitioners, policy makers, and the general public. Emerging communication technologies on the Internet can deliver all components of the "who, what, when, and where" quartet more quickly than ever with a potentially higher level of quality and assurance, using new analysis and visualization tools. Open-source software provides the opportunity to build low-cost information systems allowing health departments with modest resources access to modern data analysis and visualization tools. In this paper, we integrate open-source technologies and public health data to create a web information system which is accessible to a wide audience through the Internet. Our web application, "EpiVue," was tested using two public health datasets from the Washington State Cancer Registry and Washington State Center for Health Statistics. A third dataset shows the extensibility and scalability of EpiVue in displaying gender-based longevity statistics over a twenty-year interval for 3,143 United States counties. In addition to providing an integrated visualization framework, EpiVue's highly interactive web environment empowers users by allowing them to upload their own geospatial public health data in either comma-separated text files or MS Excel™ spreadsheet files and visualize the geospatial datasets with Google Maps™.
Information access is of critical importance for the practice of public health. Timely, accurate and readily available information is essential to monitoring the health of communities and populations. Access to public health data is equally essential to determining the association of environmental exposures with diseases, as well as to measuring the progress and efficacy of interventions. This information informs, educates and empowers people to develop a dialogue which results in effective programs and policies. It helps health authorities evaluate the effectiveness, accessibility and quality of public health services, and it supports data-driven policy-making. The World Wide Web has enabled public health agencies with sufficient information technology resources to develop web-based information systems that extend their capacity for effectively using the data they collect. EpiQMS, developed by the state health department of Washington and the University of Washington, is currently used by health departments in Washington and Pennsylvania. It is implemented using a combination of proprietary operating systems and statistical packages. Many other information systems for analyzing, visualizing, and delivering public health data are implemented with proprietary software systems that are often beyond the reach of resource-constrained public health agencies. However, as open-source technologies continue to evolve, web-based information systems for collection, storage and analysis of public health data can be built quickly and efficiently with free open-source software and services. These open-source technologies can be shared with health agencies throughout the world and can help resource-constrained public health agencies, including those in developing countries, utilize Internet resources with minimal development and support costs.
Two reference datasets used in our EpiVue prototype are derived from accumulated Washington State Cancer Registry data and death registry data from the Washington State Center for Health Statistics. A third dataset was downloaded and incorporated into the EpiVue system from a life expectancy study covering a twenty-year interval for 3,143 United States counties. Geographic data for United States state and county boundaries and for ZIP codes are derived from the ASCII-formatted Census 2000 cartographic boundary files of the United States Census Bureau.
In summary, EpiVue's upload capability allows users who are not GIS experts to quickly visualize their geospatial data without GIS support or software plug-ins. For knowledgeable computer users seeking to deploy the EpiVue package locally, it also provides a complete platform and roadmap for testing and modifying EpiVue. Example data files for uploading in each format are provided in the Additional files section [see Additional files 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11].
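As a rough illustration of what handling an uploaded comma-separated file might involve, the following Python sketch parses point data and rejects rows that cannot be mapped. It is not EpiVue's code: the function name `parse_uploaded_points` and the column names `label`, `latitude`, `longitude` and `value` are hypothetical, and the actual layout of the upload files is defined by the example files referred to above.

```python
import csv
import io


def parse_uploaded_points(csv_text):
    """Parse uploaded CSV text into map markers plus a list of rejected rows.

    Assumed (hypothetical) header: label,latitude,longitude,value
    """
    markers, errors = [], []
    reader = csv.DictReader(io.StringIO(csv_text))
    # Line 1 is the header, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        try:
            lat = float(row["latitude"])
            lon = float(row["longitude"])
            value = float(row["value"])
        except (KeyError, TypeError, ValueError):
            errors.append((line_no, "missing or non-numeric field"))
            continue
        # Reject coordinates that cannot be placed on a map.
        if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
            errors.append((line_no, "coordinate out of range"))
            continue
        markers.append(
            {"label": row.get("label") or "", "lat": lat, "lon": lon, "value": value}
        )
    return markers, errors
```

Returning the rejected rows with their line numbers, rather than failing on the first bad row, lets a non-expert user see which lines of the upload need correcting.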
Access to public health data through the Internet has evolved rapidly in the past several years, especially in the United States, and many health agencies from the federal level through state and local health departments offer web access to public health data. Most of these applications are developed using proprietary software systems limited to presenting data in tabular format with no geospatial visualization capabilities. As public health awareness and interventions move beyond local, state and national boundaries towards a global health perspective, an increasing amount of public health data will need to be integrated and publicly accessible. The cost and complexity of implementing traditional proprietary solutions, however, will be a limiting factor for software system deployment in public health, especially for agencies with limited resources. For example, the software cost alone for a web application built with Microsoft Windows server, Microsoft SQL database server, a commercial statistical package like SAS, and a commercial GIS package like ESRI can easily reach tens of thousands of dollars. In contrast, the EpiVue application framework, composed exclusively of publicly available open-source technologies, can serve as a prototype for building low-cost public health visualization and assessment systems.
Challenges and limitations clearly exist in deploying public health data using web-GIS open-source technologies, particularly in terms of security and privacy. Google Maps™ in particular requires spatial data coordinates to traverse the Internet in order to utilize web-GIS capabilities. Still, a large body of valuable public health data is not limited by privacy and security constraints, such as the publicly available cumulative datasets [8–11] used in this study. Many researchers are already deploying web-GIS visualization in their applications in diverse public health arenas [16–18, 29, 30]. AEGIS uses the Google Maps™ API to display simulated temporal and spatial alarms related to syndromic surveillance. HealthMap pinpoints world-wide infectious disease outbreak information in a web-GIS interface using Google Maps™. The WhoIsSick website combines an innovative social networking schema with Google Maps™ mashups for voluntary anonymous reporting of infectious disease symptoms.
Another limitation in using the Google Maps™ API for web-GIS is the size of the geocoded data sent to the Google Maps™ server for rendering on the client browser system. Based on our experience, the Google Maps™ server is able to render approximately 100,000 latitude/longitude boundary points per geocoded overlay. Although this is sufficient to display all 254 Texas county boundaries at full resolution in the life expectancy example above, it falls short of the approximately 150,000 points required to display all fifty states in the United States at full resolution using the 2000 U.S. Census data files. We were able to work around this limitation by sampling every third data point, which reduces the overlay to roughly 50,000 points, without significant compromises in performance or resolution. A larger issue in building low-cost web-GIS systems is the lack of freely available geographic data sets, particularly for use in underserved countries. Currently, the U.S. Census Bureau makes census-related geographic data at the levels of state, county, ZIP code, census block, and census tract available to the public through its web site. Canada also provides some quality geographic data at no cost to the public through GeoBase.
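The every-third-point sampling described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the code used in EpiVue: the function name `decimate_ring`, the `min_points` guard, and the assumption that each boundary is a list of (latitude, longitude) pairs are ours.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (latitude, longitude)


def decimate_ring(points: List[Point], step: int = 3, min_points: int = 4) -> List[Point]:
    """Keep every `step`-th vertex of a boundary ring.

    The final vertex is always retained so that a closed ring stays closed.
    Rings too small to thin safely are returned unchanged.
    """
    if step < 1:
        raise ValueError("step must be >= 1")
    if len(points) <= min_points:
        return list(points)
    sampled = points[::step]
    # Keep the original end point so the polygon outline still closes.
    if sampled[-1] != points[-1]:
        sampled.append(points[-1])
    # Do not return a degenerate outline for small rings.
    if len(sampled) < min_points:
        return list(points)
    return sampled
```

Uniform sampling of this kind is simple and fast, but it ignores the shape of the boundary; a shape-aware simplification method could preserve more detail for the same point budget, at the cost of extra complexity.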
Interoperability with existing health data and software systems will be crucial if the use of web-GIS in public health is to gain acceptance among public health practitioners and the general public. EpiVue is specifically designed for simplicity and interoperability at both the web browser level and at the systems level to achieve broad end-user acceptance while at the same time promoting ease of implementation and support for local public health agency installations.
At the browser level, EpiVue runs on all common web browsers without additional plug-ins or local application software. Requiring the use of specific web browsers, or browser plug-ins, introduces complexity, incompatibility and security issues which can discourage widespread adoption of web applications. However, nothing in the EpiVue application framework precludes the use of plug-ins. For example, open-source web-GIS tools such as Scalable Vector Graphics (SVG) have already made their way into public health visualization applications [1, 20]. SVG and other plug-in technologies such as Adobe Flash could be valuable additions to future versions of the EpiVue suite of open-source tools.
EpiVue's system architecture is designed for future growth and interoperability with other software systems and services. EpiVue's Java-based application framework lends itself to integration with other Java applications such as AEGIS-CCT [16, 21], an open-source software tool for creating simulated outbreak clusters in surveillance systems. The EpiVue JBoss server features built-in web services capabilities which can be used to link services to a larger public health computing grid. The R statistical language and graphics package included in the EpiVue application framework could be further exploited for its extensive capability for complex biostatistical analysis. R is an open-source alternative to proprietary statistical packages that are unavailable to resource-constrained health agencies.
EpiVue may be particularly well suited to public health disaster preparedness applications. The Google Maps™ API now includes traffic flow information for 30 major United States cities, which could be exploited for monitoring traffic conditions in areas proximal to disasters. It also features a terrain data layer which could be used to show areas susceptible to flooding and water/well contamination in flood-prone regions such as Washington State.
We believe the EpiVue application framework can serve as a prototype for building low-cost public health visualization and assessment systems for resource-constrained environments. In the future, we envision integrating the EpiVue application framework with our public health knowledge management system, myPublicHealth, to create a comprehensive "dashboard" allowing public health officials to quickly assess the health of their local communities and respond rapidly to adverse public health events.
EpiVue was designed using the conventional 4-tier J2EE architecture consisting of a data source tier using a PostgreSQL database and static files, a business tier utilizing Enterprise Java Beans (EJB), a web tier using servlets and Java Server Pages (JSP), and a generic browser client tier.
The three datasets used in this work were originally obtained in different formats. The Washington State cancer dataset was in tab-delimited ASCII format, the Washington State death dataset was in Xbase data file (.dbf) format, and the life expectancy dataset was in Microsoft Excel spreadsheet format (.xls). Each of these files was parsed using Perl programs with the following Perl utility packages: XBase.pm for .dbf files and Spreadsheet.pm for .xls files. Parsed datasets were loaded into a PostgreSQL relational database using the Perl DBI.pm module. All Perl modules mentioned above are freely available for download from CPAN. The geographic data for state and county boundaries and for ZIP codes were downloaded from the Census 2000 files of the U.S. Census Bureau. The ZIP code geographic data were loaded into the PostgreSQL database, with each ZIP code represented by its centroid latitude and longitude. County and state boundary data were stored in both the PostgreSQL database and static tab-delimited ASCII files.
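The parse-and-load step can be illustrated with a short Python sketch. It is not the Perl code used for EpiVue: Python's built-in `sqlite3` module stands in for PostgreSQL and Perl DBI so that the example runs without a database server, and the table name `zip_centroid`, the column names `zip`, `lat` and `lon`, and the sample values in the usage note are all hypothetical.

```python
import csv
import io
import sqlite3


def load_zip_centroids(conn, tsv_text):
    """Load tab-delimited ZIP code centroids into a relational table.

    Assumed (hypothetical) header: zip<TAB>lat<TAB>lon
    Returns the number of rows loaded.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS zip_centroid ("
        "zip TEXT PRIMARY KEY, lat REAL NOT NULL, lon REAL NOT NULL)"
    )
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # Store ZIP codes as text and restore leading zeros lost by spreadsheets.
    rows = [(r["zip"].zfill(5), float(r["lat"]), float(r["lon"])) for r in reader]
    with conn:  # commit on success, roll back on error
        conn.executemany("INSERT OR REPLACE INTO zip_centroid VALUES (?, ?, ?)", rows)
    return len(rows)
```

A usage example with made-up rows:

```python
conn = sqlite3.connect(":memory:")
sample = "zip\tlat\tlon\n1\t47.5\t-122.25\n00002\t46.0\t-120.5\n"
load_zip_centroids(conn, sample)
print(conn.execute("SELECT zip, lat, lon FROM zip_centroid ORDER BY zip").fetchall())
```

Storing one centroid per ZIP code keeps the table small and lets a ZIP-level statistic be drawn as a single map marker instead of a full boundary polygon.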
The web tier and the business tier are components of the JBoss J2EE application server. The EJB beans in the business tier interact with the database and perform all business queries. Servlets in the web tier process client requests, pass them to the EJB beans, and obtain the query results from them. Servlets also integrate the various utility tools listed in Figure 1 to generate responses to client requests. Java Server Pages (JSP) in the web tier are used to present dynamic responses to clients.
EpiVue is accessible through its web site.
The authors greatly appreciate the Google Maps™ API tutorial web site, which provided an excellent starting point. We also thank Renee Risher and Missie Thurston for their help in reviewing the manuscript. This work was supported by CDC Center of Excellence in Public Health Informatics grant P01 HK 000027.
1. Hoskins RE, O'Connor C, Johnson C, O'Carroll P, Fuller S: EpiQMS: An Internet Application for access to Public Health Data for Citizens, Providers, and Public Health Investigators. J Public Health Manag Pract. 2002, 8 (3): 30-36.
2. PostgreSQL Relational Database. [http://www.postgresql.org]
3. JBoss Application Server, a J2EE application server. [http://www.jboss.org]
4. JFreeChart, a free 100% Java chart library. [http://www.jfree.org/jfreechart/]
5. R, a language and environment for statistical computing and graphics. [http://www.r-project.org/]
6. Google Maps API. [http://www.google.com/apis/maps/index.html]
7. Java Programming language. [http://java.sun.com]
8. Washington Cancer Registry Data. [https://fortress.wa.gov/doh/wscr/]
9. Washington Death Registry Data. [http://www.doh.wa.gov/EHSPHL/CHS/CHS-data/death/deatmain.htm]
10. Murray CJL, Kulkarni SC, Michaud C, Tomijima N, Bulzacchelli MT, Iandiorio TJ, Ezzati M: Eight Americas: Investigating Mortality Disparities across Races, Counties, and Race-Counties in the United States. PLoS Medicine. 2006, 3 (9): 1513-1524. 10.1371/journal.pmed.0030260.
11. Census 2000 of US Census Bureau. [http://www.census.gov/main/www/cen2000.html]
12. R map libraries. [http://alumni.media.mit.edu/~tpminka/software/maps/]
13. Maclachlan JC, Jerrett M, Abernathy T, Sears M, Bunch MJ: Mapping health on the Internet: A new tool for environmental justice and public health research. Health & Place. 2007, 13: 72-86. 10.1016/j.healthplace.2005.09.012.
14. ESRI ArcIMS, the solution for delivering dynamic maps and GIS data and services via the Web. [http://www.esri.com/software/arcgis/arcims/index.html]
15. MapServer, an open-source development environment for building spatially-enabled internet applications. [http://mapserver.gis.umn.edu/]
16. Reis BY, Kirby C, Hadden LE, Olson K, McMurry AJ, Daniel JB, Mandl KD: AEGIS: a robust and scalable real-time public health surveillance system. J Am Med Inform Assoc. 2007, 14 (5): 581-8. 10.1197/jamia.M2342.
17. Freifeld CC, Mandl KD, Reis BY, Brownstein JS: HealthMap: Global infectious disease monitoring through automated classification and visualization of Internet media reports. J Am Med Inform Assoc. 2007.
18. Who is Sick?. [http://www.whoissick.org]
19. Scalable Vector Graphics (SVG). [http://www.w3.org/Graphics/SVG/]
20. Kamadjeu R, Tolentino H: Web-based public health geographic information systems for resources-constrained environment using scalable vector graphic technology: a proof of concept applied to the expanded program on immunization data. International Journal of Health Geographics. 2006, 5: 24-31. 10.1186/1476-072X-5-24.
21. Cassa CA, Iancu K, Olson KL, Mandl KD: A software tool for creating simulated outbreaks to benchmark surveillance systems. BMC Medical Informatics and Decision Making. 2005, 5: 22-28. 10.1186/1472-6947-5-22.
22. Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M: Bioconductor: Open software development for computational biology and bioinformatics. Genome Biol. 2004, 5: R80. 10.1186/gb-2004-5-10-r80.
23. Revere D, Bugni PF, Fuller SS: An interactive digital knowledge management system to improve public health practitioners' access to public health resources. In Proc AMIA Annu Fall Symp, Washington DC. 2006.
24. Perl, a cross platform programming language. [http://www.perl.org/]
25. Keyhole Markup Language (KML), an XML-based language for managing the display of 3D geospatial data in Google Maps™ and Google Earth. [http://code.google.com/apis/kml/documentation/]
26. Google Spreadsheet Autofilter (API Example). [http://gmaps-samples.googlecode.com/svn/trunk/mapcoverage_filtered.html]
27. Microsoft Servers. [http://www.microsoft.com/servers/default.mspx]
28. SAS. [http://www.sas.com]
29. Boulos MNK: Web GIS in practice III: creating a simple interactive map of England's Strategic Health Authorities using Google Maps API, Google Earth KML, and MSN Virtual Earth Map Control. International Journal of Health Geographics. 2005, 4: 22. 10.1186/1476-072X-4-22.
30. Boulos MNK, Burden D: Web GIS in practice V: 3-D interactive and real-time mapping in Second Life. International Journal of Health Geographics. 2007, 6: 51. 10.1186/1476-072X-6-51.
31. GeoBase. [http://www.geobase.ca]
32. CPAN. [http://www.cpan.org]
33. EpiVue. [https://epivue.cphi.washington.edu/epivue]
34. Google Maps API tutorial. [http://econym.googlepages.com/index.htm]
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.