Interactive web-based mapping: bridging technology and data for health
© Highfield et al; licensee BioMed Central Ltd. 2011
Received: 20 October 2011
Accepted: 23 December 2011
Published: 23 December 2011
The Community Health Information System (CHIS) online mapping system was first launched in 1998. Its overarching goal was to provide researchers, residents and organizations access to health related data reflecting the overall health and well-being of their communities within the Greater Houston area. In September 2009, initial planning and development began for the next generation of CHIS. The overarching goal for the new version remained to make health data easily accessible for a wide variety of research audiences. However, in the new version we specifically sought to make the CHIS truly interactive and give the user more control over data selection and reporting.
In July 2011, a beta version of the next-generation of the application was launched. This next-generation is also a web based interactive mapping tool comprised of two distinct portals: the Breast Health Portal and Project Safety Net. Both are accessed via a Google mapping interface. Geographic coverage for the portals is currently an 8 county region centered on Harris County, Texas. Data accessed by the application include Census 2000, Census 2010 (underway), cancer incidence from the Texas Cancer Registry (TX Dept. of State Health Services), death data from Texas Vital Statistics, clinic locations for free and low-cost health services, along with service lists, hours of operation, payment options and languages spoken, uninsured and poverty data.
The system features query on the fly technology, which means the data is not generated until the query is provided to the system. This allows users to interact in real-time with the databases and generate customized reports and maps. To the author's knowledge, the Breast Health Portal and Project Safety Net are the first local-scale interactive online mapping interfaces for public health data which allow users to control the data generated. For example, users may generate breast cancer incidence rates by Census tract, in real time, for women aged 40-64. Conversely, they could then generate the same rates for women aged 35-55. The queries are user controlled.
Keywordsinteractive mapping health disparities geographic information systems
Geospatial assessment and communication of disease patterns, risk and health outcomes, has historically been limited to specialized researchers, using specialized tools . With the advent of web 2.0, the landscape of accessing health information has shifted, and more researchers (academic and non-academic) are using the web for data sharing, research and planning purposes . Many health departments now provide public access to their health statistics via the Internet, including morbidity and mortality indicators . This has promoted user involvement and data examination beyond the traditional academic model [3, 4]. The web can serve as a tool for sharing real-time information across a spectrum of potential users; additionally geographic information systems (GIS) technologies on the web have become increasingly popular and user- friendly.
GIS technologies are tools that allow for the storage, management, manipulation and visualization of spatial data . Web-based GIS has the potential to connect a large number of researchers and lay people with geospatial public health data. Distributing and sharing geospatial data on the web can aid health planners, policy researchers and decision-makers in their collaborative efforts to improve public health outcomes and design interventions targeted to local populations . At the forefront of this movement to share data across the spectrum was the Community Health Information System (CHIS) online mapping system, which was first launched in 1998. Its overarching goal was to provide researchers, residents and organizations access to health related data reflecting the overall health and well-being of their communities within the Greater Houston Area. The central theme for the development of the original platform was to create a tool that could be used to visualize, analyze and ultimately reduce health disparities. The original interface effectively connected a wide audience with information vital to developing action plans to address identified health needs in underserved areas.
However, over time, a number of issues were identified that limited the utility of the system. The original design of the CHIS mapping system required the download of an ActiveX control, which is a Microsoft framework that allows programmers to take advantage of additional features that would not normally be available on the browser. It also only worked in Internet Explorer, limiting user accessibility as a host of new browsers came online. The original CHIS provided users pre-categorized static maps, which could be overlaid with clinic locations. Additionally, users could access stand-alone reports containing additional data about the area of interest, such as Census data on population.
As the CHIS was developed and grew over time, the back-end database for the geo-referenced data became an issue. Data was loaded into multiple tables, sometimes containing repetitive information, due to the nature of the piecemeal addition of new data. Maintaining up-to-date clinic information is vital to the success of the portals, therefore improving usability in the back-end of the application for both the clinics uploading information and the administrator was important. And the original multiple databases housing all of the data needed to be analyzed and synthesized into a single database that was flexible enough to expand for future data types. These issues are not unique to CHIS. Models developed over time tend to run into issues with data storage and must be adaptive to continue to serve their target audience.
A number of online public health mapping tools have been developed since the launch of CHIS in 1998, including many developed by State and local public health departments. Almost all of these systems are interactive, allowing the user to query data and then generate a map . One of the current limitations with these systems is an issue of scale. Most of these systems provide data at a resolution no smaller than County level. Those providing sub-County data, to date, only do so in a pre-categorized static map or as separate reports given in conjunction with the County- level data. None, to the authors knowledge, provide both user-generated local-scale geographical and attribute data (e.g., a resolution smaller than County, such as Census tract, block or Zip code). Local-scale geographical and attribute data is critical for conducting research, targeting interventions and siting new service locations. Many public health research and intervention projects are intended to affect a population at a scale smaller than County resolution. Even still, data that is local-scale is typically provided in an aggregated form, such as Census tract, presenting limitations on the utility of the data. Researchers, planners and service providers need to be aware of the issues surrounding scale and data aggregation, such as the modifiable areal unit problem . The Florida CHARTS is an example of an online interactive health GIS tool providing both County and sub-County information . This site provides the user with the ability to query data on birth or death rates by County across various time periods. Additionally, users can select Census Tracts to open a separate report containing demographic data for the selected tract. While current implementations, such as CHARTS, serve as an important data and planning resource, they do not provide the end-user with the ability to generate customized local-scale maps and reports based on specific user-driven criteria in real-time. The ability to generate data at a meaningful level of spatial aggregation has been cited as a major challenge to overcome with interactive mapping systems in public health .
Technological advances have made it possible to begin addressing this challenge, including Google Earth and the Google Maps API launched in 2005. These tools followed the widespread adoption of Google as a web search engine in the early 2000's. The Google Data APIs allow programmers to create applications that read and write data from the Google service . The widespread use of Google technologies has created a large number of potential users who already have knowledge of how to use these tools. Additionally, these technologies can easily be used to create custom applications in an interface that appears seamless to the user [5, 11–16].
Leveraging the power of Google API technologies, we set out to create a local-scale interactive web-based system for public health data with the next generation of CHIS.
The overarching goal for the new version of the system was to make health data easily accessible for a wide variety of research audiences with an interest in developing health programs benefiting under-served populations such as academic researchers, policy makers, community planners and non-profits who conduct research. This information can be used in grant applications. Specifically, the goal was to create a system in which researchers could easily access demographic, health related measures, clinic location, and clinic service data. The objective was to provide a user-friendly tool to better inform public health intervention development, healthcare expansion and service delivery and, most importantly improve access to care and reduce health disparities in the Greater Houston region. To achieve these goals and objectives, an online, real-time interactive local-scale database system was created.
The portals present users with an easily accessible platform for accessing data, addressing research questions and service delivery decisions. For example, a Federally Qualified Health Center (FQHC) in the local area used the data on PSN related to medically underserved areas, the uninsured, population demographics and clinic locations to determine the proposed site for a new clinic (personal communication). Researchers interested in studying the relationship between mammography screening availability and uninsured women of screening age have used the BHP to address the relationship between these variables. These are two of many examples of how the system can be utilized to benefit researchers, planners and non-profits in the area covered by the portal.
The creation of the next generation of the portals had to be completed given a number of constraints: The portals needed to be accessible to as many people as possible, therefore the technology used to access them needed to be universally accessible, low cost, would not require any additional downloads and be as browser-agnostic as possible. The portals also needed to be as interactive as possible, allowing users to dynamically define their research parameters rather than being forced to accept preset queries.
Google technologies, including Google mapping, were chosen for a variety of reasons: First, Google and Google Maps are well known and widely used by a cross-section of the portals' target audience, which means reduced barriers to use. Compatibility was also a powerful selection criterion; no third-party implementation was necessary to install and access the maps. Google maps were a low-cost choice, since they were free and supported by SSL protocol when we began development of the Beta portals. Google announced usage limits to the Maps API in October, 2011 For more details, see http://googlegeodevelopers.blogspot.com/2011/10/introduction-of-usage-limits-to-maps.html. As can be seen in the article, Google intends the charges to affect power-users of the API. No charges are incurred with less than 25,000 map loads per day per API. The SLEHC portals are not expected to exceed that limit.
Google also provides an ideal technology for mobile support with access on Google Android-enabled devices and Apple ios devices (iPhone and iPad). Opportunities to integrate with future Google-developed innovative tools, such as Google Public Data Explorer, were important. We plan to enable the portals for mobile access and use Google Public Data Explorer in the next version of the system. Utilizing a variety of technologies in the Beta portals allowed users to interact with and query local-scale health related data in real time. Nevertheless, four core challenges to the technical implementation had to be overcome: 1) Polygon loading time, 2) Data consolidation and standardization, 3) Application usability, and 4) Real-time data calculation.
Polygon loading time
There were 4404 polygons of Census tracts and 2884 Zip code polygons to load in the user's browser. If they loaded simultaneously it would have made the user experience unacceptably slow. Our solution was to constrain loading to the given viewing area. We detected the coordinates of the upper left and lower right corners of the display window and then displayed/calculated only those polygons within the proscribed area. This significantly increased the speed of the system and improved the user experience by reducing wait time for results. The system currently takes just under 1 second to load each polygon.
Consolidation and standardization of data
The first generation of the application (CHIS) used data from 12 separate databases. To streamline the data and improve functionality for the next iteration, the databases were consolidated into a single database, requiring a massive reorganization. A majority of the data definition and field types in the database were stored in order to allow scalability. We also strictly defined additional data definition tables that will allow the system administrator to configure the data fields for any future data provided by non-federal sources and restricted data format to comma-separated variable (CSV), which significantly improved importation of data into the system and allowed data to go live immediately after upload (assuming the data are checked and clean).
Usability, interactivity, presentation
One of the constraints of the redevelopment of the system specified interactive, user-defined queries. In a typical mapping application, users expect to click on a given polygon and immediately access data. Interactivity of this type is generally Windows-like behavior and is difficult to accomplish with Web-based technology where interactivity is limited. Applications and widgets exist that improve interactivity within a web environment, but they violated one of the other constraints of the project, which were maximized accessibility with no additional software purchases or downloads. Technologies such as Flex, Flash and JavaFS were options, but each of them would have created severe roadblocks to some users' ability to access the portals. The application of Ajax technology allowed for the development of a Windows-type environment and the posting of data dynamically without requiring page refresh.
Interactivity in the new portals includes dynamic user-defined queries (demographic parameters, date range of data) using sliding scales, as opposed to the canned, preset reports in the previous incarnation of the database. Also, Census tracts are dynamically highlighted as the user hovers over specific Census tract numbers in the portal navigation, allowing them to locate tracts visually as well as numerically. In the future the portals will allow users to actually drag a given polygon into a basket.
The Census tracts in the data set vary widely in population density, making comparison of relative incidence rates from one to another a complex set of calculations. Those calculations take time, which then detracts from the user experience. Calculating relative radius for identified clinics was also an issue. In the Beta version, we solved both by using the ASP.NET data caching method. The next version will have the ability to save frequently used reports to minimize the amount of calculation time used by the system. The next version of the application will calculate all of the different permutations and save them in the database.
We have met and solved many challenges in developing these portals, but there are still technological challenges to be addressed. The system does not perform optimally in Internet Explorer due to coding issues within the browser. Currently, it is recommended to access the system in Google Chrome or Mozilla Firefox. Future work will attempt to address the issue with Internet Explorer to ensure a good experience for all users. Additionally, reports are only available as a new html window in the Beta system. Future versions will utilize interactive pdf technology allowing the user to interact with the report as a pdf. This will be particularly useful once Google Public Data Explorer is implemented, providing the user the capacity to animate a time series of data within a pdf document.
Currently, the portals contain data for the Houston-Galveston-Brazoria Consolidated Metropolitan Statistical Area (CMSA). To give an idea of the scale of this current implementation, the CMSA is larger than the State of New Jersey . Plans are underway to expand the next version of the portals to include multi-county, regional and, eventually, statewide data. The State of Texas is the second largest in the U.S., in both geographic area and population. The unique set of challenges in the expansion is: 1) How to enroll and maintain data from clinics serving the underserved and, 2) How to allow users to quickly access local-scale data across wide geographies. In the development of the CHIS and in the Beta portal system, the authors have worked closely with the clinics that serve the underserved population in our region and provided the assistance of a community liaison, at no charge, to encourage their enrollment and help maintain the data. Clinics that serve this target population (underserved) are very busy and often under-staffed. Providing assistance in enrollment has proven helpful to their participation. Additionally, the clinics benefit from their enrollment by increasing the visibility of the clinic and attracting new patients from portal usage. However, providing a community liaison is an expensive and time intensive investment in the portal data. As we consider expanding to larger geographies, we will have to address how to efficiently enroll and maintain data for these clinics. Approaches that are being considered include region by region expansion with the use of our current liaison working directly with the local clinics and partnering with organizations that already serve and know the clinics well, such as the Breast Health Collaborative of Texas or the Lone Star Association of Clinics. Both member organizations have state-wide enrollment of our target audience and could potentially provide assistance in data entry.
In regard to the second issue, the Beta portals present spatially referenced data to a variety of potential users. They do not, however, provide a detailed explanation of the issues with utilizing spatial data such as heterogeneity and issues with rate calculation (such as the need for spatial smoothing). These are particularly important issues for a non-spatially trained user. In the case of calculated disease rates, areas with small populations (e.g., tracts) could be subject to variance instability. This means that the rates on the map may spuriously suggest differences in the underlying risk of disease [18–20]. The Beta portals also do not explain the age-adjusted rate calculation or the use of a "standard" population. The onus, at the moment, is on the user to understand how to interpret the data provided. Consequently, the immediate next development steps of the portals will include more detailed metadata, data explanations and tutorials. Additionally, a frequently asked questions (FAQ) section will be added to address some of these issues. While these are important issues to consider, our purpose with the development of the Beta site was an attempt to create a functional local-scale "query on the fly" system for accessing a variety of public health data, not to address all the issues surrounding the use of spatial data. Future versions may also employ spatial smoothing techniques to deal with issues in presenting disease rates for small areas. Another important issue to consider in the presentation of data in an online mapping portal is the modifiable areal unit problem (MAUP). The MAUP causes data analyzed at either a varying aggregation or scale to result in different answers . This is an issue that the authors can only control to a certain extent. Much of the data used in the system is natively aggregated (Census data, incidence and mortality data by tract). In the current Beta system, we only present data at the unit of aggregation it was provided to the authors in. In other words, we have made no changes to the data (aggregation or disaggregation). However, as we expand the system to allow users to access data across multiple scales, the MAUP will continue to be an important issue to consider. One way to address this is to control the scale at which data is presented, as we do currently. As part of the future work on the system, the authors plan to convene an advisory panel comprised of spatial data experts to help determine how to address this issue and many others in future versions of the system.
In the first two months following the launch of the portals, there were 395 visits by 285 unique visitors. It is difficult to know if these numbers reflect a good outcome or not. The Beta system was not marketed or advertised. It also doesn't require a login, so we were not able to record information on the users. A login will be included in the full version of the system and will facilitate better tracking of users and allow for evaluation of the system. The development of the Beta portals took a large amount of resources (both time and money). While online mapping systems, such as the portals, have the potential to reach a large number of users, the amount of resources required to develop and maintain such a system is an important consideration. Future work will be focused on evaluation and assessment of the utility of the system and its ability to meet its objectives and the needs of users. Once the system moves from Beta to the full version, a focus on marketing and training will occur.
To the author's knowledge, the BHP and PSN are the first local-scale interactive online mapping interface for public health data. The uniqueness of the system lies in the amount of flexibility and control it provides to the user, while providing local-scale data. Rather than pre-categorizing data, the system gives the user the ability to create custom queries specific to their research needs at local-scale geography. For users, this is a critical advancement. The ability to easily access data (for example, breast cancer mortality in African American women aged 40-64 years) at a local geography is essential to conducting research within specific geographies or specific populations. The authors feel that the major contribution of this work is in serving as an example of what can be done using the latest technologies and serving as a launching platform for others to improve on this work. With that in mind, the authors would encourage all interested researchers to contact us if they have an interest in developing similar systems or if they are interested in partnering on continued developments to our site. Initial feedback for the system has been encouraging, particularly regarding the level of user control.
A series of focus groups were held with a cross-section of local researchers to gain insight into how people used the previous version of CHIS and how they would utilize the new system. Organizations participating in the focus groups from across the region included representatives from city and county public health officials, Federally Qualified Health Centers (FQHCs), the Harris County Healthcare Alliance (HCHA) and university-based academic researchers. Participants were asked to provide information about what kind of data they use, how they access it, how they use it and what kind of system would be useful for their research needs. Based on participant feedback and suggestions, initial development of the next generation of interactive mapping portals began. The Breast Health Portal (BHP), focused on breast cancer and access to mammography screening, and the Project Safety Net (PSN), focused on access to primary health care services. The first phase included mock-ups of the system, followed by a prototype (beta website). Feedback on these mock-ups was sought from community (researchers) throughout the development process.
The polygons returned by the Google mapping API are expressions of coordinates that the user sees as either specific Census tracts or Zip code areas. They are essentially shaped containers of the proprietary information housed in the database, which is then overlayed on top of Google's generic map information. In the Breast Health Portal, all polygons are expressions of Census tracts. In the Project Safety Net Portal, information is available for both Census tracts and Zip codes. Both the polygons are shaded and the legend is dynamically created based on the user's specific search parameters. All other data on the portals (cancer incidence and mortality, birth, uninsured, etc) are attributed to a specific Census tract or Zip code. The data is received pre-categorized at these geographies. The system also generates polygon outlines expressing a variety of political boundaries: School District, Commissioner District, Senate and House districts, Counties, etc. The markers are represented as colored squares display health centers locations, type, clinic name, contact information and services offered.
C# coding language was used for programming the system. SQL Server 2008 was used for data storage. The database has been designed to allow for easy data updates, without requiring the assistance of a database administrator. New data can be imported in .csv format and allow for updating the content of the portals. The Beta system contained over 11 GB of data comprised of 52 tables and 59 store procedures used to produce the maps.
Geographic coverage and data
Data available on the BHP include: age, race, uninsured status, poverty, cancer incidence (1995-2007), and mortality rates (2000-2006) over single or multiple years. In addition, point locations of service providers are included. . Additionally, geo-political boundaries such as school districts, city council, and neighborhoods can be overlaid to create customized and comprehensive maps. Data on PSN include: age, race, uninsured status, poverty, birth rates, death rates for the top five leading causes of death as defined by the Centers for Disease Control and Prevention (CDC) and medically underserved areas, in addition to point locations of service providers. Multiple sources provided data: cancer incidence data from the Texas Cancer Registry (Texas Department of State Health Services), death data from Texas Vital Statistics, clinic locations for free and low-cost health services from local clinics registered on the site, preventable emergency department visits from the University of Texas School of Public Health, medically underserved areas from the Health Research Services Administration (HRSA) and mammogram capacity as reported by all agencies offering low-cost mammography screening in the study region. Data on cancer incidence and mortality were calculated as rates per 100,000, age-adjusted to the U.S. standard population, 2000. Under IRB agreement, cancer incidence rates could only be calculated for tracts with five or more cases.
The BHP allows users to access data at Census tract geography. The PSN allows users to access data at from both Census tract or Zip code geographies and toggle between the two. Both portals allow the user to overlay geographic boundaries on selected data, such as school districts, congressional districts and neighborhoods (tract or Zip code) and to generate customized reports based on user selections.
Both the mapping interface and provider search rely on up-to-date clinic information. The maintenance of this part of the database is managed by a community liaison. The first generation of the database required IT involvement when generating reports, adding new clinics, or authorizing administrative privileges. The second generation now has a backend that is much easier to use, streamlining the process for all involved. This is achieved by relying on Ajax based controls and improved layout of field data, which has accelerated data input by almost 50%. Information upload is significantly faster with less time eaten by data refresh. Clinics can jump from step to step/section to section with much greater ease (Figure 4). They are also able to retrieve username and passwords via email as opposed to the rigorous process they once had to endure. For the liaison, administration of the clinic information is greatly improved: The administrator can now create custom reports and add clinics, and assign users and passwords without going through the IT department. The administrator can also dynamically create new fields and tables for site scalability (clinic types, counties, languages, payment methods, years, services.) They can also easily move through the various sections of the application, which has been reported as a vast improvement over the previously cumbersome backend of the first generation.
The author would like to acknowledge St. Luke's Episcopal Health Charities for funding this research. Marlynn May and Karen Williams provided internal review.
- Mills JW, Curtis A: Geospatial Approaches for Disease Risk Communication in Marginalized Communities. Prog Comm Health Partnerships: Res, Educ, Action 2008,2(1):61–72.View Article
- Lee BK: Epidemiologic Research and Web 2.0--the User-driven Web. Epidemiol 2010, 21:6.
- Gao S, Mioc D, Xiaolun Y, Anton F, Oldfield E, Coleman DJ: Towards Web-based representation and processing of health information. Int J Health Geog 2009, 8:3.View Article
- Bell BS, Hoskins RE, Pickle LW, Wartenberg D: Current practices in spatial analysis of cancer data: mapping health statistics to inform policymakers and the public. Int J Health Geog 2006, 5:49.View Article
- Gao S, Mioc D, Anton F, Xiaolun Y, Coleman DJ: Online GIS services for mapping and sharing disease information. Int J Health Geog 2008, 7:8.View Article
- Croner CM: Public Health GIS and the Internet. Annu Rev Public Health 2003, 24:57–82.PubMed
- Openshaw S, Taylor P: A million or so correlation coefficients: Three experiments on the modifiable area unit problem. Statistical Applications in the Spatial Sciences 1979. ed
- Grigg M, Alfred B, Keller C, Steele JA: Implementation of an Internet-based Geographic Information System: The Florida Experience. J Public Health Man 2006,12(2):139–145.
- Cromley EK: GIS and Disease. Annu Rev Public Health 2003, 24:7–24.PubMedView Article
- Google data protocol: Developer's Guide Overview. Accessed September 16, 2011. Available at: http://code.google.com/apis/gdata/docs/developers-guide.html
- Gu W, Wang X, McGregor SE: Optimization of preventive health care facility locations. Int J Health Geog 2010, 9:17.View Article
- Cinnamon J, Schuurman N: Injury surveillance in low-resource settings using geospatial and social web technologies. Int J Health Geog 2010, 9:25.View Article
- Yi Q, Hoskins RE, Hillringhouse EA, Sorensen SS, Oberle MW, Fuller SS, Wallace JC: Integrating open-source technologies to build low-cost information systems for improved access to public health data. Int J Health Geog 2008, 7:29.View Article
- Kamel Boulos MN, Viangteeravat T, Anyanwu MN, Nagisetty VR, Kuscu E: Web GIS in practice IX: a demonstration of geospatial visual analytics using Microsoft Live Labs Pivot technology and WHO mortality data. Int J Health Geog 2011, 10:19.View Article
- Chang AY, Parrales ME, Jimenez J, Sobieszcyzk ME, Hammer SM, Copenhaver DJ, Kulkarni RP: Combining Google Earth and GIS mapping technologies in a dengue surveillance system for developing countries. Int J Health Geog 2009, 8:49.View Article
- Kamel Boulos MN: Web GIS in practice III: creating a simple interactive map of England's strategic health authorities using Google maps API, Google Earth KML and MSN virtual earth map control. Int J Health Geog 2005, 4:22.View Article
- City of Houston: Houston Facts and Figures. Accessed October 20, 2011. Available at: http://www.houstontx.gov/abouthouston/houstonfacts.html
- Anselin L, Lozano N, Koschinsky J: Rate Transformations and Smoothing. 2006.
- Clayton D, Kaldor J: Empirical Bayes Estimates of Age-Standardized Relative Risks for Use in Disease Mapping. Biometrics 1987, 43:671–681.PubMedView Article
- Marshall RJ: Mapping disease and mortality rates using empirical bayes estimators. Applied Statistics 1991, 40:283–294.PubMedView Article
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.