Disease mapping architecture
To address, in particular, the challenges of heterogeneous data integration and service interoperability in disease mapping, we propose the disease mapping architecture illustrated in Figure 1. The architecture has four tiers: a data storage tier, an ontology engine tier, a standard health services tier, and a maps and animation tier.
• Data storage tier
Health data may be collected by different health organizations and stored in files or databases. These data sources can be accessed over the Internet or an intranet for data sharing.
• Ontology engine tier
The ontology engine is designed to overcome the heterogeneities in the distributed health data. It provides a uniform way for the standard health services to retrieve data, and it performs the health data matching and transformation tasks.
• Standard health services tier
This tier uses explicit standards to make the disease mapping system interoperable. The Open Geospatial Consortium (OGC) provides many specifications for sharing spatially related data, and these can also support disease data sharing. Generally, there are three kinds of services:
– Health data processing services analyze the disease from spatial and temporal perspectives. Many statistical methods are used in disease analysis; the most common are the crude morbidity rate and the standardized morbidity ratio. Other methods use spatial autocorrelation indicators such as Moran's I and local G* to detect disease clusters [13].
– Health mapping services deliver cartographic representations of the health data to clients. Providing disease information through dynamically generated maps can protect privacy more effectively than SVG or Java Applet technologies, which transfer the underlying disease data to the client side.
– Health registry services act as service brokers in the service-oriented architecture. Through them, descriptions of the health data processing services and health mapping services can be published and discovered conveniently through uniform interfaces.
• Maps and animation tier
This tier provides spatio-temporal maps to support health practitioners and the public in their decision-making process. Ogao [19] categorized animation methods into three types, ranked from "low" to "high" according to the level of interactivity and complementary domain knowledge each offers the user: passive, interactive and inference-based animations. Visualization tools such as maps and animations help people generate hypotheses in disease studies and seek explanatory factors, which is important in decision making. The ability to share maps or animations in a distributed environment can also provide a collaborative mechanism in the preparation, response, and recovery stages of disease control.
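Of the cluster-detection indicators mentioned for the health data processing services, global Moran's I is the simplest to state. The sketch below computes it for a list of area values (for example, morbidity rates per area) and a user-supplied spatial weights matrix. It is an illustration under stated assumptions, not the system's implementation: the weights scheme is left to the caller, and local G* is not shown.

```python
from typing import Sequence


def morans_i(values: Sequence[float], weights: Sequence[Sequence[float]]) -> float:
    """Global Moran's I for area values.

    weights[i][j] is the spatial weight between areas i and j, e.g. 1 if the
    areas share a border and 0 otherwise (the diagonal is normally 0).
    """
    n = len(values)
    if n == 0 or len(weights) != n or any(len(row) != n for row in weights):
        raise ValueError("weights must be an n x n matrix matching values")
    mean = sum(values) / n
    dev = [v - mean for v in values]          # deviations from the mean
    total_weight = sum(sum(row) for row in weights)
    denom = sum(d * d for d in dev)
    if total_weight == 0 or denom == 0:
        raise ValueError("Moran's I is undefined for zero weights or constant values")
    # Weighted cross-products of deviations between every pair of areas.
    num = sum(
        weights[i][j] * dev[i] * dev[j]
        for i in range(n)
        for j in range(n)
    )
    return (n / total_weight) * (num / denom)
```

For four areas in a row with values 1, 2, 3, 4 and binary contiguity weights, the function returns 1/3; the positive value indicates that similar values sit next to each other, while an alternating 1, 4, 1, 4 pattern gives -1.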
Study area and data description
The province of New Brunswick, Canada and the state of Maine, U.S.A. are our study areas. They share a common, highly travelled border. Significant volumes of goods and people cross this international border, and infectious agents are easily carried from one side to the other. To protect the privacy of the health data, different health organizations or users have different rights to access the more detailed levels of the health data, so there are different privilege levels for visualizing and tracking health data at each geographic level.
In this study, we chose six levels of administrative/census areas that cover the entire territory on both sides. New Brunswick is organized into "Province", "Health Region", "Census Division", "Census Subdivision", "Forward Sortation Area" and "Dissemination Area" geo-layers. In Maine, the corresponding levels are "State", "Health Service Area", "County", "County Subdivision", "Zip Code" and "Census Block Group" respectively.
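The level correspondence above, combined with the idea of level-dependent access rights, can be captured in a small lookup structure. The following is a minimal sketch: the two level lists come from the text, but the role names and the finest level each role may view (`MAX_LEVEL_BY_ROLE`) are hypothetical placeholders, not the study's actual privilege scheme.

```python
# Geographic levels ordered from coarsest (index 0) to finest (index 5),
# as listed for the study area.
NB_LEVELS = [
    "Province", "Health Region", "Census Division",
    "Census Subdivision", "Forward Sortation Area", "Dissemination Area",
]
MAINE_LEVELS = [
    "State", "Health Service Area", "County",
    "County Subdivision", "Zip Code", "Census Block Group",
]


def level_rank(level: str) -> int:
    """Return 0 for the coarsest level up to 5 for the finest, on either side."""
    if level in NB_LEVELS:
        return NB_LEVELS.index(level)
    if level in MAINE_LEVELS:
        return MAINE_LEVELS.index(level)
    raise KeyError(f"unknown geographic level: {level}")


def corresponding_level(level: str) -> str:
    """Return the equivalent level on the other side of the border."""
    if level in NB_LEVELS:
        return MAINE_LEVELS[level_rank(level)]
    return NB_LEVELS[level_rank(level)]


# Hypothetical privilege table: the finest level each role may view.
MAX_LEVEL_BY_ROLE = {
    "public": "Health Region",
    "researcher": "Census Subdivision",
    "health_official": "Dissemination Area",
}


def may_view(role: str, level: str) -> bool:
    """True if the role's privilege covers the requested level (or coarser)."""
    return level_rank(level) <= level_rank(MAX_LEVEL_BY_ROLE[role])
```

Ranking levels by position rather than by name lets one privilege entry cover both jurisdictions: a role allowed down to "Health Region" is automatically allowed "Health Service Area" in Maine.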
The data for infectious disease mapping used in this study includes disease data, population data and six levels of geometric boundary data. The infectious disease data for New Brunswick are represented by the hospital discharge data recorded for the New Brunswick Department of Health between 1997 and 2002. The corresponding Maine data were collected through our research partners at the University of Southern Maine. The six levels of geometric boundary data for New Brunswick were obtained from Service New Brunswick, Statistics Canada and Canadian Geospatial Data Infrastructure (CGDI) portal. The six levels of geometric boundary data for Maine were obtained from the American National Spatial Data Infrastructure (NSDI) portal.
The population data of New Brunswick and Maine were acquired from Statistics Canada and the U.S. Census Bureau respectively.
Spatio-temporal data model and data matching
The spatio-temporal object-oriented data model provides a uniform way to manage spatio-temporal data and supports better data management and analysis. The model used in this study is shown in Figure 2. The Disease class, which describes disease characteristics, can be extended to disease subcategories such as Infectious disease and Respiratory disease. By comparison, a Disease event is a spatio-temporal object related to a particular kind of disease: it represents an activity associated with that disease, such as a hospital observation or a training and education service for patients. A Disease event includes patient and time information. Time can be an instant or an interval. A Patient is related to the disease case location, which can be an administrative area or a geo-coded point. Administrative areas can be at the national, provincial, or county level, among others.
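One way to express the class structure described above in code is with Python dataclasses. This is a sketch assuming a straightforward mapping of the described classes; the attribute names (`pathogen`, `kind`, `patient_id`, the `x`/`y` coordinates) are hypothetical and do not reproduce the exact model in Figure 2.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Union


@dataclass
class Disease:
    name: str


@dataclass
class InfectiousDisease(Disease):
    pathogen: Optional[str] = None  # hypothetical attribute


@dataclass
class RespiratoryDisease(Disease):
    pass


@dataclass
class TimeInstant:
    at: datetime


@dataclass
class TimeInterval:
    start: datetime
    end: datetime

    def __post_init__(self) -> None:
        if self.end < self.start:
            raise ValueError("interval end precedes start")


@dataclass
class AdministrativeArea:
    name: str
    level: str  # e.g. "Province", "County"


@dataclass
class GeocodedPoint:
    x: float
    y: float


# A location is either an administrative area or a geo-coded point;
# time is either an instant or an interval.
Location = Union[AdministrativeArea, GeocodedPoint]
Time = Union[TimeInstant, TimeInterval]


@dataclass
class Patient:
    patient_id: str
    location: Location  # the disease case location


@dataclass
class DiseaseEvent:
    """A spatio-temporal object tied to one kind of disease."""
    disease: Disease
    kind: str  # e.g. "hospital observation"
    patient: Patient
    time: Time
```

Using subclasses for disease categories and `Union` types for time and location mirrors the "is-a" and "either/or" relationships in the description while keeping each class small.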
We integrate the data from New Brunswick and Maine mainly through a common schema integration approach. All the attributes describing disease, disease event, patient, time, and the six administrative geographic levels on both sides are specified. For instance, in constructing the Health Region jurisdiction, common attributes such as name, spatial boundary, state/province code and vaccine stock are described. In addition, a data dictionary is built to map real-world facts that are defined differently on each side to the common schema. For example, the postal code attribute used in New Brunswick and the zip code attribute used in Maine are both matched to the postcode attribute in the common schema. After this data matching, the Maine and New Brunswick data can be handled in the same way.
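A data dictionary of this kind can be sketched as a per-source table of attribute renamings. Only the postal code/zip code to postcode match comes from the text; the source keys (`"NB"`, `"ME"`) and the remaining attribute names are hypothetical.

```python
from typing import Dict, Mapping

# Hypothetical data dictionary: source attribute name -> common-schema name.
DATA_DICTIONARY: Dict[str, Dict[str, str]] = {
    "NB": {"postal_code": "postcode", "province": "state_province_code"},
    "ME": {"zip_code": "postcode", "state": "state_province_code"},
}


def to_common_schema(source: str, record: Mapping[str, object]) -> Dict[str, object]:
    """Rename a source record's attributes to the common schema.

    Attributes not listed in the dictionary keep their original names.
    """
    mapping = DATA_DICTIONARY[source]
    return {mapping.get(key, key): value for key, value in record.items()}
```

Once both sides pass through `to_common_schema`, downstream code can query `postcode` without knowing which jurisdiction a record came from.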
Statistical methods for data processing
In this study, we concentrated on spatial, temporal, and demographic factors and their influence on infectious disease outbreaks, which reveal differences in disease distribution by location, time, age and gender. We used basic statistical calculations of disease rates, because more complex methods would delay the response time of the online mapping process. These statistical methods are: Crude Morbidity Rate (CMR), Normalized Morbidity Ratio (NMR), Age-Specific Morbidity Ratio (ASMR), Age-Adjusted Morbidity Ratio (AAMR), and Standardized Morbidity Ratio (SMR).
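The sketch below shows textbook epidemiological definitions of four of these measures: crude rate, age-specific rates, a directly age-adjusted rate, and an indirectly standardized ratio (observed over expected cases). It assumes these standard formulas match the ones used in the study, which the text does not spell out; the NMR is omitted because its definition is not given here, and the `per` scaling of 100,000 is an arbitrary choice.

```python
from typing import List, Sequence


def crude_rate(cases: int, population: int, per: int = 100_000) -> float:
    """Crude morbidity rate: all cases over the total population, per `per` people."""
    return cases / population * per


def age_specific_rates(cases: Sequence[int], population: Sequence[int],
                       per: int = 100_000) -> List[float]:
    """Rate within each age group (same group order in both sequences)."""
    return [c / p * per for c, p in zip(cases, population)]


def age_adjusted_rate(cases: Sequence[int], population: Sequence[int],
                      standard_population: Sequence[int],
                      per: int = 100_000) -> float:
    """Direct standardization: age-specific rates weighted by a standard population."""
    total = sum(standard_population)
    rates = age_specific_rates(cases, population, per)
    return sum(r * s / total for r, s in zip(rates, standard_population))


def standardized_ratio(cases: Sequence[int], population: Sequence[int],
                       reference_rates: Sequence[float]) -> float:
    """Indirect standardization: observed cases divided by the cases expected
    if the area experienced the reference age-specific rates (per person)."""
    expected = sum(p * r for p, r in zip(population, reference_rates))
    return sum(cases) / expected
```

For an area with 10 cases among 1,000 people aged 0–44 and 30 cases among 2,000 people aged 45+, the crude rate is about 1,333 per 100,000, while weighting the age-specific rates (1,000 and 1,500) by a younger standard population of 3,000 and 1,000 gives an age-adjusted rate of 1,125.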
The purpose of these statistical methods is to provide a standardized legend (pattern/colour) for data representation across temporal, spatial, and jurisdictional layers. The disease data are point patterns generated by geo-coding postal codes and/or civic addresses. Since postal codes may change over time, we rely on the spatial location of the postal code and/or the geo-coded civic address to ensure geo-coding quality. With the "point-in-polygon" spatial operation, it is easy to roll up the data and count disease cases within given administrative boundaries. The five statistical methods above are then used to calculate disease rates. These values can be expressed through disease mapping variables related to time (e.g., annual, seasonal, monthly, weekly, daily), gender (e.g., male, female, both), age group (e.g., 0–4, ..., 85+, total), geographic level (e.g., Dissemination Areas/Census Block Groups, Census Divisions/Counties, etc.), and/or disease type (e.g., influenza). The generated thematic maps and charts are classified according to these multiple disease mapping variables.
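The point-in-polygon roll-up can be illustrated with the classic ray-casting test. This is a minimal sketch assuming simple polygons given as vertex lists in planar coordinates; a production system would more likely delegate this to a spatial database or GIS library, and the area names in the usage note are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Point = Tuple[float, float]
Polygon = List[Point]  # simple polygon, vertices in order, not repeated at the end


def point_in_polygon(pt: Point, poly: Polygon) -> bool:
    """Ray casting: count edge crossings of a horizontal ray from pt to +x."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y coordinate
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def roll_up(cases: Iterable[Point], areas: Dict[str, Polygon]) -> Counter:
    """Count geo-coded cases per administrative area (first matching area wins)."""
    counts: Counter = Counter()
    for pt in cases:
        for name, poly in areas.items():
            if point_in_polygon(pt, poly):
                counts[name] += 1
                break
    return counts
```

With two adjacent unit squares `A` and `B`, cases at (0.5, 0.5), (0.2, 0.8) and (1.5, 0.5) roll up to two cases in `A` and one in `B`, and a case outside both squares is not counted. The resulting counts are the numerators for the rate calculations.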
Processing time is also an important factor for online infectious disease mapping, because calculating the statistical values takes time. Taking this into account, we developed two flexible interfaces for obtaining the statistical results. In the first, the statistical values for pre-defined conditions (spatial level, age group, etc.) have already been calculated, so the system can respond in real time. The second is more flexible: users define the parameters (a certain time interval, a specific age group, etc.) according to their requirements, and the values are calculated on request. In addition, a cache mechanism maintains calculated statistical values. Data warehousing could be used as an alternative approach to improve processing performance.
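The combination of precomputed values, on-demand calculation, and caching can be sketched as a small lookup service. The class name `RateService`, the structure of the query key, and the `compute` callback are hypothetical; the sketch only illustrates the lookup order described above, not the study's actual cache design.

```python
from typing import Callable, Dict, Tuple

# Query key: (disease, geographic level, age group, period); illustrative fields.
Key = Tuple[str, str, str, str]


class RateService:
    """Serve precomputed statistics first; otherwise compute once and cache.

    `compute` is a hypothetical callback that runs a statistical method
    (e.g. a crude morbidity rate) for one parameter combination.
    """

    def __init__(self, precomputed: Dict[Key, float],
                 compute: Callable[[Key], float]) -> None:
        self._precomputed = dict(precomputed)
        self._cache: Dict[Key, float] = {}
        self._compute = compute
        self.computations = 0  # number of on-demand calculations performed

    def get(self, key: Key) -> float:
        if key in self._precomputed:      # pre-defined condition: immediate answer
            return self._precomputed[key]
        if key not in self._cache:        # user-defined condition, first request
            self._cache[key] = self._compute(key)
            self.computations += 1
        return self._cache[key]
```

A repeated user-defined query is then answered from the cache, so only the first request for each new parameter combination pays the calculation cost. A real deployment would also need to invalidate cached values when new disease data arrive.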
OGC services for disease mapping
The OGC Web Map Service (WMS), Styled Layer Descriptor (SLD), and Web Map Context (WMC) specifications are implemented for disease mapping and sharing in this study. A WMS publishes its ability to produce maps rather than its ability to access specific data holdings, and it generates spatially referenced maps dynamically [20]. SLD allows user-defined symbolization when producing maps [21], which makes it possible to render maps from different WMS servers in the same style. WMC uses eXtensible Markup Language (XML) based context documents containing information about the servers that provide the layers in the overall map, and the bounding box and map projection shared by all the maps; these documents provide sufficient operational metadata for clients to reproduce the maps [22].
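To make the WMS and SLD interplay concrete, the following sketch builds a WMS 1.1.1 GetMap request URL and, optionally, attaches an `SLD` parameter pointing to a style document, which an SLD-enabled WMS uses for user-defined symbolization. The server URL, layer name, bounding box, and SLD location in the test are hypothetical, and the version choice (1.1.1, which uses `SRS` rather than the 1.3.0 `CRS`) is an assumption.

```python
from typing import Optional, Sequence
from urllib.parse import urlencode


def getmap_url(base_url: str, layers: Sequence[str], bbox: Sequence[float],
               width: int, height: int, srs: str = "EPSG:4326",
               sld_url: Optional[str] = None,
               image_format: str = "image/png") -> str:
    """Build a WMS 1.1.1 GetMap request URL.

    bbox is (minx, miny, maxx, maxy) in the given SRS. If sld_url is given,
    it is sent as the SLD parameter so an SLD-enabled WMS renders the
    layers with the referenced user-defined styles.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),
        "STYLES": "",  # empty: default styles unless an SLD overrides them
        "SRS": srs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    if sld_url is not None:
        params["SLD"] = sld_url
    separator = "&" if "?" in base_url else "?"
    return base_url + separator + urlencode(params)
```

Because every participating WMS receives the same SLD document, maps of New Brunswick and Maine layers served from different servers can share one legend, which is the role the text assigns to SLD.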