Geographic analysis of low birthweight and infant mortality in Michigan using automated zoning methodology
© Grady and Enander; licensee BioMed Central Ltd. 2009
Received: 30 September 2008
Accepted: 18 February 2009
Published: 18 February 2009
Infant mortality is a major public health problem in the State of Michigan and the United States. The primary adverse reproductive outcome underlying infant mortality is low birthweight. Visualizing and exploring the spatial patterns of low birthweight and infant mortality rates and standardized incidence and mortality ratios is important for generating mechanistic hypotheses, targeting high-risk neighborhoods for monitoring and implementing maternal and child health intervention and prevention programs and evaluating the need for health care services. This study investigates the spatial patterns of low birthweight and infant mortality in the State of Michigan using automated zone matching (AZM) methodology and minimum case and population threshold recommendations provided by the National Center for Health Statistics and the US Census Bureau to calculate stable rates and standardized incidence and mortality ratios at the Zip Code (n = 896) level. The results from this analysis are validated using SaTScan. Vital statistics birth (n = 370,587) and linked infant death (n = 2,972) records obtained from the Michigan Department of Community Health and aggregated for the years 2004 to 2006 are utilized.
For a majority of Zip Codes the relative standard errors (RSEs) of rates calculated prior to AZM were greater than 20%. These unstable rates were the result of too few cases and births. Applying AZM with a target of 25 cases and a minimum threshold of 20 cases resulted in the reconstruction of zones with at least 50 births and RSEs of 20–22% or below, demonstrating the stability and reliability of these new estimates. Other AZM parameters included a homogeneity constraint on maternal race and maximum shape compactness of zones to minimize potential confounding. AZM identified areas with elevated low birthweight and infant mortality rates and standardized incidence and mortality ratios. Most but not all of these areas were also detected by SaTScan.
Understanding the spatial patterns of low birthweight and infant deaths in Michigan was an important first step in conducting a geographic evaluation of the State's reported high infant mortality rates. AZM proved to be a useful tool for visualizing and exploring the spatial patterns of low birthweight and infant deaths for public health surveillance. Future research should also consider AZM as a tool for health services research.
Infant mortality refers to infants born alive who die within their first year of life. In 2006, Michigan's infant mortality rate was 7.6 infant deaths per 1,000 live births, with African American infants at substantially higher risk of death (17.7) than white infants (5.2). The primary adverse reproductive outcomes that increase a newborn's risk of death are premature birth (i.e., infants born at less than 37 weeks gestation), low birthweight (i.e., infants born weighing less than 2,500 grams), which includes very low birthweight (i.e., infants born weighing less than 1,500 grams), and congenital defects. Infants born prematurely, at low birthweight and/or with congenital defects are at increased risk of death because of undeveloped or poorly developed organs and/or organ systems and the inability to respond physiologically to their external environment. High-risk neonates (i.e., infants less than 1 month of age) also require high quality perinatal-neonatal health care, and without supportive medical intervention they are at increased risk of mortality.
The purposes of this study are (a) to visualize and explore the spatial patterns of low birthweight and infant mortality in the State of Michigan using the automated zone matching (AZM) tool for automated zone design, and (b) to evaluate AZM as a potential tool for public health surveillance. AZM contains a computationally intensive algorithm that recombines geographic units from a large spatial dataset into a smaller set of output zones within which stable rates and ratios can be calculated. This recombination is an iterative process in which one geographic unit is randomly selected and the user-defined attribute constraint parameter(s) are evaluated (e.g., target population (TP)). If the parameter is not met, AZM searches contiguous units until it is achieved, then aggregates the data and dissolves internal boundaries to create a new zone. This "initial random aggregation" (IRA) serves as the basis for subsequent iterative decisions. The TP parameter is evaluated by summing the squared differences between the target and actual population counts of the zones. The zone design with the smallest TP value is generally considered optimal. Two additional parameters, the shape and homogeneity constraints, aim to maximize homogeneity within zones and heterogeneity between zones. The shape constraint maximizes the compactness of zones and is calculated using a simple shape statistic, perimeter²/area (P2A). The zone design with the lowest overall P2A value, derived from summarizing each of the local P2A values, is considered optimal. Homogeneity constraints maximize the similarity of user-defined characteristics of the target population and/or features of the local environment in the aggregation process. The degree of similarity is evaluated using the intra-area correlation (IAC) coefficient. The zone design with the strongest overall IAC coefficient, derived from summarizing all of the local IAC coefficients, is considered optimal.
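The aggregation step and two of the evaluation statistics described above can be illustrated with a short sketch. This is not the AZM algorithm: it is a simplified greedy region-growing routine written only to show the idea of merging contiguous units until a minimum count is reached. The function names (grow_zones, tp_score, p2a), the dictionary-based inputs and the rule for folding leftover units into an adjacent zone are all assumptions made for illustration.

```python
import random


def grow_zones(cases, neighbors, min_cases, seed=0):
    """Greedy region growing (illustrative stand-in, NOT the AZM algorithm).

    cases:     dict mapping unit id -> case count
    neighbors: dict mapping unit id -> iterable of contiguous unit ids
    Returns a list of sets of unit ids.
    """
    rng = random.Random(seed)
    unassigned = set(cases)
    zones = []
    while unassigned:
        # Randomly pick a starting unit, then absorb contiguous units
        # until the minimum case threshold is reached.
        start = rng.choice(sorted(unassigned))
        zone, total = {start}, cases[start]
        unassigned.discard(start)
        while total < min_cases:
            frontier = sorted({n for u in zone for n in neighbors[u]} & unassigned)
            if not frontier:
                break
            nxt = rng.choice(frontier)
            zone.add(nxt)
            unassigned.discard(nxt)
            total += cases[nxt]
        zones.append(zone)

    # Assumed clean-up rule: fold any undersized leftover zone into an
    # adjacent zone so that contiguity is preserved.
    merged = True
    while merged:
        merged = False
        for z in zones:
            if sum(cases[u] for u in z) >= min_cases:
                continue
            adjacent = {n for u in z for n in neighbors[u]} - z
            host = next((o for o in zones if o is not z and o & adjacent), None)
            if host is not None:
                host |= z
                zones.remove(z)
                merged = True
                break
    return zones


def tp_score(zones, cases, target):
    """Sum of squared differences between target and actual zone counts."""
    return sum((sum(cases[u] for u in z) - target) ** 2 for z in zones)


def p2a(perimeter, area):
    """Simple shape statistic: perimeter squared over area (lower = more compact)."""
    return perimeter ** 2 / area
```

For reference, a circle has the smallest possible P2A value (4π) and a square has a value of 16, so elongated or ragged zones score higher on this statistic.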
A more in-depth discussion about the functionality of AZM and parameter estimation for optimal zone design is provided in the background and methods sections of this paper.
In this study we utilize AZM as a surveillance tool to create stable rate maps of low birthweight incidence (hereafter referred to as low birthweight rates) and infant mortality, together with standardized incidence ratios (SIRs) and standardized mortality ratios (SMRs), constructed from Zip Code level data. Understanding the spatial patterns of low birthweight and infant mortality is important for generating mechanistic hypotheses and targeting high-risk areas for public health intervention and prevention programs and health care service needs. The National Vital Statistics Reports (NVSR) and the Centers for Disease Control and Prevention (CDC) define stable rates as those with at least 20 cases in the numerator, which corresponds to a relative standard error (RSE) of about 22%, and a population denominator of at least 50 (unweighted). The RSE is the standard error expressed as a percentage of the rate itself. For example, an RSE of 25% means that the standard error is one-quarter the size of the rate. In this analysis we use 25 as the target number of cases and 20 as the minimum case threshold. We use a target number of cases instead of a target population in the estimation of stable rates because population-based thresholds alone can mask low case counts, resulting in artificially low rates with large relative standard errors. In addition, a simple shape constraint (P2A) is applied to encourage the creation of compact zones. Compact zones are important because they will minimize potential confounding in future mechanistic studies. Relatedly, we use maternal race (i.e., African American versus all other racial and ethnic groups) as our homogeneity constraint in order to maximize maternal and infant homogeneity by race within zones and heterogeneity between zones. These constraints will be used to capture spatial-racial disparities in low birthweight and infant mortality in Michigan.
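The link between the case thresholds and the RSE values quoted above can be checked with a few lines of code. The sketch assumes the common approximation in which the numerator is treated as a Poisson count, so that the RSE of the rate is roughly 100 divided by the square root of the number of cases; the function name rse_percent is hypothetical, and the exact formula used in the cited reports may include additional terms.

```python
import math


def rse_percent(cases: int) -> float:
    """Approximate RSE (%) of a rate whose numerator is a Poisson count.

    Assumption: RSE ~= 100 / sqrt(cases), i.e. only numerator variation counts.
    """
    if cases <= 0:
        raise ValueError("cases must be a positive count")
    return 100.0 / math.sqrt(cases)


# Minimum threshold (20 cases) and target (25 cases) used in this analysis.
for n in (20, 25):
    print(n, round(rse_percent(n), 1))
```

Under this approximation 20 cases gives an RSE of about 22.4% and 25 cases gives exactly 20%, which matches the 20–22% range reported for the reconstructed zones.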
Following the recombination of Zip Codes into an optimal zone design the low birthweight and infant mortality rates are calculated using the sum of the number of low birthweight cases or infant deaths in the numerator and all live births in the denominator. The number of births within each new zone is also used to calculate SIRs and SMRs using the indirect method of standardization. Tables of the rates and ratios calculated by zone are then joined to the new zone geography and thematic maps are created to visualize and explore their spatial patterns. In this study, we explore the sensitivity of the rates and ratios to different zoning systems by comparing the P2A, IAC and TP results from 50 random restarts on the same data. We also validate the spatial patterns derived from optimal zone design by comparing the rates and ratios with spatial clusters detected in SaTScan using a Poisson-based modeling approach.
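The rate and ratio calculations described above can be sketched as follows. The sketch assumes an unstratified form of indirect standardization in which the expected count for a zone is its number of births multiplied by the statewide rate; the function names and the example zone counts are hypothetical, while the statewide totals are the 2004–2006 figures reported in this paper.

```python
def rate_per_1000(events: int, births: int) -> float:
    """Crude rate per 1,000 live births."""
    if births <= 0:
        raise ValueError("births must be positive")
    return 1000.0 * events / births


def standardized_ratio(observed: int, births: int, reference_rate: float) -> float:
    """SIR or SMR by indirect standardization: observed / expected.

    Assumption: expected = zone births * statewide (reference) rate,
    with no further stratification.
    """
    expected = births * reference_rate
    return observed / expected


# Statewide totals reported in the paper (Michigan, 2004-2006).
STATE_BIRTHS = 370_587
STATE_INFANT_DEATHS = 2_972
state_rate = STATE_INFANT_DEATHS / STATE_BIRTHS

# Hypothetical zone, for illustration only.
zone_deaths, zone_births = 30, 2_000
print(rate_per_1000(zone_deaths, zone_births))
print(standardized_ratio(zone_deaths, zone_births, state_rate))
```

A ratio above 1 indicates more cases or deaths in the zone than would be expected if the statewide rate applied; the same function serves for SIRs when low birthweight counts and the statewide low birthweight rate are supplied.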
In Michigan there are 15 metropolitan areas located in the approximate lower third of the State. The upper two-thirds of Michigan is relatively rural, with a population density in the Upper Peninsula of about 19 persons per square mile. Low birthweight and infant mortality rates calculated at the Zip Code level for upper Michigan, particularly in the Upper Peninsula, are therefore likely to be unstable. In addition, Michigan has a high level of racial residential segregation, with African Americans living primarily in metropolitan principal cities (1) and the city of Detroit, where low birthweight and infant mortality rates calculated at the Zip Code level are likely to be stable. These spatial differentials in the stability of low birthweight and infant mortality rates by race could lead to spurious interpretations of race-specific risk and place-based risk factors, which could result in misguided hypotheses about the mechanisms underlying racial disparities in these outcomes. This research using AZM to stabilize the low birthweight and infant mortality rates and ratios in Michigan is therefore warranted.
(1) Principal cities are defined as cities, villages or Census Designated Places that meet criteria involving number of people and relative number of in-commuters and out-commuters. This term replaces the term "central city." In Michigan, 2003 there were 31 principal cities in metropolitan areas (population > 50,000) and 16 principal cities in micropolitan areas (population > 10,000 and < 50,000) (Library of Michigan, ).
There are a small but growing number of studies utilizing zone design methodologies to evaluate the spatial patterns of health outcomes and local risk factors. Cockings and Martin conducted seminal research using AZM to measure the correlation between self-reported limiting long-term illness (LLTI) and area-level deprivation at the enumeration district (ED) (n = 1,970) and ward (n = 177) scales in Avon, a former county in the United Kingdom. This analysis was conducted at two scales to assess potential bias associated with the scale effect of the modifiable areal unit problem (MAUP). LLTI was measured using standardized morbidity ratios derived from an indirect method of standardization (i.e., estimating the expected number of cases by multiplying age-specific population counts by country-wide age-specific LLTI rates and dividing the observed by the expected number of cases). Area-level deprivation was measured using the Townsend score (i.e., the proportion of people without a car, households in overcrowded accommodations, households not owner-occupied and unemployment). Pearson correlation coefficients were estimated. The AZM parameters included a target population constraint of 250 people, aged 0–64 years, and a minimum population threshold constraint of 90% of the target value. AZM runs were repeated with increasing target population counts in increments of 250 up to 4,500 people to cover the population range in EDs and wards (total 13 runs). The shape of output zones was defined by a simple shape statistic (P2A). No homogeneity constraints were used in this analysis. The authors implemented 50 random restarts for each set of population and shape constraints; within each zone, standardized morbidity ratios and area-level deprivation scores were calculated and correlation coefficients were estimated. The results from these analyses were also compared with LLTI-deprivation correlation coefficients estimated using the ED and ward boundaries.
This study found that with increasing scale (i.e., increasing population thresholds) the LLTI-deprivation correlation coefficients also increased, a finding similar to previous research. The optimal zone design had an LLTI-deprivation correlation coefficient of 0.88 at a target population of 3,750 people (mean population size 4,291 people), which was most closely related to the ward scale of analysis (mean population size 4,364). The LLTI-deprivation correlation coefficient was 0.72 at the ED scale and 0.86 at the ward scale. While the coefficients for AZM zones and wards were fairly similar (0.88 versus 0.86), the populations within wards were extremely variable (e.g., ward range, 14,290 people compared to AZM zone range, 3,000 people), with large differences between rural and urban areas. The authors concluded that the population stability derived from AZM resulted in more reliable LLTI-deprivation correlation coefficients than those based on ward boundaries, demonstrating the importance of AZM in this analysis.
A study by Haynes et al  used A2Z software [10, 11] to explore the similarity between neighborhoods derived from zone design and those relayed subjectively by local government officers and planners in the city of Bristol, United Kingdom. Seven zone designs were constructed from 814 EDs using the homogeneity constraint "material deprivation" defined by the Townsend score and other different parameters. Specifically, zone designs 1–3 used increasingly strong shape constraints. The weakest shape constraint "prevented long linear strings of areas from being joined by checking the graphic network structure for each zone." The medium shape constraint minimized local spatial dispersion after five "uncontrolled" iterations . The strongest shape constraint used the P2A parameter in the initial random aggregation. The fourth zone design used a shape constraint that enforced the separation of EDs across railways and major roads by giving pairs of EDs on either side of major routes an adjacency value of zero versus one for non-contiguous EDs. The fifth zone design aligned zone boundaries along wards using similar mean Townsend scores. The sixth zone design used a weak shape constraint as in zone design one, but maximized the homogeneity of housing type (i.e., proportion of households detached, semi-detached and terraced houses, purpose-built flats and converted flats). The seventh zone design was similar to the sixth but also tried to align zone boundaries with ward boundaries. The evaluation of optimal zone design was that with the strongest IAC and zone shapes and sizes similar to that of EDs. The EDs were considered the standard from which to measure the similarity between zones and EDs using a "similarity index" with 1 representing identical boundaries and zero representing no common boundaries. This study found that AZM zone design one was much less compact in shape than EDs and had only moderate "similarity." 
Zone designs 2–5 were progressively more similar to EDs and more compact in shape, but increasingly less homogeneous. Zone designs 6 and 7 performed the "best" because of their strong IAC in terms of semi-detached housing and common boundaries. The authors concluded that in the construction of optimal zone designs there needs to be a balance between the use of neighborhood homogeneity constraints and constraints relating to zone shape and boundary alignment.
Flowerdew et al  used automated zone design (AZTool system formerly AZM)  to explore the MAUP  scale and zone effects in the district of Swindon, England using population data and data on LLTI and demographic-social factors in EDs (n = 1,268) and wards (n = 148). The parameters were a target population of 8,136 people, corresponding to the average population of existing wards, and a minimum population threshold of 2,651, corresponding to the population size of the smallest ward. The zone design was based on six criteria: contiguous zones, no zone should be entirely surrounded by another, the number, shape and size of zones should be similar to the number, shape and size of wards and zones should have strong internal homogeneity. The optimal zone design was considered in order of priority the TP, P2A, and IAC values. The scale effect was evaluated by measuring the difference in population homogeneity (i.e., the IAC of LLTI-demographic-social factors) in zones compared to EDs and wards. This analysis showed that the IAC differed for different demographic-social factors at these different scales. For example, at the ED scale the percentage of people of pensionable age was most strongly correlated with LLTI, whereas, at the ward scale the percentage of male unemployment was most strongly correlated with LLTI, controlling for other risk factors. The MAUP zone effect was also evaluated using the IAC in addition to multiple regression to estimate the effect of known and suspected risk factors on the percent of population with LLTI. This analysis showed that the correlation between LLTI and various demographic-social variables varied for different zone designs; however, the direction of the estimate remained the same for most but not all variables, demonstrating a zone effect. In summary, the authors found that zone designs that emphasized the TP and P2A constraints were most optimal and there seemed to be little benefit from the use of homogeneity constraints.
Stafford et al  used zone design methods (ZDES 3b software)  to create neighborhoods in the London boroughs of Camden and Islington based on socioeconomic homogeneity constraints, defined as the proportion of residents living in rented social housing and physical boundaries based on roads and railways to explore the effect of these neighborhood characteristics on the health outcomes body mass index, alcohol intake, exercise, smoking behavior and self-rated general health, controlling for individual level characteristics. The two zone designs were similar in number to the real census wards (n = 34). Two-level hierarchical models were implemented to estimate the variation in health outcomes within and between zones in each zone design. Similar models were implemented in wards for comparison purposes. Individual-level characteristics were modeled at level-1 and the intercept and slope coefficients were modeled at level-2. The results showed positive and significant associations for alcohol intake, walking, smoking and self-rated health using all three boundary definitions. The strongest association was between alcohol intake and housing tenure using the boundaries defined with the homogeneity constraint housing tenure. However, the magnitude of the between-zone variation was small in comparison to the within-variation (i.e., individual-level variation) for all boundary definitions suggesting that the two zone designs had no substantial advantage over the ward boundaries. In contrast, a study conducted by Riva et al  showed that census tract administrative boundaries were limited in their ability to measure active living potential (i.e., environments that were conducive for walking). Through the use of zone design, using the homogeneity constraints population density, land use mix and accessibility to services, the authors identified seven types of environments within which, varying levels of active living were possible.
These studies complement two existing but separate bodies of health research that could be affected by MAUP scale and/or zoning effects. The first body of research applies geographic methodologies to correct for rate instability and/or preserve case confidentiality through data suppression. The most commonly used methods to address these concerns are probability mapping, spatial filtering [18, 19], Bayesian smoothing, and cluster detection methods such as SaTScan [21, 22]. The second body of research evaluates neighborhood effects on health outcomes [23–32], in which neighborhood risk characteristics are explored. This study complements these two existing bodies of research by providing an empirical example of how AZM can be used as a surveillance tool to define the spatial patterns of low birthweight and infant mortality in the State of Michigan. From these spatial patterns, meaning may be derived about their underlying processes (e.g., underlying mechanisms), including individual and local environmental risk factors and access to health care services.
The methodologies used to calculate RSEs and confidence intervals to validate the stability of the rates and ratios are based on the underlying distributions of births and deaths. The NVSR reports that vital statistics birth and linked infant death records "include a complete case count because more than 99% of all births and deaths in the United States are registered." "These data are not subject to sampling error but they may be affected by nonsampling error in the registration process, particularly involving missing case information such as Zip Code of residence". The number of births and deaths that actually occur is thought of as "one outcome in a series of possible outcomes under the same (or similar) circumstances", and the numbers of births and deaths are therefore subject to random variation. When the number of births or deaths is large the distribution is assumed to follow a normal distribution, and when the number is small (i.e., fewer than 100) the distribution is assumed to follow a Poisson distribution. The RSE of normally distributed data will be small, while the RSE of Poisson distributed data is likely to be large. This study utilizes the methodologies provided by the NVSR [3, 33] to calculate the RSEs and confidence intervals for rates and ratios based on these two distributions. RSEs and confidence intervals are the mechanisms by which the stability of rates and ratios derived from AZM zone definitions are evaluated.
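The two-distribution approach to confidence intervals described above can be sketched in code. This is an illustration under stated assumptions rather than the exact NVSR procedure: counts of 100 or more use a normal approximation (count ± 1.96 × √count), and smaller counts use exact Poisson limits found by bisection. The function names and the cut-off applied to the event count are assumptions.

```python
import math


def poisson_cdf(k: int, mu: float) -> float:
    """P(X <= k) for X ~ Poisson(mu)."""
    if k < 0:
        return 0.0
    term = math.exp(-mu)
    total = term
    for i in range(1, k + 1):
        term *= mu / i
        total += term
    return total


def _bisect(f, lo, hi, iters=100):
    """Root of an increasing function f with f(lo) < 0 < f(hi)."""
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def count_ci(d: int, alpha: float = 0.05):
    """Approximate 95% confidence limits for an observed count d."""
    if d >= 100:
        # Normal approximation for large counts (assumed cut-off of 100).
        half = 1.96 * math.sqrt(d)
        return d - half, d + half
    a = alpha / 2.0
    # Lower limit: mean at which P(X >= d) equals alpha/2 (zero when d == 0).
    lower = 0.0 if d == 0 else _bisect(
        lambda mu: (1.0 - poisson_cdf(d - 1, mu)) - a, 0.0, float(d))
    # Upper limit: mean at which P(X <= d) equals alpha/2.
    upper = _bisect(
        lambda mu: a - poisson_cdf(d, mu), float(d), d + 10.0 * math.sqrt(d + 1) + 10.0)
    return lower, upper


def rate_ci_per_1000(d: int, births: int):
    """Scale the count limits to a rate per 1,000 live births."""
    lo, hi = count_ci(d)
    return 1000.0 * lo / births, 1000.0 * hi / births
```

For a zone at the minimum threshold of 20 cases, the exact Poisson limits on the count are roughly 12.2 and 30.9, which shows how wide the interval remains even for rates that meet the stability criterion.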
Descriptive statistics of low birthweight and infant deaths, Michigan 2004–2006. [Table body not recovered. Surviving column headers: (n = 370,587), (n = 23,761), (n = 2,972). Surviving row label: Less than High School.]
Automated Zoning Methodology
Following AZM the number of Zip Codes was reduced from 896 to 375 zones for low birthweight and 98 zones for infant deaths. The low birthweight rates and SIRs and infant mortality rates and SMRs were calculated within the optimal zone designs for these outcomes, each zone having at least 20 cases/deaths and 50 births; therefore, meeting case  and population  thresholds for stable rates and ratios. In this study the optimal zone design was determined by ranking the P2A and IAC output results and then comparing those two zone designs that ranked highest (i.e., had the smallest P2A and strongest IAC). We also compared the IRA with these two "best" zone designs. Since the IRA zone design was very similar to the two "best" zone designs we decided to use the IRA to calculate the rates and ratios. When AZM is run on multiple occasions using the same data, initial settings and number of iterations the same IRA zone configuration will result. Thus for public health surveillance purposes the IRA would be the optimal zone design because it has stable geography from within which to monitor and evaluate rates and ratios over time.
AZM Zones with the highest low birthweight rates and standardized incidence ratios, Michigan 2004-2006.
Zones with the highest infant mortality rates and standardized mortality ratios, Michigan, 2004–2006.
SaTScan clusters of low birthweight, Michigan 2002–2004.
SaTScan clusters of infant deaths, Michigan 2002–2004.
Visualizing and exploring the spatial patterns of low birthweight and infant mortality rates in Michigan was an important first step in conducting a geographic evaluation of the State's reported high infant mortality rates. Using AZM we were able to calculate stable rates and ratios that would not have been possible using commonly used Zip Code boundaries. The spatial patterns derived from AZM showed areas throughout the state with elevated low birthweight rates. Elevated SIRs were primarily located in the principal metropolitan cities of Benton Harbor, Detroit, Flint, Kalamazoo, Saginaw, Inkster, River Rouge, Ecorse, Pontiac, Southfield, Mount Clemens, Grand Rapids and Ypsilanti. In these cities African Americans comprise a large proportion of the population; therefore, these patterns also reflect spatial disparities in low birthweight by race in Michigan. These findings are consistent with previous research reporting high rates of adverse birth outcomes in racially segregated areas [28, 31]. Importantly, other principal metropolitan cities that also have large African American populations, such as Battle Creek, Jackson and Lansing, did not show significantly high low birthweight rates or SIRs, suggesting that race alone cannot completely explain the racial disparities in low birthweight observed in Michigan. Likewise, there are spatial disparities in infant mortality, with Detroit, Flint and Saginaw having the highest death rates while other principal metropolitan cities that are also highly segregated by race do not have significantly high infant death rates. Thus, in addition to studying spatial disparities by race in low birthweight and infant mortality, future research in Michigan should also investigate spatial differences within the African American population and potential mechanisms underlying their environments, including access to health care services.
The size of zones within the optimal zone designs is largely dependent upon the number of low birthweight births or infant deaths in Zip Codes. Low birthweight is not as rare an event as infant mortality; therefore, the zones created to meet the TP or minimum case threshold for low birthweight are smaller and more compact than the zones created for infant deaths. In future studies these smaller zones may be useful for generating mechanistic hypotheses. For example, in this study we used maternal race as our homogeneity constraint; thus the zones in which low birthweight rates are calculated may be relatively homogeneous on this maternal characteristic in addition to being compact. Future studies of low birthweight that use environmental constraints may identify common risk factors associated with certain living environments. Previous studies [7, 12] have shown the usefulness of AZM methodology for identifying neighborhood social and built environmental risk factors for poor health outcomes as well as protective characteristics that lead to healthy living.
For infant mortality, however, the zones created to meet the TP or minimum case threshold for infant deaths were relatively large, making it difficult to explore underlying mechanisms. Instead these zones may be better used to understand the demand for maternal and infant health care services, including neonatal-perinatal health care. Adding health care services as a homogeneity constraint will create zones that are homogeneous within (e.g., zones with high versus low utilization of services) and heterogeneous between on this constraint. Overlaying the health care services onto these zones may be useful in identifying areas that have services but lack utilization versus areas that do not have services but are in need. In future research we will also assess the need for health services within administrative units such as counties, in case mothers who receive services in one county are unlikely to cross over into other counties. This research could utilize the capability of AZM to analyze multiple boundary definitions (e.g., Zip Codes and county boundaries), which was not investigated in this study. In Michigan the perinatal regionalization system was abolished in the mid-1980s due to budgetary constraints. There is an immediate need to evaluate the availability and accessibility of maternal and infant health care services for high-risk mothers and infants, including transport to neonatal intensive care units. This empirical example of AZM as a tool for public health surveillance could also be applied in the area of health services research.
In Figure 6 the IRA setting is shown as the first pair of points (i.e., P2A and IAC) on the far left-hand side of the graph. When AZM is run on multiple occasions using the same data, the same initial settings and the same number of iterations, the IRA will produce the same configuration of output zones. The subsequent 50 runs following the IRA resulted in slightly different sets of zone designs. We found two of these zone designs to be slightly more optimal than the IRA, as shown in Figure 6 (runs 22 and 29); however, a comparison of the spatial patterns of SIRs for the three zone designs shows the same significant areas in all three maps. There were virtually no differences except for one zone in the eastern portion of Saginaw. This sensitivity analysis demonstrates the stability of the spatial patterns (i.e., minimal MAUP zone effect) derived from AZM. Therefore, we decided that for surveillance purposes the IRA run would be the ideal zone design to use because of its stable geography and the ability to monitor and evaluate spatial patterns of rates or ratios over time.
Finally, we conducted a comparative analysis of the output generated in AZM and SaTScan. The performance of AZM was very similar to that of SaTScan showing similar areas of elevated risk in SIRs and significant clusters. The AZM analysis identified additional areas with elevated SIRs that were not detected by SaTScan. We believe these differences were primarily due to the differences in modeling techniques. For example, AZM accounted for the homogeneity constraint "race" in the construction of zones thereby, forcing the clustering of women on this characteristic. Since the forced clustering was of women and not infants per se further investigation on the importance of this clustering on the low birthweight and infant mortality rates and ratios is warranted. SaTScan on the other hand has the ability to control for these individual-level risk factors (i.e., since we know that African American women are at increased risk of having a low birthweight infant SaTScan can remove the effect of race to detect clusters of women/infants in areas that are unknown). In this analysis we did not choose to remove the effect of race in SaTScan because it was used as an important constraint in AZM.
One limitation of our study is that while we tested the MAUP zoning effect on our rates and ratios, we did not test the MAUP scale effect using different levels of administrative units. Future research should also apply AZM at the block group and census tract levels to test for the scale effect. Understanding the magnitude of the MAUP zoning and scale effects is important in validating the stability of the spatial patterns of rates and ratios. Since the focus of this study was to visualize and explore the spatial patterns of low birthweight and infant mortality, we did not test mechanistic hypotheses with multilevel models. Multilevel models can be used to estimate the effect of individual characteristics within zones and area-level characteristics between zones. This analysis would be important since Stafford et al. found little difference between administrative and zoning boundaries in their health study. Relatedly, individual-level birth and infant death records were grouped to calculate rates and ratios, and ecological fallacy occurs when analyses based on grouped data lead to conclusions different from those based on individuals. Therefore, the spatial patterns derived from this analysis may include aggregation bias due to the differential distribution of confounding variables created by grouping. Using multilevel models in future research will help to decompose these rates and ratios by studying individuals nested within their environments. The sensitivity of multilevel modeling results to zone design is an area for future research. Another limitation of our study was that some zones, especially for deaths, were very large; if the purpose of future studies is to explore underlying risk factors for infant deaths in small areas, then a Bayesian method would be preferred. Another limitation is that we used low birthweight as an outcome measure, but future research should also study small for gestational age (SGA).
SGA is defined as a birth weight less than the 10th percentile for a given gestational age based on a reference population of all singleton live births in the United States and may be conceptually helpful in studying adverse birth outcomes in diverse populations. Finally, the Zip Code of mother's residence was used to map these spatial patterns of low birthweight and infant deaths, but since a travel history was not available, future research should also consider the movement of women and infants and their differential exposures during pregnancy and up to death.
This study showed that spatial patterns of low birthweight and infant mortality derived from administrative boundaries such as Zip Codes can contain numerous errors across the state when the number of cases/deaths per unit is small. To correct this problem, researchers have used geographic methods such as probability mapping, spatial filters (also referred to as spatial smoothers), Bayesian smoothing techniques and cluster detection methods. This study demonstrates another technique, the AZM tool for automated zone design, to stabilize rates and ratios. AZM takes existing boundaries as input but reconstructs the geography in areas with low case counts in order to increase case numbers and stabilize rates and ratios for thematic mapping. Thematic maps are a useful tool to visualize and explore spatial patterns because they are relatively easy to read and interpret from both a research and a layperson's (i.e., public) perspective. AZM is therefore a valuable tool that could be used for geo-surveillance in public health departments, including the reporting of disease and health conditions to the public. From these spatial patterns, specific zones of high risk can be identified within which to study underlying mechanisms, target monitoring, implement public health intervention and prevention programs and evaluate the need for health care services.
Researchers who wish to proceed with AZM, however, need to be aware that the protocol to evaluate optimal zone design is largely subjective and dependent upon the study question, the conceptual design of the study and the input modeling parameters. Future research should continue to document the conceptual frameworks of health studies in addition to the input modeling parameters to evaluate optimal zone design for different health outcomes.
This study used vital statistics live birth and linked infant death records for the State of Michigan obtained from the Michigan Department of Community Health, Office of Vital Statistics and aggregated for the years 2004 to 2006 (n = 384,587). Only live singleton births were studied, thus reducing the dataset to 370,588 births. The birth records were queried for low birthweight, defined as infants born weighing less than 2,500 grams. Those birth records with birth weight less than 200 grams or greater than 6,000 grams were considered outliers and removed from the dataset (n = 211). There were 2,972 infant deaths during this time period, excluding those infant deaths that occurred in 2004 to infants born in 2003 and those infants who died in 2007 but were born in 2006. Infant deaths were defined as those infants born alive that died within their first year of life. The infant death records were linked to the live birth files using the birth certificate as the common identifier.
The Zip Code of mother's residence at the time of her infant's birth was the geographic unit of analysis. There were 201 birth records without Zip Codes that were removed from the dataset, further reducing the data to n = 370,386 records. The low birthweight, infant death and all birth records were aggregated by Zip Code and joined to a Zip Code boundary file produced by Environmental Systems Research Institute (ESRI) in 2002 (n = 896). Those Zip Codes whose boundary changed after 2002 were recoded to match the ESRI geographic Zip Code file. The recoding of Zip Codes was validated with the geocoded address of the mother and did not affect the original geographic position of the birth.
Since one of the purposes of our research was to evaluate AZM as a potential tool for public health surveillance, we decided to utilize Zip Codes as the geographic unit of analysis because approximately 99% of birth records contained a Zip Code, making these data readily available in a timely manner for spatial analysis. Using the Zip Code as the unit of analysis also reduced potential bias associated with missing data. In this study, approximately 3% of records could not be geocoded (i.e., address matched) because the mother's address was given as a post office box number or a rural-route address. This potential bias would have been located largely in Upper Michigan. We recognize that Zip Codes may not reflect the local neighborhood environment in which mothers live and infants are born and that a finer resolution of geography is preferred to explore local hazards and potential exposures. To visualize and explore spatial patterns and to generate mechanistic hypotheses at a finer geographic scale, the birth and linked infant death records would need to be geocoded and aggregated to the census block, block group or census tract. The authors decided that, from a surveillance perspective, the Zip Code level of analysis would be conducted first while the records are being geocoded; thereafter, surveillance at a finer geographic scale could follow.
Automated Zoning Methodology
Automated zoning was implemented using the Automated Zone Matching (AZM) 1.0.0 software written by David Martin. This software is freeware and available for public use on David Martin's website http://www2.geog.soton.ac.uk/users/martindj/davehome/software.htm. This software incorporates the principles of automated zone design originally conceptualized by Openshaw. In preparation for AZM analysis, the ESRI Zip Code boundary file was converted to an ArcInfo coverage; island polygons, slivers and overshoots were removed and undershoots were corrected. After the topology was checked and corrected, the arc and polygon attribute information was exported for use in the AZM software. The arc files were uploaded in AZM as intersection and contiguity files. These files comprised the Zip Code geography used in subsequent analyses.
The first parameter selected was the target population (TP) and/or minimum population threshold constraint(s). As noted previously, we used 25 low birthweight cases or infant deaths as our ideal target and 20 cases/deaths as our minimum threshold. Thus, all new zones created had at least 20 cases/deaths from which to calculate stable rates (i.e., rates with an RSE of approximately 20% or below). AZM functions by minimizing the squared difference between the target number of cases/deaths and the number of cases/deaths in each zone. Thus, if every zone contained exactly 25 cases/deaths, this statistic would reach zero. The minimum population threshold was 50 births. This threshold was not specified in AZM but resulted after the aggregation of low birthweight cases/deaths. These TP parameters were held constant throughout the analysis.
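To make the thresholds concrete, the following minimal sketch computes the relative standard error of a rate and a target population statistic. It assumes the common vital-statistics approximation RSE = 100 / sqrt(number of events) for a Poisson-distributed count, and it assumes the target statistic is a plain sum of squared differences over zones. The function names are hypothetical and are not part of the AZM software.

```python
import math


def relative_standard_error(cases: int) -> float:
    """RSE (%) of a rate whose numerator is a Poisson count (assumed formula)."""
    if cases <= 0:
        raise ValueError("cases must be a positive count")
    return 100.0 / math.sqrt(cases)


def target_population_statistic(zone_counts, target=25):
    """Sum of squared differences between each zone's count and the target."""
    return sum((count - target) ** 2 for count in zone_counts)


# The ideal target (25) and the minimum threshold (20) from the text.
print(round(relative_standard_error(25), 1))  # 20.0
print(round(relative_standard_error(20), 1))  # 22.4

# The statistic is zero only when every zone hits the target exactly.
print(target_population_statistic([25, 25, 25]))  # 0
print(target_population_statistic([20, 25, 31]))  # 61
```

Under this approximation, 25 events give an RSE of 20% and 20 events give roughly 22%, which is why counts well below 20 produce unstable rates.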
The second parameter selected was the shape statistic, defined as:
$$\sum_{k} \frac{q_k^{2}}{A_k}$$
where $q_k$ is the perimeter of zone k and $A_k$ is its area. AZM functions to minimize the perimeter squared divided by area, which maximizes the shape compactness of zones. As outlined in Martin's software documentation, "irregular shapes may have longer perimeters in relation to their area; thus, squaring the perimeter makes highly irregular shapes less attractive when this constraint is in operation."
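As an illustration of why perimeter squared divided by area rewards compact zones, the sketch below evaluates the ratio for three shapes of comparable size. The shapes and the helper name `shape_statistic` are illustrative assumptions; AZM computes this quantity internally from the zone geometry.

```python
import math


def shape_statistic(zones):
    """Sum of perimeter**2 / area over an iterable of (perimeter, area) pairs."""
    return sum(perimeter ** 2 / area for perimeter, area in zones)


square = (4.0, 1.0)                 # 1 x 1 square
strip = (2 * (4.0 + 0.25), 1.0)     # 4 x 0.25 rectangle with the same area
circle = (2 * math.pi, math.pi)     # circle of radius 1

print(shape_statistic([square]))            # 16.0
print(shape_statistic([strip]))             # 72.25
print(round(shape_statistic([circle]), 3))  # 12.566
```

The elongated strip scores far worse than the square even though both enclose the same area, and the circle attains the minimum possible value of 4π.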
The third parameter selected was the homogeneity constraint. The homogeneity constraint promotes homogeneity within zones and heterogeneity between zones by encouraging the aggregation of similar values. In this study we used mother's race (i.e., African Americans versus all others) as our homogeneity constraint. Maternal race was added because of the high levels of racial residential segregation in Michigan's cities and the desire to capture spatial-racial disparities in low birthweight and infant mortality. The intra-area correlation (IAC) for these two racial groups was obtained for each category k and overall across all K categories (k1, k2 = K categories). An IAC of 0.5 implied a reasonable degree of homogeneity.
Where $\bar{N}^{*}$ was the mean case/death size of the M Zip Codes, with an adjustment to take into account variation in the case/death size of units. It was expected that $\bar{N}^{*}$ would be very close to $\bar{N}$. $N_g$ was the case/death size of Zip Code g; $P_k$ was the overall proportion of cases/deaths in category k, and $P_{kg}$ was the proportion in category k in Zip Code g. Thus, the IAC was approximately the ratio of the Zip Code variance to the maternal level variance, and this ratio was divided by the mean case/death size.
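The verbal description above can be turned into a rough numerical sketch. The code below is an assumption built only from that description: it takes a size-weighted between-unit variance of the category proportion, divides it by the individual-level variance P(1 - P), and divides the result by the unadjusted mean unit size. It is not the exact expression used by AZM, and it omits the size adjustment mentioned in the text. The function name `iac_sketch` is hypothetical.

```python
def iac_sketch(unit_sizes, unit_props):
    """Approximate intra-area correlation for one category (illustrative only).

    unit_sizes: case/death count in each unit (N_g)
    unit_props: proportion of each unit's cases in the category (P_kg)
    """
    m = len(unit_sizes)
    if m < 2 or m != len(unit_props):
        raise ValueError("need at least two units with matching inputs")
    total = sum(unit_sizes)
    p_k = sum(n * p for n, p in zip(unit_sizes, unit_props)) / total
    if not 0.0 < p_k < 1.0:
        raise ValueError("overall proportion must lie strictly between 0 and 1")
    mean_size = total / m  # plain mean; the adjusted mean is not reproduced
    between = sum(n * (p - p_k) ** 2
                  for n, p in zip(unit_sizes, unit_props)) / (m - 1)
    individual = p_k * (1.0 - p_k)
    return between / individual / mean_size


# Identical composition in every unit gives no between-unit variance.
print(iac_sketch([20, 25, 30], [0.4, 0.4, 0.4]))  # 0.0

# Strongly segregated units score higher than mildly differing ones.
segregated = iac_sketch([25, 25, 25, 25], [0.9, 0.9, 0.1, 0.1])
mixed = iac_sketch([25, 25, 25, 25], [0.6, 0.5, 0.5, 0.4])
print(segregated > mixed)  # True
```

The sketch only shows the direction of the measure: the more the units differ from one another in racial composition, the larger the value.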
An optional parameter that may be changed according to user preference was the "random number initialization value," which sets the seed value for the pseudorandom number generator prior to the IRA run. Keeping this value constant during multiple restarts of AZM would result in the same zone designs. Changing this seed value would result in a different sequence of pseudorandom decisions, which would alter the zone designs. For the purposes of this surveillance research we kept this parameter constant to eliminate differences between zone designs. With these model parameters, the AZM analysis was conducted by running 50 program restarts with 100 iterations each, taking the run (i.e., zone design) with the most compact shape, the strongest IAC and lastly the best TP statistic.
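The role of the seed and the selection among restarts can be sketched as follows. This is not the AZM algorithm: the zone-design search is replaced by placeholder random scores. The ordering (shape first, then IAC, then TP) is one reading of the selection rule described above, implemented as a lexicographic comparison. All names are hypothetical.

```python
import random


def run_restarts(seed, restarts=50):
    """Return placeholder statistics for each restart, driven by a fixed seed."""
    rng = random.Random(seed)  # same seed, same pseudorandom sequence
    return [
        {
            "restart": i,
            "shape": rng.uniform(20.0, 40.0),  # lower is more compact
            "iac": rng.uniform(0.0, 1.0),      # higher is more homogeneous
            "tp": rng.uniform(0.0, 500.0),     # lower is closer to the target
        }
        for i in range(restarts)
    ]


def best_run(runs):
    """Pick the run with the best shape, then the best IAC, then the best TP."""
    return min(runs, key=lambda r: (r["shape"], -r["iac"], r["tp"]))


# Repeating the analysis with the same seed reproduces the same choice.
print(best_run(run_restarts(seed=42)) == best_run(run_restarts(seed=42)))  # True
```

Holding the seed constant therefore removes run-to-run variation as a source of differences between maps, which matters when results are compared over time in a surveillance setting.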
Following this reconstruction of zones, the rates and ratios were calculated for each zone using SAS 9.1.3 software. The incidence rates were calculated as the number of low birthweight births divided by the total number of births, multiplied by 100. The relative standard errors (RSE) and 95% confidence intervals (95% CI) for the incidence rates were calculated using the methodologies [3, 4] described below.
A Poisson distribution was used to estimate the 95% CI for the low birthweight incidence rates when the number of low birthweight cases was less than 100:
Lower limit = rate × L
Upper limit = rate × U
Where L(0.95, rate) and U(0.95, rate) are values that correspond to the number of births provided in NVSR.
A Poisson distribution was used to estimate the 95% CI for the infant mortality rates when the number of births was less than 100:
Lower limit = IMR i × L(0.95, D adj )
Upper limit = IMR i × U(0.95, D adj )
L(0.95, D adj) and U(0.95, D adj) refer to limits provided in NVSR.
where: b_i = number of low birthweight births in Michigan for the years 2004–2006
B_i = number of live births in Michigan for the years 2004–2006
Zone_i = number of live births in a zone
Ninety-five percent confidence intervals were also calculated for the standardized incidence and mortality ratios.
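The quantities defined above are the ingredients of indirect standardization. As a hedged sketch, and assuming the ratio takes the usual observed-over-expected form with the statewide rate applied to each zone's births, the calculation could look as follows. The function names and the example counts are hypothetical and are not taken from the study data.

```python
def incidence_rate(cases, births):
    """Cases per 100 births."""
    return cases / births * 100


def standardized_ratio(observed, zone_births, state_cases, state_births):
    """Observed cases divided by the cases expected at the statewide rate."""
    expected = zone_births * state_cases / state_births
    return observed / expected


# Hypothetical zone: 48 low birthweight births among 500 live births,
# compared with a hypothetical statewide experience of 640 per 10,000.
print(round(incidence_rate(48, 500), 1))                 # 9.6
print(round(standardized_ratio(48, 500, 640, 10000), 2))  # 1.5
```

A ratio above 1 indicates more cases in the zone than expected from the statewide rate; the same form applies to the mortality ratio with deaths in place of low birthweight births.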
Following the calculation of the rates and ratios by zone, these data were rejoined to the zone geography and input into ArcGIS 9.2, and thematic maps were created. The spatial patterns derived by AZM were validated by comparing the location of zones with high SIRs and SMRs with statistically significant spatial clusters of low birthweight or infant mortality derived in SaTScan. SaTScan is cluster detection software developed by Dr. Martin Kulldorff. It is freeware and publicly available at the website: http://www.satscan.org/. SaTScan has been used in many health studies as a surveillance tool to explore clusters of disease in space, time and space-time. In this study we detected only spatial clusters by scanning a circular or elliptical window across the centroids of all Zip Codes, noting the number of observed and expected low birthweight cases or infant deaths inside the window at each location. We used a Poisson-based model, where the number of low birthweight cases or infant deaths in each Zip Code was assumed to be Poisson distributed according to the underlying births at risk. The expected number of cases in each Zip Code was calculated as:
E[c] = p × C/P
Where c was the observed number of cases and p was the number of births in the Zip Code of interest, while C and P were the total number of cases and births respectively. A relative risk was derived by dividing the observed number of cases by the expected number of cases.
Under the Poisson model, the likelihood function for a specific window is proportional to:
$$\left(\frac{c}{E[c]}\right)^{c}\left(\frac{C-c}{C-E[c]}\right)^{C-c} I()$$
Where C was the total number of cases, c was the observed number of cases within the window and E[c] was the adjusted expected number of cases within the window under the null hypothesis. C − E[c] was the expected number of cases outside the window. I() was an indicator function. When SaTScan scans for clusters with high rates, I() was equal to 1 when the window had more cases than expected under the null hypothesis, and 0 otherwise. Hypothesis testing was conducted using 999 Monte Carlo simulations. A test statistic was calculated for each random replication as well as for the real dataset, and if the latter was among the 5% highest then the test was significant at the 0.05 level. In this study we scanned for clusters of geographic sizes that would capture between zero and 6.4% of births at risk for low birthweight, which represented the statewide rate of low birthweight. We also scanned for clusters of geographic sizes that would capture between zero and 7.6% of births, representing the statewide infant mortality rate. There was no geographic overlap in clusters.
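The scan statistic and the Monte Carlo test described above can be sketched in a few lines. This is a simplified illustration and not SaTScan itself: it evaluates a single window, works on the log scale, and returns 0 when the window does not have more cases than expected, which is an implementation choice for handling the indicator function. The function names and the example counts are hypothetical.

```python
import math


def expected_cases(window_births, total_cases, total_births):
    """E[c] = p * C / P for the births p inside the window."""
    return window_births * total_cases / total_births


def log_likelihood_ratio(c, e, total_cases):
    """Log of (c/e)^c * ((C-c)/(C-e))^(C-c) when scanning for high rates."""
    if c <= e:
        return 0.0  # the indicator I() is 0: no excess inside the window
    inside = c * math.log(c / e)
    outside = 0.0
    if total_cases > c:
        outside = (total_cases - c) * math.log((total_cases - c) / (total_cases - e))
    return inside + outside


def monte_carlo_p_value(observed_stat, simulated_stats):
    """Rank of the observed statistic among itself and the simulated ones."""
    rank = 1 + sum(1 for s in simulated_stats if s >= observed_stat)
    return rank / (len(simulated_stats) + 1)


e = expected_cases(window_births=500, total_cases=640, total_births=10000)
print(e)                                        # 32.0
print(log_likelihood_ratio(48, e, 640) > 0.0)   # True
# With 999 replications, an observed statistic larger than every simulated
# one has the smallest attainable p-value.
print(monte_carlo_p_value(5.0, [0.1] * 999))    # 0.001
```

With 999 replications, the observed statistic is significant at the 0.05 level when its rank is 50 or better, which matches the "among the 5% highest" rule in the text.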
This project was supported by the Intramural Research Grants Program, Office of the Vice President for Research and Graduate Studies, Michigan State University. We would also like to thank the Michigan Department of Community Health, Office of Vital Statistics for providing the birth and linked infant death records, and David Martin and Samantha Cockings, School of Geography, University of Southampton, for providing the freeware AZM software. We are also appreciative of the comments received from the anonymous reviewers in the preparation of this manuscript.
- Michigan Department of Community Health. http://www.michigan.gov/mdch/0,1607,7-132-2944_4669---,00.html
- Martin D, Nolan A, Tranmer M: The application of zone-design methodology in the 2001 UK Census. Environment and Planning A. 2001, 33: 1949-1962. 10.1068/a3497.
- National Vital Statistics Reports: Births: Final Data for 2005. Centers for Disease Control and Prevention. 2007, 56 (6): 98-103.
- Healthy People 2010 Statistical Note: Healthy people 2010 criteria for data suppression. 2002, Centers for Disease Control and Prevention, Department of Health and Human Services, National Center for Health Statistics.
- US Census Bureau: Source and accuracy of the data for the March 2001 current population survey microdata file. 2001. http://www.bls.census.gov/cps/ads/2001/ssrcacc.htm
- Library of Michigan, Department of History, Arts, and Libraries: Metropolitan and Micropolitan Statistical Areas in Michigan based on the 2000 Census. 2003.
- Cockings S, Martin D: Zone design for environment and health studies using pre-aggregated data. Social Science and Medicine. 2005, 60: 2729-2742. 10.1016/j.socscimed.2004.11.005.
- Openshaw S: The Modifiable Areal Unit Problem. CATMOG 38. 1984, Norwich: Geo Books.
- Haynes R, Daras K, Reading R, Jones A: Modifiable neighbourhood units, zone design and residents' perceptions. Health and Place. 2007, 13: 812-825. 10.1016/j.healthplace.2007.01.002.
- Daras K: An information statistics approach to zone design in the geography of health outcomes and provision. Ph.D. Thesis. 2007, University of Newcastle, UK.
- Openshaw S, Rao L: Algorithms for reengineering 1991 census geography. Environment and Planning A. 1995, 27: 425-446. 10.1068/a270425.
- Flowerdew R, Manley DJ, Sabel CE: Neighbourhood effects on health: Does it matter where you draw the boundaries?. Social Science and Medicine. 2008, 66: 1241-1255. 10.1016/j.socscimed.2007.11.042.
- Martin D: Developing the automated zoning procedure to reconcile incompatible zoning systems. International Journal of Geographical Information Science. 2003, 17 (2): 181-196. 10.1080/713811750.
- Stafford M, Duke-Williams O, Shelton N: Small area inequalities in health: Are we underestimating them?. Social Science and Medicine. 2008, 67: 891-899. 10.1016/j.socscimed.2008.05.028.
- Alvanides S, Openshaw S, Rees P: Designing your own geographies. The Census Data System. Edited by: Rees P, Martin D, Williamson P. 2002, Chichester: Wiley, 47-65.
- Riva M, Apparicio P, Gauvin L, Brodeur J: Establishing the soundness of administrative spatial units for operationalising the active living potential of residential environments: An exemplar for designing optimal zones. International Journal of Health Geographics. 2008, 7: 43. 10.1186/1476-072X-7-43.
- Waller LA, Gotway CA: Applied spatial statistics for public health data. 2004, New Jersey: John Wiley & Sons.
- Rushton G: Public health, GIS and spatial analytic tools. Annual Review of Public Health. 2003, 24: 43-56.
- Talbot TO, Kulldorff M, Forand SP, Haley VB: Evaluation of spatial filters to create smoothed maps of health data. Statistics in Medicine. 2000, 19: 2399-2408. 10.1002/1097-0258(20000915/30)19:17/18<2399::AID-SIM577>3.0.CO;2-R.
- Johnson GD: Small area mapping of prostate cancer incidence in New York State (USA) using fully Bayesian hierarchical modeling. International Journal of Health Geographics. 2004, 3: 29. 10.1186/1476-072X-3-29.
- Forand SP, Talbot TO, Druschel C, Cross PK: Data quality and the spatial analysis of disease rates: Congenital malformations in New York State. Health and Place. 2002, 8: 191-199. 10.1016/S1353-8292(01)00037-5.
- Ozdenerol E, Williams BL, Kang SY, Magsumbol MS: Comparison of spatial scan statistic and spatial filtering in estimating low birth weight clusters. International Journal of Health Geographics. 2005, 4: 19. 10.1186/1476-072X-4-19.
- Boyle MH, Willms JD: Place effects for areas defined by administrative boundaries. American Journal of Epidemiology. 1999, 149 (6): 577-585.
- Galster G: On the nature of neighbourhood. Urban Studies. 2001, 38 (12): 2111-2124. 10.1080/00420980120087072.
- Pickett K, Pearl M: Multilevel analyses of neighbourhood socioeconomic context and health outcomes: a critical review. Journal of Epidemiology and Community Health. 2001, 55 (2): 111-122. 10.1136/jech.55.2.111.
- Diez Roux AV: Investigating neighborhood and area effects on health. American Journal of Public Health. 2001, 91 (11): 1783-1789. 10.2105/AJPH.91.11.1783.
- Macintyre S, Ellaway A, Cummins S: Place effects on health: How can we conceptualise, operationalise and measure them?. Social Science and Medicine. 2002, 55 (1): 125-139. 10.1016/S0277-9536(01)00214-3.
- Kawachi I, Berkman LF: Neighborhoods and health. 2003, New York: Oxford University Press.
- Oakes JM: The (mis)estimation of neighborhood effects: causal inference for a practicable social epidemiology. Social Science and Medicine. 2004, 58: 1929-1952. 10.1016/j.socscimed.2003.08.004.
- Diez Roux AV: Estimating neighborhood effects: the challenges of causal inference in a complex world. Social Science and Medicine. 2004, 58 (10): 1953-1960. 10.1016/S0277-9536(03)00414-3.
- Grady SC: Racial disparities in low birthweight and the contribution of residential segregation: A multilevel analysis. Social Science and Medicine. 2006, 63: 3013-3029. 10.1016/j.socscimed.2006.08.017.
- Rauh VA, Landrigan PJ, Claudio L: Housing and health; intersection of poverty and environmental exposures. Annals of the New York Academy of Sciences. 2008, 1136: 276-288. 10.1196/annals.1425.032.
- National Vital Statistics Reports: Infant Mortality Statistics from the 2004 Period Linked Birth/Infant Death Data Set. Division of Vital Statistics. 2007, 55 (14): 29-32.
- Cromley EK, McLafferty SL: GIS and public health. 2002, New York: The Guilford Press, 8-9.
- Alexander GR, Himes JH, Kaufman RB, Mor J, Kogan M: A United States national reference for fetal growth. Obstetrics & Gynecology. 1996, 87: 163-168. 10.1016/0029-7844(95)00386-X.
- Environmental Systems Research Institute (ESRI). http://www.esri.com/
- Martin D: David Martin’s Software Automated Zone Matching. http://www2.geog.soton.ac.uk/users/martindj/davehome/software.htm
- Openshaw S: A geographical solution to scale and aggregation problems in region-building, partitioning and spatial modeling. Transactions of the Institute of British Geographers, New Series. 1977, 2 (4): 459-472. 10.2307/622300.
- Tranmer M, Steel D: Using census data to investigate the causes of the ecological fallacy. Environment and Planning A. 1998, 30: 817-831. 10.1068/a300817.
- SAS Institute Inc., SAS 9.1.3. http://www.sas.com
- Kulldorff M: SaTScan. http://www.satscan.org/
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.