A brief visual primer for the mapping of mortality trend data
© James et al; licensee BioMed Central Ltd. 2004
Received: 30 January 2004
Accepted: 08 April 2004
Published: 08 April 2004
Maps are increasingly used to visualize and analyze data, yet the spatial ramifications of data structure are rarely considered. Data are subject to transformations made throughout the research process and then used to map, visualize and conduct spatial analysis. We used mortality data to answer three research questions: Are there spatial patterns to mortality, are these patterns statistically significant, and are they persistent across time? This paper shows how spatial patterns differ across six data transformations: standardization, cut-points, class size, color scheme, spatial significance and temporal mapping. We use numerous maps and graphics to illustrate the iterative nature of mortality mapping, and exploit the visual nature of the International Journal of Health Geographics on the World Wide Web to present researchers with a series of maps.
Statistical analysis can yield substantially different conclusions depending on how variables are defined, operationalized, calculated and standardized. Variation across these transformations of data must be addressed, as changes in operationalization result in different outcomes. Increasingly for social scientists, maps function both as spatial representations of data and as tools for exploratory spatial data analysis. To explore the spatial patterns of the ultimate health outcome, death, we calculated and mapped United States mortality rates at the county level. In one study, age-adjusted, five-year-averaged, all-cause mortality rates at the county level were calculated using data from the Compressed Mortality File. In this research we found visual and statistical evidence of spatial clustering of relatively high and low mortality rates in several regions across the United States. This paper recounts the data transformations and resulting spatial patterns that were evident in this inquiry.
A wide range of studies has examined different determinants of health and clustering of health outcomes, either within demographic and socioeconomic classifications or spatial location [5–7]. In this paper we visually demonstrate how variations in methodology result in different spatial patterns of mortality, some striking and others less dramatic. We make full use of the visual nature of publishing on the Internet to provide examples from different methodologies with the use of maps and charts. These graphics are a product of the many steps taken throughout the research process in order to reach valid and reliable mortality data calculations. Reframing our empirical exercise as a research question, after examining variation in these results, we ask: do ecological level measures of mortality cluster differently based on how mortality is standardized, measured and operationalized? The answer is unequivocally yes and conditionally no. We explain these results more fully in this paper with an eye toward providing guidance for other investigators.
The wide variety of health atlases, categorized by population or disease, is testimony to the popularity and value of mapping health outcomes in both print form [8–10] and on the Web [11–14]. This popularity is due in large part to the effectiveness of mapping in data analysis. Beginning with Dr. John Snow's London cholera maps, health researchers have exploited the advantages of data visualization. Detecting spatial patterns in social, economic, and health variables influences research by showing where these phenomena exist, their intensity and their spatial anchoring over time. Investigators should be reminded that initial decisions regarding research approach and assumptions fundamentally affect the resulting spatial patterns. All subsequent decisions (e.g., operationalization, transformation, classification and standardization) necessarily affect the resulting spatial pattern as well.
Variable distributions and data methodology have long been issues related to mapping, as Jenks and Caspall have outlined. According to Monmonier, "Social scientists need maps to explore and understand their data and to confirm and refine their hypotheses." These researchers have acknowledged the importance of the relationship between underlying data and the visual output they produce in a choropleth map. This relationship is further explored throughout the article using county-level mortality data.
Three research questions are derived from our empirical research project of healthy and unhealthy places in America: Are there spatial patterns to mortality, are these patterns statistically significant, and are they persistent across time? A series of mapped examples is used to demonstrate the variety of geographic patterns that emerge across each methodological technique. These maps include high and low mortality standardizations, cut points (standard deviations, natural breaks, quantiles), class size and color scheme, spatial significance of high and low mortality clusters, and temporal persistence of these clusters. Summary maps and charts are provided, where necessary, to highlight the levels of change that exist across methodological techniques.
Using five-year-average age-adjusted mortality rates for 1993–1997, distinct high and low mortality clusters appear around the United States. Five-year averages are calculated to provide rate stability for counties with small populations, where a minor increase or decrease in deaths for a single year may cause a dramatic change in the county's mortality rate for that particular year.
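The stabilizing effect of a five-year average can be illustrated with a minimal sketch. The county below is hypothetical, and pooling deaths over person-years is an assumption about how the average is formed; the source does not specify the exact calculation.

```python
def annual_rates(deaths, population, per=100_000):
    """Single-year death rates per `per` persons."""
    return [d / p * per for d, p in zip(deaths, population)]


def five_year_rate(deaths, population, per=100_000):
    """Pooled five-year rate: total deaths over total person-years.

    Pooling is an assumed way of averaging, used here for illustration.
    """
    if len(deaths) != 5 or len(population) != 5:
        raise ValueError("expected exactly five years of data")
    return sum(deaths) / sum(population) * per


# Hypothetical small county: 2,000 residents, a handful of deaths per year.
deaths = [18, 25, 15, 22, 20]
population = [2_000] * 5

yearly = annual_rates(deaths, population)
pooled = five_year_rate(deaths, population)
spread = max(yearly) - min(yearly)  # range of the single-year rates
```

In this made-up county the single-year rates range from 750 to 1,250 per 100,000 because of a difference of only ten deaths, while the pooled rate is 1,000 per 100,000.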
These high and low clusters raise the question: how are high and low mortality measured, and what differences exist when rates are not standardized or are standardized in another fashion? In an attempt to address these concerns, illustrations are used throughout to acknowledge the importance of methodology when conducting spatial data analysis.
The subsequent sections of this article follow the line of research in the "Healthy and Unhealthy Clusters in America" project of this research team. They detail the important differences that emerge in spatial outcomes as various methodological techniques are applied to the data and to the measurement of variables and classes, and outline potential impacts on research.
The way in which the dependent variable, mortality rate, is adjusted/standardized must be acknowledged when mapping or analyzing data of any sort. The importance of this acknowledgement lies in the substantial differences in spatial outcomes and analysis across a variety of adjustment techniques. Several techniques of standardization are possible. In this research crude rates, age, age-sex, and age-sex-race rate adjustments are used. Crude death rates are simply the number of deaths per 100,000 of the population. While simple to calculate, the disadvantage is that demographic differences in populations are not reflected in the rate and thus counties cannot be compared directly. Age-adjusted rates are standardized by age, which adjusts each age group of each county, or unit of analysis, to represent the proportion of the total population of that specific age group. The same is done for age and race, as well as for age, race, and sex, standardized rates. The principle behind this standardization technique is that it makes the appropriate adjustments to the county rate so that the demographic profile of the county mirrors the demographics of the entire country, thus facilitating direct comparisons. If a county has an above average minority or elderly population, it is adjusted accordingly, therefore eliminating the possibility that this particular population contributes to even higher death rates. The spatial distribution of mortality rates changes after each adjustment, or combination of adjustments, consistent with the major research question of this article.
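A minimal sketch of direct age standardization may make the contrast with crude rates concrete. The three age groups, the county counts, and the standard-population weights are all hypothetical values chosen for illustration; they are not taken from the Compressed Mortality File.

```python
def crude_rate(deaths, population, per=100_000):
    """Crude death rate: all deaths per `per` persons, ignoring age structure."""
    return sum(deaths) / sum(population) * per


def age_adjusted_rate(deaths, population, standard_weights, per=100_000):
    """Directly standardized rate.

    Each age-specific rate is weighted by that age group's share of the
    standard (national) population, so the county is evaluated as if it
    had the national age profile.
    """
    if abs(sum(standard_weights) - 1.0) > 1e-9:
        raise ValueError("standard weights must sum to 1")
    total = 0.0
    for d, p, w in zip(deaths, population, standard_weights):
        total += w * (d / p)
    return total * per


# Hypothetical county whose 65+ share (about 14%) exceeds the standard's 13%.
deaths = [10, 40, 300]                  # ages 0-24, 25-64, 65+
population = [20_000, 40_000, 10_000]
standard_weights = [0.35, 0.52, 0.13]   # illustrative national age shares

crude = crude_rate(deaths, population)
adjusted = age_adjusted_rate(deaths, population, standard_weights)
```

Here the crude rate is 500 per 100,000 and the age-adjusted rate is 459.5 per 100,000: the adjustment removes the part of the county's rate that is due only to its older population. Adding sex or race to the standardization follows the same pattern with more strata.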
Operationalizing the mortality variable is another necessary step in reaching the appropriate spatial outcome in a map. As is the case with standardization procedures, differences in these operationalizations lead to different outcomes, which are mapped in this section. Referring back to figure 1, 5-year average, age-adjusted mortality rates are used to assess change across cut-point procedures.
Once the dependent variable is standardized, an important aspect of mapping data that must be considered is the definition of cut-points. Each series of cut-points is based on a particular mathematical and statistical formula, which leads to variation in results. Three standard cut-point methods are used in these mortality maps: 1) standard deviation, 2) natural breaks, and 3) quantiles. Statistically speaking, standard deviation is the positive square root of the variance, measuring a designated area above and below the mean. "Natural breaks is based on an algorithm produced by Jenks that is an optimization procedure which minimizes within class variance and maximizes between class variance in an iterative series of calculations". In other words, it identifies natural cut-points in the data, rather than imposing classification boundaries with set widths. Quantiles are another commonly used classification method, simply placing an equal number of enumeration units into each class. For instance, in a five-class group, each class holds twenty percent (p. 670). This procedure is used in the Geographic Information System (GIS) software ArcView 3.2, which formulates the categorizations for data separation.
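The three cut-point methods can be compared on a small set of rates. The sketch below is illustrative only: the nine rates are hypothetical, the population standard deviation is an assumed choice, and the natural-breaks function is a brute-force stand-in that minimizes within-class squared deviation, not the Jenks routine as implemented in ArcView 3.2.

```python
from itertools import combinations
from statistics import mean, pstdev


def sd_classes(rates, width=1.0):
    """Label rates low/average/high relative to mean +/- width * SD."""
    m, s = mean(rates), pstdev(rates)
    labels = []
    for r in rates:
        if r > m + width * s:
            labels.append("high")
        elif r < m - width * s:
            labels.append("low")
        else:
            labels.append("average")
    return labels


def quantile_classes(rates, k=3):
    """Assign classes 0..k-1 by rank so each class holds an equal share."""
    order = sorted(range(len(rates)), key=lambda i: rates[i])
    classes = [0] * len(rates)
    for rank, i in enumerate(order):
        classes[i] = rank * k // len(rates)
    return classes


def natural_breaks(rates, k=3):
    """Exhaustive search for the k groups of sorted values with the
    smallest total within-class squared deviation (small inputs only)."""
    values = sorted(rates)

    def ssd(group):
        m = mean(group)
        return sum((v - m) ** 2 for v in group)

    best_cost, best_groups = None, None
    for cuts in combinations(range(1, len(values)), k - 1):
        bounds = (0, *cuts, len(values))
        groups = [values[a:b] for a, b in zip(bounds, bounds[1:])]
        cost = sum(ssd(g) for g in groups)
        if best_cost is None or cost < best_cost:
            best_cost, best_groups = cost, groups
    return best_groups


# Hypothetical county rates per 100,000.
rates = [700, 720, 730, 880, 900, 910, 920, 1100, 1150]
by_sd = sd_classes(rates)
by_quantile = quantile_classes(rates)
by_breaks = natural_breaks(rates)
```

With these values the standard deviation method flags only two counties as high, while quantiles place three counties in the top class because each class must hold the same number of units. The same data therefore produce a larger "high" group under quantiles.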
Quantiles (Figure 8) show much larger clusters of high and low mortality and are the most concentrated of any classification method. Far fewer counties are considered on par with the national average than is the case with figure 1. Natural breaks (Figure 9), broken into three classes, show the same general clusters, but are not quite as filled out as the quantiles. The Midwest is less concentrated, as are parts of the high mortality clusters in the South. The standard deviation technique results in the most sparsely clustered maps (Figure 10). Figure 10 distinguishes a sparse number of counties in both high and low mortality clusters, with the Midwest being far less concentrated than in previous figures, as is once again the case with the Southern unhealthy clusters. While the patterns shown using the standard deviation method are consistent with those of the quantile and natural breaks methods, they comprise far fewer counties.
Analyzing spatial outcomes in mortality data across standard deviations, natural breaks, and quantiles provides interesting results. Although outcomes across each cut-point technique may not always vary dramatically, differences definitely exist. These differences are enough to show that spatial patterns vary with the way the dependent variable is measured within counties. Making comparisons across methods, while focusing at the county level, greatly contributes to the researcher's knowledge and understanding of the underlying data. While each of these techniques is unique, we chose to use standard deviation cut points because it is the most statistically sound estimate of the three. Using standard deviations, we test the spatial significance and temporal stability of our mortality clusters, but first we analyze classification size and color scheme of mortality.
MacEachren acknowledges the trade-off between simplicity and complexity regarding class size in maps and recommends producing maps in the simplest form possible, as we have done in our research. This allows for an easily readable and understandable map. But not all maps have this luxury; some are of a more complex nature, showing a greater amount of information. The spatial patterns in the 5-class natural breaks map (Figure 11) do not possess solid blankets of high and low mortality, as is the case with many previous maps. This is due simply to the additional colors that accompany new classes of mortality. Similar patterns of high and low mortality continue to exist, with the added dimension of intensity included in the map. The new algorithm with which these class calculations are produced causes the size of each class to change, as well as the counties that each encompasses. With the same general high and low mortality patterns as most figures presented in this article, different shades of red are dispersed throughout the Southeast, while the Midwest displays the majority of each low mortality intensity level. Average mortality counties are less prevalent throughout the country, due to the redistribution of counties into new classifications. This map has added displays of intensity in both red and blue clusters of counties. White counties are completely absent here, again due to the recalculation of high and low mortality and the resulting redistribution of these counties. The major difference between the natural breaks maps (Figures 9, 11, and 12) is the number of counties classified in the high and low groups, where figure 12 distinguishes higher concentrations in both high and low mortality than the previous two figures. The embedded clusters of high mortality in the Southeast and low mortality in the Midwest are still apparent, but new clusters of lower intensity appear for both categories throughout the United States.
As briefly mentioned in the previous paragraph, another cartographic issue that fundamentally relates to the visual appearance of intensity is color scheme. The use of shades across a particular color (red, blue) is effective in showing the hierarchy within one category (high or low mortality), when class size is greater than three. Using a high intensity color is effective in portraying unhealthy areas, whereas a softer color may be used to indicate healthy areas. When constructing a simple map, typically three classifications (Figure 1), the use of red and blue easily distinguishes high and low mortality or healthy and unhealthy clusters, the focus of this map. The designation of white for average mortality counties visually influences the reader away from these counties and allows the high and low counties to stand out more clearly. It is important to remember what medium is being used to display a choropleth map when experimenting with color schemes. Some are more appropriate to use for maps in print form, others may be fitting in web-based maps, or those shown electronically in presentations. These factors contribute heavily to variable colors, as well as background colors of maps.
The basis of clustering in the Local Moran's I is dependent on the type of counties surrounding a target county. For example, a high mortality county surrounded by other high mortality counties is classified as "high-high", indicated by the color red. Blue counties refer to relatively low mortality counties, which are surrounded on each side by other relatively low mortality counties, classified as "low-low". These two classifications, red and blue counties, indicate spatial autocorrelation. Pink counties are high mortality counties adjacent to low mortality counties, and light blue counties are low mortality counties adjacent to high mortality counties. Counties whose mortality rates are statistically independent of the rates of their surrounding counties are colored white.
In order to determine significant clusters across space, data methodology must be precise and logical in the preliminary stages. Figure 13 demonstrates that making the appropriate assumptions throughout the research process, as outlined in the previous sections of this article, leads to valid and reliable results. Based on the statistical significance of these clusters, the next step in our research was to test the temporal stability of mortality clustering. Measuring high and low mortality over time, 30 years to be precise, is valuable in finding trends and patterns that occur within the data.
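The quadrant labels described above can be sketched with a small Local Moran's I calculation. This is a simplified illustration under stated assumptions: six hypothetical counties arranged in a chain, row-standardized contiguity weights, and no significance test. A full analysis would add a permutation test to decide which counties are statistically independent of their neighbors (the white counties).

```python
from statistics import mean


def local_morans_i(values, neighbors):
    """Return (I_i, quadrant) for each unit.

    I_i = (z_i / m2) * (mean of z_j over the neighbors of i), where z is
    the deviation from the overall mean and m2 is the mean squared
    deviation. Neighbor weights are row-standardized.
    """
    xbar = mean(values)
    z = [v - xbar for v in values]
    m2 = sum(d * d for d in z) / len(z)
    results = []
    for i, zi in enumerate(z):
        lag = mean([z[j] for j in neighbors[i]])  # spatial lag of z
        own = "high" if zi > 0 else "low"
        nearby = "high" if lag > 0 else "low"
        results.append((zi * lag / m2, f"{own}-{nearby}"))
    return results


# Hypothetical rates for six counties in a row; each borders the next.
values = [1100, 1050, 1000, 700, 750, 1100]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

results = local_morans_i(values, neighbors)
quadrants = [q for _, q in results]
```

In this toy example counties 0 and 1 are "high-high" and counties 3 and 4 are "low-low", both with positive I values, while counties 2 and 5 are high mortality counties next to low mortality neighbors ("high-low") and have negative I values.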
After establishing our method of mortality rate standardization and operationalization, along with the presence of statistically significant healthy and unhealthy county clusters and how to portray them in a choropleth map, another investigation into the data was in order. Temporal stability, i.e., persistence of mortality over time, was calculated. For purposes of our research, we used age-adjusted death rates and standard deviation cut-points to measure high and low mortality and divided the thirty-year data into six 5-year time periods: 1968–72, 1973–77, 1978–82, 1983–87, 1988–92, and 1993–97. A significant advantage of using five-year time periods is to provide rate stability for small counties. A five-year average also eliminates potential outliers in deaths per year in a specific county. Any county one standard deviation above the national mean in at least three of the six time periods is classified as persistently high mortality, whereas any county one standard deviation below the national mean in at least three of the six time periods is classified as persistently low mortality. Many counties change classification (high, average, low) over time; very few stay the same throughout. Because only a small number of counties are designated as either high or low when the criterion is consistency throughout all six time periods (6/6), a minimum of three time periods was assigned as the cutoff to measure persistence (3/6). Essentially, if a county is one standard deviation above or below the mean for at least 15 of the 30 possible years, it is classified as persistent.
The purpose of this map was to identify persistent clusters of high and low mortality over time. The majority of these high and low mortality counties are clustered consistently with those of the Local Moran's I statistically significant clusters, leading to a reasonable conclusion that healthy and unhealthy places are deeply embedded in these particular health outcomes, thereby answering the third research question of our empirical research project. By implementing the mapping procedures discussed throughout this article, we targeted appropriate areas in which social science and demographic research can gain insights into similarities and differences within the social structure of these places, and what characteristics may be harmful or beneficial to the people who reside in these areas. Cluster identification is relevant to this particular article because data construction methodology, standardization, and operationalization have a strong influence over which clusters arise; therefore they must be appropriately defined and targeted. Each methodology outlined in this article has led to the identification of significant and temporally embedded clusters of high and low mortality. Defining these clusters with a high level of confidence, validity, and reliability is a process that takes multiple steps, each of which must be approached with caution.
This article has demonstrated the importance of data transformation and visual display to spatial mortality outcomes through our line of research in healthy and unhealthy places in America. Definition, operationalization, calculation and standardization of the variable being mapped (mortality) are crucial in providing valid and reliable spatial and statistical outcomes in research. Through a descriptive and visual analysis of changes across each of these techniques, the differences that may occur in the spatial distribution of the data become apparent. The importance of standardization and calculation of the dependent variable is outlined, with emphasis on the appropriate methods of detecting trends and changes in mortality rates over time. A series of mortality maps demonstrates the stark differences in spatial outcomes between unadjusted and age-adjusted, age-sex adjusted, and age-sex-race adjusted mortality rates. Another necessary point of investigation is cut-points, or the manner in which the variable is operationalized. Quantiles, natural breaks, and standard deviations were summarized, along with the spatial implications provided by each technique and the differences that exist among these classifications. From here, significant mortality clustering was identified using the Local Moran's I, as well as mortality persistence and temporal trends in these clusters over a period of 30 years. Finally, summary maps were provided where necessary to highlight any dramatic changes in the spatial outcomes and patterns across varying methodological techniques.
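The 3/6 persistence rule can be expressed as a short sketch. The five counties and their rates are hypothetical, and computing the national mean and standard deviation as unweighted statistics across counties is an assumption made for illustration; the source does not state how the national figures were derived.

```python
from statistics import mean, pstdev


def period_labels(rates):
    """Label each county high/average/low for a single time period."""
    values = list(rates.values())
    m, s = mean(values), pstdev(values)
    labels = {}
    for county, rate in rates.items():
        if rate > m + s:
            labels[county] = "high"
        elif rate < m - s:
            labels[county] = "low"
        else:
            labels[county] = "average"
    return labels


def persistence(rates_by_period, min_periods=3):
    """Classify counties as persistently high or low across periods."""
    counts = {}
    for rates in rates_by_period:
        for county, label in period_labels(rates).items():
            tally = counts.setdefault(county, {"high": 0, "low": 0})
            if label in tally:
                tally[label] += 1
    result = {}
    for county, tally in counts.items():
        if tally["high"] >= min_periods:
            result[county] = "persistently high"
        elif tally["low"] >= min_periods:
            result[county] = "persistently low"
        else:
            result[county] = "not persistent"
    return result


# Hypothetical rates for five counties over six 5-year periods.
base = {"A": 1200, "B": 900, "C": 900, "D": 900, "E": 900}
rates_by_period = [
    {**base, "D": 1200, "E": 600},
    {**base, "D": 1200, "E": 600},
    {**base, "E": 600},
    dict(base),
    dict(base),
    dict(base),
]
status = persistence(rates_by_period)
```

In this example county A is high in all six periods and county E is low in exactly three, so both count as persistent. County D is high in only two periods and therefore does not meet the 3/6 cutoff.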
The maps and graphics used to emphasize the descriptive and theoretical information presented in this article provide fundamental support for the impacts that these techniques have upon data analysis. Without the proper methodology, research results and conclusions may be critically flawed, resulting in an inappropriate investigation into potential policy-making and intervention for at-risk populations, communities, and counties. We hope these illustrations are useful for fellow investigators as they begin to fully employ data visualization and mapping techniques.
This research was made possible by grant number 4 D1A RH 00005-01-01 from the Office of Rural Health Policy of the U.S. Department of Health and Human Services through the Rural Health, Safety and Security Institute, Social Science Research Center, Mississippi State University. Its contents are solely the responsibility of the authors and do not necessarily represent the official views of the Office of Rural Health Policy. Finally, we wish to thank the anonymous reviewers for their helpful comments.
This article is published under license to BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.