The New York State Department of Health (NYSDOH) published the cancer incidence data online as part of its Cancer Surveillance Improvement Initiative (http://www.health.state.ny.us/nysdoh/cancer/csii/nyscsii.htm). These data represent newly diagnosed cancer cases in the period 1993–1997, assigned to the patient's residence at diagnosis, and are expressed as the number of cancers per 100,000 people in the population. When we began this study (August 2001), the NYSDOH had released data on three cancers: breast (female only), colorectal (female and male), and lung (female and male).
To protect patient privacy, the NYSDOH data provided case counts referenced to ZIP codes rather than individual residences. While ZIP codes are somewhat arbitrary spatial units of analysis with respect to potential health and environmental factors, they provide a convenient way to group the population and preserve confidentiality. We combined this dataset with ZIP code boundary files reflecting the geography in November 1999, which we purchased from Claritas Corporation (http://www.claritas.com). While the NYSDOH provides information on the entire state, we focus on the 214 ZIP codes within Nassau, Queens and Suffolk Counties on Long Island.
People move between ZIP codes and cancer latency (the time between causative exposures and cancer onset) is long, so the ZIP code where the patient was diagnosed may not be the location where the cancer developed nor where causative exposures occurred. We do not include any adjustments for migration or changes in any demographic patterns within the study area.
Although the observed cancer diagnosis data were already adjusted for the different populations-at-risk in the different ZIP codes, we also used New York State's adjustment for different age patterns. Because cancer incidence is related to age, NYSDOH calculated the expected cancer incidence for each ZIP code using the ZIP code's age structure and the average incidence by age class for New York State. We calculated a standardized morbidity ratio (SMR) by dividing the observed value by the age-adjusted expected incidence. An SMR of 1.0 indicates that the observed incidence is the same as expected, a value below 1.0 indicates that fewer cases than expected occurred, and a value above 1.0 indicates that more cases than expected occurred.
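The SMR calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the NYSDOH procedure itself: it assumes the expected incidence is obtained by applying statewide age-class rates (per 100,000) to the ZIP code's age structure, and the function names, age classes, and all numbers are hypothetical.

```python
def expected_cases(population_by_age, state_rate_by_age):
    """Expected cases for one ZIP code (assumed form of the age adjustment).

    population_by_age: people in each age class in the ZIP code
    state_rate_by_age: statewide incidence per 100,000 for each age class
    """
    return sum(
        population_by_age[age] * state_rate_by_age[age] / 100_000
        for age in population_by_age
    )


def smr(observed, expected):
    """Standardized morbidity ratio: observed / age-adjusted expected."""
    if expected <= 0:
        raise ValueError("expected incidence must be positive")
    return observed / expected


# Hypothetical ZIP code with two age classes (illustrative numbers only).
population = {"0-49": 20_000, "50+": 10_000}
state_rates = {"0-49": 50.0, "50+": 300.0}  # cases per 100,000

expected = expected_cases(population, state_rates)
ratio = smr(observed=50, expected=expected)
print(f"expected = {expected:.1f}, SMR = {ratio:.2f}")
```

With these made-up inputs the SMR exceeds 1.0, which would be read as more cases than expected given the ZIP code's age structure.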
National Air Toxics Assessment
The USEPA National Air Toxics Assessment (NATA, http://www.epa.gov/ttn/atw/nata/) combines information on point and nonpoint emissions of air toxics and weather information into an Assessment System for Population Exposure Nationwide (ASPEN). We obtained ASPEN model 1996 base-year data (Feb 2001 run). The exposure data are approximately concurrent with the cancer study period, thereby precluding any cause-and-effect interpretation, as cancers that developed in 1993 could not have been caused by air toxics in 1996. Because of the latency in the development of cancer, it would not be plausible to claim that the 1996 data explain even the 1997 diagnoses alone. Yet the 1996 data may be representative of the air toxics prior to 1996, and 1996 is the first year for which such a comprehensive geographic exposure model was available from the USEPA. As this is an opportunistic analysis, we took the data available. We thereby assume the 1996 data are reasonable representations of air pollution in the preceding decade, during which causative exposures might have occurred. This assumption seems reasonable for air pollution sources that have been in operation since the 1980s, and whose dispersal is mediated by transport mechanisms (e.g. prevailing winds) that have not changed a great deal in the last 10–20 years. The ASPEN model estimates the average annual concentration of a series of known air toxics for all census tracts in the nation. We used concentrations of only those air toxics thought to be potential carcinogens for the three study cancers (Table 7). This is by no means an exhaustive list of potential carcinogens on Long Island. For the purposes of this study, the compounds in this list were deemed the most plausible carcinogens, and exposure to these compounds was combined into a single risk measure, the overall predicted risk (OPR), defined below (Equation 2).
As exposure to each compound carries a different risk, we standardized the exposure by multiplying the estimated average annual concentration of each compound by its Unit Risk Estimate (URE), as shown in Equation 1. The URE is the lifetime risk of excess cancer cases predicted to come from continuous exposure to a compound at a concentration of 1 μg/m³ in the air (for more information see the definition on the NATA website, http://www.epa.gov/ttn/atw/nata/gloss1.html). UREs may under- or over-estimate the actual risk of exposure to these compounds, as the predictions are extrapolations from tests in animals and/or the effects of low doses. All UREs are from the Draft USEPA NATA report, except that for diesel particulate matter. The USEPA has not yet defined a URE for diesel, so we used the midpoint of the URE range from the California EPA. To calculate the cancer risk for each compound we used the following formula:
Exposure × URE = CancerRisk (Equation 1)
We obtained the annual estimated exposure from the NATA dataset, and used the URE values from Table 7 to obtain estimates of excess cancer cases due to that exposure. As the URE is a risk estimate for all cancer, rather than a cancer-specific figure, the OPR for each cancer is likely an overestimate of risk for an individual cancer.
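As a concrete reading of Equation 1, the short sketch below multiplies an average annual concentration by a URE to obtain the predicted lifetime excess risk for one compound. The function name and the numeric values are hypothetical placeholders, not values from the NATA dataset or Table 7.

```python
import math


def cancer_risk(concentration_ug_m3, ure_per_ug_m3):
    """Equation 1: Exposure x URE = CancerRisk.

    concentration_ug_m3: estimated average annual concentration (ug/m^3)
    ure_per_ug_m3: unit risk estimate, lifetime excess risk per 1 ug/m^3
    """
    if concentration_ug_m3 < 0 or ure_per_ug_m3 < 0:
        raise ValueError("concentration and URE must be non-negative")
    return concentration_ug_m3 * ure_per_ug_m3


# Hypothetical compound in one census tract (illustrative values only).
example_risk = cancer_risk(concentration_ug_m3=2.0, ure_per_ug_m3=1e-6)
print(f"predicted excess lifetime risk: {example_risk:.1e}")
assert math.isclose(example_risk, 2e-6)
```

Because the URE is defined per 1 μg/m³, the concentration must be expressed in μg/m³ for the product to be a dimensionless risk.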
The use of a national-scale assessment to predict cancer risk based on air toxics is subject to caveats that have been identified by the EPA:
"The UREs used in the national-scale assessment are subject to four major areas of variability and uncertainty. First, many of the pollutants were classified as probable carcinogens because data were not sufficient to prove causality in humans. It is possible that some of these pollutants do not cause cancer at environmentally relevant doses, and that true risk associated with these air toxics is zero. Second, all UREs in this study were based on linear extrapolation from high to low doses. It is possible that the true dose response relationships for some pollutants may be less than linear, resulting in an overestimate of risk. Third, most UREs in this study were developed from animal data using conservative methods to extrapolate between species. Human responses may differ from the predicted ones. The first three elements are comprised entirely of uncertainty. Fourth, most UREs in this study were based on statistical upper confidence limits, though some were based on statistical best fits. (While this does not affect overall uncertainty, UREs based on best fits should be unbiased, while those based on upper confidence limits should be biased high.) This fourth element represents a combination of variability (i.e., based on variation responses of different people or animals) and uncertainty (i.e., potential errors in the measurement of exposure and response). Because of the aggregate treatment all four sources of variability and uncertainty described above, EPA considers all its UREs to be upper-bound estimates."
Regarding the use of UREs, one should note that the methods employed are sensitive to the relative, rather than absolute, values of the risk estimates. By focusing on boundaries, we are able to identify spatial structure and geographic associations so long as the relative values of the OPRs are correct. Hence the methods employed will yield the same results for a biased risk estimator, provided the bias on average is the same for all observed values. We also note that the UREs in Table 7 are for all cancers combined, and not for site-specific cancers. Our analysis identified compounds thought to be carcinogens for each of the five cancers we considered (female breast, female and male colorectal, and female and male lung), and then calculated a site-specific URE based on the values in Table 7. Ideally, one should use site-specific UREs, but these are not yet available from EPA.
We calculated an overall predicted risk from air toxics (OPR) for each cancer by summing up the excess cancer cases for each of the relevant compounds as shown in Equation 2.
Σ CancerRisk = OPR (Equation 2)
Summing up the excess cancer cases for all of the relevant compounds assumes an additive relationship – that particular compounds do not interact in a synergistic or threshold-related manner to influence dose-response relationships. The EPA is currently using an additive model for assessing dose and response to multiple compounds, but further research needs to be done to confirm the additive model or else replace it with a more appropriate model. Again, pattern recognition approaches are useful under this kind of uncertainty since they are relatively robust provided the rank order of the estimates is "about right."
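The additive model in Equation 2, and the claim that rank-based pattern recognition is insensitive to a uniform bias in the UREs, can be illustrated with a small sketch. The compound names, concentrations, UREs, and the tenfold bias factor below are all hypothetical, and the rank comparison is only a stand-in for the boundary methods used in the study.

```python
import math


def overall_predicted_risk(concentrations, ures):
    """Equation 2: OPR = sum over compounds of (concentration x URE)."""
    return sum(concentrations[c] * ures[c] for c in concentrations)


def rank_order(values):
    """Indices of the values sorted from lowest to highest."""
    return sorted(range(len(values)), key=lambda i: values[i])


# Hypothetical concentrations (ug/m^3) for three census tracts.
tracts = [
    {"compound_x": 1.0, "compound_y": 0.5},
    {"compound_x": 2.0, "compound_y": 0.1},
    {"compound_x": 0.5, "compound_y": 0.2},
]
ures = {"compound_x": 1e-6, "compound_y": 4e-6}  # hypothetical UREs

opr = [overall_predicted_risk(t, ures) for t in tracts]

# Apply the same multiplicative bias to every URE (assumed factor of 10).
biased_ures = {c: 10 * u for c, u in ures.items()}
biased_opr = [overall_predicted_risk(t, biased_ures) for t in tracts]

print(rank_order(opr), rank_order(biased_opr))
```

In this toy case the absolute OPR values change tenfold but the ordering of tracts does not. A bias that differs between compounds could change the ordering, which is why the text conditions the argument on the bias being the same on average.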