The buffer and Cutter functions outperform the containment, power and exponential functions (Table 3). This improvement is evident both in the R-squared values of the OLS regressions and in the Akaike Information Criterion (AIC) for all regressions. Lower AIC values are preferred, and because the AIC penalizes regression models with more parameters, it allows models of differing complexity to be compared. Contrary to expectations, these functions also outperformed the RSEI risk-related results. Other RSEI products (the hazard, the modeled hazard, and the modeled hazard*population product) were also tested, but all performed worse than the buffer and Cutter functions. The power and exponential functions from the spatial interaction literature, by contrast, performed little better than leaving out the toxicity term, and in some cases even increased the AIC value, which may result from the coarse resolution of the county-level dataset. These two functions may yet be useful at a finer scale.
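The AIC comparison above can be illustrated with a short sketch. It assumes Gaussian OLS errors and computes AIC = 2k - 2 ln L, with k counting the regression coefficients plus the error variance. The function name `ols_aic` and the synthetic data are hypothetical, and statistical packages may report AIC with a different additive constant, which does not change the ranking of models fit to the same data.

```python
import numpy as np


def ols_aic(X, y):
    """Fit OLS (with intercept) and return (r_squared, aic).

    Assumes Gaussian errors; the log-likelihood is evaluated at the
    maximum-likelihood variance estimate RSS / n.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    design = np.column_stack([np.ones(n), X])  # prepend intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    rss = float(resid @ resid)
    tss = float(((y - y.mean()) ** 2).sum())
    r_squared = 1.0 - rss / tss
    k = design.shape[1] + 1  # coefficients plus the error variance
    log_lik = -0.5 * n * (np.log(2 * np.pi) + np.log(rss / n) + 1.0)
    aic = 2 * k - 2 * log_lik
    return r_squared, aic
```

Comparing a base model with and without an added term, as done here for the toxicity term, amounts to calling the function twice on the same response (for example with `np.empty((n, 0))` for the base case) and checking which AIC is lower.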

That the buffer and Cutter functions improve on the RSEI data suggests that, despite the difficulties posed by the Modifiable Areal Unit Problem and by large western counties that obscure within-county variation in risk, these spatial interaction approaches may still reflect the risks posed by TRI facilities accurately. A caution is in order: while this work demonstrates a relationship between TRI facilities and lung cancer, it does not yet establish a causal link, nor does it indicate that the best-fitting risk estimation method, a large buffer around the TRI site, has the strongest causal relationship with lung cancer mortality.

Additionally, AIC values are lower for the spatial regression techniques than for the OLS regressions. However, adding the TRI term to the spatial regressions yields less improvement over the base case with no interaction term than it does in the OLS regressions. Even so, the improvement in the buffer model is enough to give the spatial error regression of the buffer model the lowest AIC value. Both the Moran's I spatial autocorrelation statistic given above and the lower AIC values for the spatial regression techniques indicate spatial dependence in lung cancer mortality that is not accounted for by the independent variables. This dependence most likely results from one or more additional spatial processes affecting lung cancer that are not captured in these data, rather than from a simple diffusion or contagion process of lung cancer itself. The limited improvement from adding the TRI impacts strengthens the suggestion that geographic processes affecting lung cancer remain unaccounted for in these datasets. Although it is not performed in this study, geographically weighted regression (GWR) may reveal further evidence of confounding processes through non-stationary regression coefficients that indicate interactions with the modeled covariates.
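As a reminder of how Moran's I measures that residual dependence, the sketch below computes global Moran's I for a vector of values (for example, OLS residuals) and a spatial weights matrix. The weights used in the study are not specified in this section; the four-county chain with binary adjacency weights used in the example is purely hypothetical.

```python
import numpy as np


def morans_i(values, weights):
    """Global Moran's I: (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2.

    values: one value per areal unit (e.g. regression residuals).
    weights: n x n spatial weights matrix with zeros on the diagonal.
    """
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()          # deviations from the mean
    s0 = w.sum()              # sum of all weights
    return (len(x) / s0) * (z @ w @ z) / (z @ z)


# Hypothetical chain of four counties, each adjacent to its neighbours.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
```

Positive values indicate that similar residuals cluster among neighbouring counties, which is the pattern that remains after the independent variables are included.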

As with the different regression methods, the buffer and Cutter functions have the best R-squared values across the entire range of rural-urban continuum codes (Table 5). Also, category 5, defined as counties containing a larger town (more than 20,000 residents) but which are not adjacent to a metropolitan area, has much higher R-squared values than the other rural-urban codes across all models. It is not yet clear why this would be the case.

These results suggest that changing the method used to estimate risk changes the representation of the spatial impacts of TRI sites on public health. As others have noted, the scope and scale of analysis can substantially affect the results [48], so researchers should be cautious in generalizing these county-scale, national-scope findings to more local scales and scopes. Nonetheless, researchers using the TRI dataset to estimate the health risks of pollution should carefully consider the risk estimation method, since the most sophisticated model used here, RSEI, did not yield the lowest AIC values.

The maps in figure 3 display the percentage of each county's TRI impact that comes from sources in urban areas, calculated using the functions that performed best in the earlier results. As estimated by these models, the potential effects of pollution from urban TRI releases extend far beyond the limits of the urban areas. However, the extent varies with the function used and its parameterization, highlighting the importance of choosing an appropriate function. In the power and exponential maps (figure 3a and 3b), the impacts of urban release sites are largely confined to urban areas and nearby rural communities. In both the buffer and Cutter maps (figure 3c and 3d), rural areas in the northeastern and southwestern United States receive between 75 and 100% of their estimated TRI impact from release sites in urban areas. These extended effects reflect the large radii used in the distance decay functions. Additional work drawing on environmental toxicology is needed to determine whether the released chemicals could travel such large distances or whether these models are simply capturing spatial dependence in the outcome induced by a confounding spatial process.
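The urban share mapped in figure 3 can be sketched as follows, assuming each county's TRI impact is a sum over release sites of a toxicity weight $t_i^{\alpha}$ times a distance-decay term. The power ($d^{-\theta}$), exponential ($e^{-\theta d}$), and buffer (1 within threshold T, 0 beyond) forms below are common textbook versions and are assumptions here, as are the kilometre units and the 1 km floor in the power function. The Cutter function is omitted because its form is not given in this section.

```python
import numpy as np


def power_decay(d, theta):
    # Assumed form d**-theta; distances floored at 1 km to avoid d = 0.
    return np.power(np.maximum(d, 1.0), -theta)


def exponential_decay(d, theta):
    # Assumed form exp(-theta * d).
    return np.exp(-theta * d)


def buffer_decay(d, threshold):
    # Assumed form: full weight within the threshold radius, none beyond.
    return (d <= threshold).astype(float)


def urban_share(dist, releases, is_urban, decay, alpha=1.0):
    """Percent of each county's estimated impact coming from urban sites.

    dist: (counties, sites) distance matrix in km.
    releases: (sites,) release amounts t_i.
    is_urban: (sites,) boolean flags for urban release sites.
    decay: function mapping distances to weights.
    """
    contrib = decay(dist) * np.power(releases, alpha)  # broadcast over sites
    total = contrib.sum(axis=1)
    urban = contrib[:, is_urban].sum(axis=1)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(total > 0, 100.0 * urban / total, np.nan)
```

With the toy distances in the example (county 1 is 10 km from an urban site and 100 km from a rural one; county 2 is 200 km and 5 km away, respectively), widening the buffer from 50 to 250 km moves the urban share from 100% and 0% to 50% for both counties, mirroring how large radii spread urban impacts into rural counties.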

Future work will investigate the parameterization choices for these functions. The ad hoc approach used here, examining different possible values of the α, θ and threshold parameters, is not ideal. Statistical approaches to finding the optimal α and θ parameters could be incorporated to improve the resulting spatial interaction models [29, 30, 32]. A geostatistical approach could also be applied to determine the form and parameters of the decay function: a correlogram plotting the distance between pairs of counties against the difference in their mortality rates, or in their residuals from a regression, could be used to parameterize the function. Additionally, subsets of the correlogram could be examined separately to investigate anisotropy and non-stationarity. However, with both the ad hoc approach in this paper and a statistical model-fitting approach, using the data to optimize the parameters and then using those parameters to analyze the same data introduces a circularity into the model-fitting process that is best avoided.
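A minimal sketch of the distance-binned comparison described above follows. It uses the empirical semivariogram (half the mean squared difference between pairs of county values in each distance bin), a close relative of the correlogram named in the text, and optionally keeps only pairs oriented along one direction to probe anisotropy. Planar projected coordinates, the bin edges, and the angular tolerance are assumptions.

```python
import numpy as np


def binned_differences(coords, values, bin_edges, direction=None,
                       tolerance=np.pi / 8):
    """Empirical semivariogram of county values by distance bin.

    coords: (n, 2) projected coordinates (e.g. county centroids).
    values: (n,) mortality rates or regression residuals.
    direction: optional bearing in radians; only pairs aligned with it
        (in either sense, within `tolerance`) are kept.
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(values), k=1)  # each unordered pair once
    delta = coords[j] - coords[i]
    dist = np.hypot(delta[:, 0], delta[:, 1])
    semivar = 0.5 * (values[i] - values[j]) ** 2
    if direction is not None:
        bearing = np.arctan2(delta[:, 1], delta[:, 0])
        # Fold the angular difference into [0, pi/2] so opposite
        # bearings count as the same axis.
        diff = np.abs((bearing - direction + np.pi / 2) % np.pi - np.pi / 2)
        keep = diff <= tolerance
        dist, semivar = dist[keep], semivar[keep]
    which = np.digitize(dist, bin_edges) - 1  # bin index per pair
    out = np.full(len(bin_edges) - 1, np.nan)
    for b in range(len(out)):
        in_bin = which == b
        if in_bin.any():
            out[b] = semivar[in_bin].mean()
    return out
```

A semivariance that rises with distance and then levels off would suggest both a decay form and a plausible range, although, as noted above, fitting parameters this way on the same data reintroduces circularity.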

A more theoretically sound approach would be to vary the α and θ parameters according to the properties of the toxic chemicals released. Varying α resembles methods often used to account for differences in the toxicity of the chemicals released [2, 8, 11, 20–22], although the studies cited here use multiplicative rather than exponential modifiers ($\alpha t_i$ instead of $t_i^{\alpha}$). In each case, higher values of *α* correspond to more toxic chemicals. Different studies have based this adjustment on different references, including the American Conference of Governmental Industrial Hygienists Threshold Limit Values [3, 22], a chronic toxicity index [12], an inhalation unit risk [23], a lifetime cancer risk [24], and the RSEI model [9, 27]. Similarly, θ and *T* can be varied to reflect differences in the airborne transport of the chemicals: for a chemical that travels more easily and farther, lower values of θ and higher values of *T* can be used. These parameters can also be varied with the direction from the release site to the affected community, thereby incorporating anisotropy.
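The parameter adjustments described above can be expressed in a small sketch. The `Chemical` record, the exponential decay form, and especially the cosine term that slows decay downwind of a hypothetical prevailing wind are illustrative assumptions rather than the study's model. The sketch also shows the multiplicative ($\alpha t_i$) and exponential ($t_i^{\alpha}$) toxicity modifiers side by side.

```python
import math
from dataclasses import dataclass


@dataclass
class Chemical:
    """Hypothetical per-chemical parameters (not values from the study)."""
    alpha: float      # toxicity modifier
    theta: float      # exponential decay rate, per km
    threshold: float  # maximum transport distance T, km


def toxicity_weight(release, alpha, form="exponent"):
    # "multiplicative": alpha * t_i, as in the cited studies.
    # "exponent": t_i ** alpha, the exponential modifier discussed here.
    if form == "multiplicative":
        return alpha * release
    return release ** alpha


def impact(release, distance, bearing, chem, wind_bearing=None,
           anisotropy=0.0):
    """Impact of one release on one community (hypothetical form).

    bearing: direction from release site to community, in radians.
    anisotropy: value in [0, 1); larger values slow decay downwind of
        wind_bearing and speed it upwind.
    """
    if distance > chem.threshold:
        return 0.0  # beyond the chemical's transport threshold T
    theta = chem.theta
    if wind_bearing is not None:
        theta *= 1.0 - anisotropy * math.cos(bearing - wind_bearing)
    return toxicity_weight(release, chem.alpha) * math.exp(-theta * distance)
```

A chemical that travels farther is represented by a smaller `theta` and a larger `threshold`, matching the adjustment described in the text; the wind term is one simple way to make θ depend on direction.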

Ongoing work includes refining at-risk population estimates using the LandScan USA population dataset [49], which can capture fine-scale variation in risk that county-level populations miss. For example, if a chemical is present in the atmosphere only within a mile of the release site, any county-by-county analysis will be problematic because the spatial resolution of county-level data is far coarser than a square mile. The LandScan dataset provides population estimates at 3 arc-second resolution (roughly 90 meters), which can improve estimates of the number of people within one mile of a release site, rather than assigning the impact of a release site to a county as if everybody lived at the county centroid. This refinement will have a stronger effect on the power and exponential models, because their impact decreases more rapidly with distance from the release site (figure 2). This ongoing work also incorporates the adjustments described above, varying the parameters to account for the properties of the chemicals released and for local climatic conditions such as prevailing wind directions.
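The difference between gridded and centroid-based population counts can be sketched as below. It assumes projected coordinates in metres and represents each grid cell by its centre point. The cell layout and populations in the example are invented for illustration and are much coarser than LandScan's roughly 90 m cells.

```python
import numpy as np

MILE_M = 1609.344  # one statute mile in metres


def population_within(cell_xy, cell_pop, site_xy, radius_m=MILE_M):
    """Sum gridded population whose cell centres lie within radius_m."""
    offsets = np.asarray(cell_xy, dtype=float) - np.asarray(site_xy, dtype=float)
    d = np.hypot(*offsets.T)  # distance from each cell centre to the site
    return float(np.asarray(cell_pop, dtype=float)[d <= radius_m].sum())


def centroid_assignment(county_xy, county_pop, site_xy, radius_m=MILE_M):
    """County-level alternative: every resident is placed at the centroid."""
    offset = np.asarray(county_xy, dtype=float) - np.asarray(site_xy, dtype=float)
    d = np.hypot(*offset)
    return float(county_pop) if d <= radius_m else 0.0
```

In the example, a site near the edge of a hypothetical county has 800 residents within a mile according to the grid, while centroid assignment counts none because the centroid lies 4 km away; this is the kind of discrepancy that matters most for the steeply decaying power and exponential functions.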