Modelling the variation of land surface temperature as determinant of risk of heat-related health events

Background The evaluation of exposure to ambient temperatures in epidemiological studies has generally been based on records from meteorological stations which may not adequately represent local temperature variability. Here we propose a spatially explicit model to estimate local exposure to temperatures of large populations under various meteorological conditions based on satellite and meteorological data. Methods A general linear model was used to estimate surface temperatures using 15 LANDSAT 5 and LANDSAT 7 images for Quebec Province, Canada between 1987 and 2002 and spanning the months of June to August. The images encompassed both rural and urban landscapes and predictors included: meteorological records of temperature and wind speed, distance to major water bodies, Normalized Differential Vegetation Index (NDVI), land cover (built and bare land, water, or vegetation), latitude, longitude, and week of the year. Results The model explained 77% of the variance in surface temperature, accounting for both temporal and spatial variations. The standard error of estimates was 1.42°C. Land cover and NDVI were strong predictors of surface temperature. Conclusions This study suggests that a statistical approach to estimating surface temperature incorporating both spatially explicit satellite data and time-varying meteorological data may be relevant to assessing exposure to heat during the warm season in the Quebec. By allowing the estimation of space- and time-specific surface temperatures, this model may also be used to assess the possible impacts of land use changes under various meteorological conditions. It can be applied to assess heat exposure within a large population and at relatively fine-grained scale. It may be used to evaluate the acute health effect of heat exposure over long time frames. The method proposed here could be replicated in other areas around the globe for which satellite data and meteorological data is available.


Background
Heat waves and urban heat islands have been associated with increased mortality, particularly among persons with social or physical vulnerability [1][2][3]. With an increasing proportion of the world population living in urbanized regions, aging vulnerable populations, and climate change signalling increased frequencies of extreme climatic events and heat waves [4,5], the need to better understand ambient temperature determinants and associated health risks is important. Whereas the effect of elevated temperatures on human health has been widely studied over recent decades [6], epidemiological studies analysing acute health risks associated with exposure to elevated ambient temperature have mostly relied on meteorological stations to estimate temperature exposure. Yet, because meteorological stations are often located in sparsely inhabited areas -such as airports and parks -they provide only a partial representation of ambient temperatures in heterogeneous urban, suburban and rural landscapes where populations live [7]. Yet, variations in vegetation densities, open spaces, concrete surfaces, building disposition and height condition lead to micro-climatic variations in diurnal and nocturnal temperatures [7][8][9] within a same city [10,11]. As such, ambient air temperature measured at meteorological stations or simple interpolations between temperatures measured at stations may offer misleading measures of true exposure to local heat experienced in residential settings and so may bias the estimation of associated health risks. In support of this, we have shown that the risk of death associated with high daily temperatures at meteorological stations, was greater in areas with higher surface temperatures than in areas with lower surface temperatures [3].
Because the existing network of ambient temperature sensors only provides information at a few selected locations, remotely sensed thermal infrared (TIR) data has been used to estimate intra-urban variations in ambient temperatures, and to identify "hot spots" referred to here as "micro-urban heat islands". Various satellite or airborne sensors including NOAA-AVHRR [12], MODIS [13] and LANDSAT [14], have been used to that purpose. Among these, the 60 m spatial resolution of LANDSAT 7ETM+ in the thermal IR band provides a useful resolution for analysing spatial variations in surface temperatures within cities, as well as along the urban-rural continuum.
Concordance between satellite-derived surface temperature estimates and ambient temperatures measured at meteorological stations can be strong in certain atmospheric conditions [15,16] (e.g. clear skies and mixing of any surface-based inversion layer [17]). This holds for various land cover configurations (accuracy to within ± 3-5% on average) [14,18,19]. Hence, surface temperatures derived from satellite images offer an interesting basis for establishing local measures of exposure to heat, and may be used in epidemiological studies of the association between exposure to temperatures and acute mortality or morbidity. However, satellite images are usually available for a limited set of dates, which prevent direct connection between daily vital statistics and satellite-derived land surface temperatures (LSTs). Furthermore, relations between meteorological measures and satellite-derived LSTs may differ depending on areas. Land-use regression techniques allow to model values of a spatially continuous phenomenon like air quality or surface temperature using local characteristics of the built environment. Early applications have modeled road salt contamination [20] or soil depths [21], but most applications concern the modelling of intra-urban variations in air pollution as a mean to improve precision in exposure for epidemiological models [22][23][24]. The aim of this paper is to develop a land-use regression of local surface temperatures using land cover, meteorological, and locational and temporal predictors, which can be used for both surveillance and for epidemiological analyses of acute heat-related population health outcomes over large territories.

Satellite Images
We used a total of 15 LANDSAT multispectral images. Eight LANDSAT 7ETM+ images, taken between 1999 and 2002, covered different but partially overlapping extents in the southern portion of Quebec Province, Canada (See Figure 1: extents correspond to areas of roughly 200 km × 200 km, for which images provide 30 m cell-sizes for non-IR channels, and 60 m for IR channels). The regions covered, as listed below, included the main inhabited and urban areas of the province of Quebec: Montreal, Quebec City, Rimouski, Sept-Îles, Sherbrooke, Saguenay, Gatineau, and Rouyn-Noranda, representing some 90.3% of the total province population according to the 2006 Census (See Table 1 and Figure 1). In addition to these images, seven LANDSAT 5TM images, taken between 1987 and 1999, and covering part of the extent 014-028 over the greater Montreal region, were also used (cell size of 60 m for non-IR channels, and 120 m for IR channels); this extent selection covered urban, rural and suburban areas. Given the number of images used for part of this extent, our model best predicts surface temperatures for these regions, even if it extends to other inhabited regions of Quebec. All images where taken during June, July and August, under clear sky conditions, and contained less than 10% cloud coverage. Further hand removal of cloud pixels was done to retain only pixels representing surface temperature. The LANDSAT 7ETM+ image covering the Montreal region overlapped partly with the LANDSAT 5TM images of Montreal, but each image was treated separately. Image locations, dates, and times are given in Table 1. Given the varying spatial resolution of the different maps, pixels for all channels were re-sampled to 30 m, so as to keep the highest resolution when available.

Surface temperature calculation
LANDSAT images were processed using the automated cloud cover assessment (ACCA) algorithm [25]. We retained images with less than 10% cloud coverage. Remaining cloud-contaminated pixels were further removed by a research assistant using a GIS before the analysis.
Spectral radiance (R) was calculated from the LAND-SAT thermal channel digital numbers (DN), using gain and bias values included in the LANDSAT image product according to: where G (gain) and B (bias, or offset) are calibration values included in the LANDSAT image product. Radiance values of all pixels were then converted to surface temperature (T) values according to: where K 1 is 607.76 for TM and 666.09 for ETM+ and K 2 is 1260.56 for TM and 1282.71 for ETM+ [14].

Predictors of calculated surface temperatures Land cover
Land cover types were determined using the maximum likelihood algorithm provided by PCI Geomatics, with a supervised classification scheme using all channels except the thermal IR band (band 6). Three land cover categories were defined: built and bare land, vegetation, and water. Stable sites, such as the Olympic stadium and Botanical Gardens in Montreal, were used as reference points for land cover specification while controlling for inter-annual variability in images.

NDVI
The Normalized Differential Vegetation Index was calculated based on LANDSAT bands 3 (Red) and 4 (Near-Infrared) according to standard practice [26]. Using images taken between the first week of June and the last week of August allowed us to limit interseasonal variation in vegetation density and chlorophyll activity throughout the summer season.

Meteorological data
Meteorological data from all available Environment Canada stations [27] providing hourly data within each extent was simply averaged, to obtain image-specific measures of ambient temperature and wind speed at the time the images were taken as well as for the three-day average, that is, for the day the image was taken and the two previous days. This simple approach was used with the view of easily reproducing the method in future model use. Between two and fifteen meteorological stations with complete hourly data were available per image (see Table 1).

Complementary spatial and temporal measures
Because proximity to large water bodies is known to influence LST [28], the Euclidian distance to the nearest of three main water bodies in Quebec (St. Lawrence River, Saint-Jean Lake, and Abitibi Lake) was computed for each pixel. Latitude and longitude was derived within a projected Universal Transverse Mercator coordinate system (UTM, NAD83 zone 19N). A dummy variable relating to each image was further used to capture any possible inter-image variations not accounted for by the other predictors. Week number was used to account for temporal variation.

Statistical analyses and validation
We used a random subsample of 2% of the total number of pixels (n = 480,000 points) done to reduce the large number of total pixels of the 15 images in our analysis (n = 246,590,536). Within each map, the number of points sampled was proportional to the area. Using GIS, we computed LST and predictor data for each of the 480,000 sampled 30 m resolution pixels. A general linear model (GLM) was developed to estimate land surface temperatures of the sampled pixels using land cover, NDVI, meteorological data and complementary spatial and temporal information described above (see Formula 1).
With M being a matrix of meteorological indicators (air temperature, wind speed), LC a matrix of land cover indicators (land cover categories and NDVI), XY a matrix of latitude longitude indicators, and T a matrix of temporal indicators (Week number in our case), E a matrix of extent indicators, all Betas being vectors of corresponding regression coefficients, and epsilon the error term.
Inter-variable correlation was tested using Pearson's correlation coefficient, whereas spatial autocorrelation Inter-variable correlation tended to be relatively weak (r = 0.27, SD = 0.07). The highest correlation was found between week number and wind speed (r = 0.54). LST correlations between overlapping images ranged from 0.212 to 0.685.
The GLM explained a relative high proportion of the variation in surface temperature (R² = 0.77), and resulted in a relatively low residual standard error (1.42°C; Table 3). Predicted temperatures ranged between 7.9 and 29.0°C.
Coefficients for model parameters are given in Table 3. All predictors were significant at the 0.0001 level. Overall, when comparing the entire data ranges of the predictors, it was found that NDVI had the strongest effects, with a 95% effect range of 9°C -or, in other words, 95% of the variation in NDVI explained up to 9°C variation in surface temperature. Land cover, week number, three-day average temperature, and our dummy map predictor also showed large effect ranges, with 95% effect ranges of 5°C. Other parameters tended to have smaller effects.

Model Analysis and Validation
Spatial autocorrelation among surface temperatures was high, and significant up to a 2 km range, beyond which SA decreased significantly. However, the Moran's I values for the model's residuals were significantly reduced, but remained significant in the 0-2 km range, suggesting additional environmental factors would need to be accounted for to fully explain variations in LST. As mentioned earlier, spatial autocorrelation does not influence our prediction of surface temperatures. A closer look at local spatial autocorrelation may be useful to raise new hypotheses identifying additional relevant environmental determinants.
Estimates of surface temperatures were comparable to observed LANDSAT-derived point data. Figure 2 shows the distribution of residuals of the point data. The predictive capabilities of the model was evaluated by applying the model to the whole dataset (n = 246,590,536) and comparing predicted to observed values. The mean difference between observed and predicted temperatures was 1.42°C (Standard deviation = 2.67). In the context of land use and climate change, such a model provides local estimates of surface temperature that can be used in epidemiological studies on acute heat-related health outcomes, and as a planning and prevention tool. As expected, meteorological indicators of air temperature, especially the three day average, were strong predictors of LST. Also consistent with the literature, land cover predictors (i.e. NDVI and land cover categories) were significantly and strongly associated with surface temperatures [29][30][31]. The cooling impact of NDVI has been well documented [29,30], and variations in LSTs within urban areas have been linked to the spatial configuration of land use and land cover [30,31].
Application of surface temperature estimates to epidemiological models of heat-related health outcomes has often been limited by the availability and the relatively low resolution of surface temperature maps that cover large areas. The use of a spatio-temporal model is therefore advantageous as it allows matching between high spatial and temporal resolutions of temperature estimates and at-risk populations or health events. By providing surface temperature estimates for any location and time, it overcomes the potential temporal and spatial discrepancy between either available images or  available meteorological data and health outcome events. Meteorological data has usually been used as a proxy of heat conditions across a large region, with temperature from such stations either considered uniform or simply interpolated across an extent [7]. Because our model explicitly integrates spatial terms, like land-cover and greenness, micro-variations in temperatures -that is, differences at the urban lot size scale -can be explored. The use of a relatively large set of images -covering most of the inhabited areas of the Quebec province, including summer's intra-seasonal variations -allows for the creation of calibrated models across the province. These models are representative of June to August summer days, but were not specifically calibrated for extreme heat event conditions. Further work is needed to assess the robustness of this model for prediction of surface temperatures under extreme heat events.
There are other limitations inherent to both the model and to the use of satellite imagery for our capacity to predict LSTs.First, our outcome measure, LST, can be calculated in different ways, and may for example account for emissivity or other parameters. Here we used the simplest but classic and validated formulae for LST determination [14]. The effect of downscaling to a resolution of 30 m on the association between LST estimates and health outcomes deserves further attention. The model itself can be improved by refining or better describing physiographic features of the landscape. For instance, elevation or orientation was not considered in our model, although the use of such measures for surface temperature modelling is relatively common [32]. However, studies using satellite imagery to examine surface temperatures have not explicitly looked at the relation with elevation or orientation [29,33]. One of the limitations to such data integration relates to the resolution of available digital elevation models (DEM), especially when covering such large areas. We only had access to a coarse Canadian-wide DEM providing a 100point per latitude/longitude elevation matrix (10 points of latitude * 10 points in longitude). Use of more finegrained DEM may improve the performance of our model.
The land cover classes used were also relatively general. Although coarse classification does limit our capacity to assess micro-level variations in LST, because local structural differences within land cover classes are not accounted for, the use of broad categories does reduce the risk of misclassification. It has been documented that areas of urban-rural transition are especially prone to such classification errors [34], and the potential benefit of increasing classification precision, with an associated risk of misclassification, is still unclear. Yet, interesting novel methods suggest possible increased precision in land cover classification based on TM images, with possibility to discriminate residential from commercial and industrial land, account for density, and specifically identify urban green space [35].
The fact that significant residual spatial autocorrelation remained in our models, suggests that further addition of spatial predictors would be required to improve the predictive capacity of the model and limit coefficient bias. As an example, integrating wind directions may be useful to account for the cooling effect of water bodies in downwind areas [28]. Alternatively, explicit spatial modelling strategies, like inclusion of autoregressive terms or use of Geographically Weighted Regression models for example, could be used. "When obtaining point predictions, it matters little if we model the spatial variation entirely through the covariates, entirely as small-scale variations characterized by the semivariogram or sigma, or through some combination of covariates and residual autocorrelation. Our choice of covariates mainly affects the interpretation of our model and the magnitude of the prediction standard errors" [36].
While our model does predict outdoor surface temperatures, heat exposure that may be relevant for epidemiological models would need to account for both outdoor and indoor temperatures. As such, remotely sensed surface temperatures provide only a fraction of true exposure, and complementary analysis using building characteristics may be used to further provide estimates of indoor exposure to heat. Previous work on the subject revealed a relatively linear relation between indoor and outdoor surface temperatures, with a positive influence of building height [37], for dwellings that were not equipped with air conditioning.
Limitations in the use of satellite imagery for establishing models of land surface temperatures are also inherent to the absence of precise surface geometry information. Surface geometry of buildings can induce strong micro-urban temperature patterns due to differential solar heating [38][39][40][41]. An associated directional bias is discussed by Nichol [39], where near-nadir viewangle of satellite sensors may yield temperatures that are warmer or cooler than off-nadir views, depending on the view direction relative to solar position, time of day, and surface characteristics of the environment such as building disposition and height. While the effect of surface geometry on temperature was not considered in our study, its effects can be pronounced and could be considered in future developments. Lagouarde et al. [40] have reported temperature differences upwards of 12°C between nadir and off-nadir temperatures in Marseille, France, while Voogt and Oke [41] found a maximum difference of 10°C in Vancouver, Canada. The midmorning LANDSAT images used in our study may be prone to this bias as rooftops heat up much faster than shaded walls do, because the low angle of morning sunlight offers potential for sizeable variation between nadir and off-nadir temperatures [39]. Additionally, our model best predicts surface temperatures for the years for which the images used. As with other satellite based studies, the availability of adequate images constrains the utility of our model. Although sensors that produce NDVI and land cover (such as IKONOS, QuickBird, SPOT, or MODIS) are still widely available, there are currently few sensors with sufficient resolution Thermal IR capabilities for detailed micro-urban temperature mapping. Both LANDSAT 5 and LANDSAT 7 satellites are well past their mission's expectancy and the next satellite in the LANDSAT series, the LANDSAT Data Continuity Mission (LDCM), scheduled to launch in December 2012, is not yet expected to include a thermal IR imager [42].
Finally, precise air temperature estimates are only useful if corresponding health events can be mapped accordingly. Precise information on people's exposure locations are required to conduct such analyses. Information on peoples' main locations may further improve our capacity to unravel the effect of environmental dimensions on health [43].

Conclusions
This type of land-use regression model optimises the spatial and temporal estimation of LSTs. Results from Smargiassi et al. [3] suggested that the risks of mortality associated to high temperatures is higher within microurban heat islands, over Montreal city. Another study held in Philadelphia looking at a heat event in July 1993 demonstrated the significance of remote-sensing land surface temperatures readings in predicting mortality, beyond socio-demographic information [44]. This model will provide estimates of exposure to heat islands within the inhabited regions of the province of Quebec, and improve our ability to explore environmental determinants of heat-related health events. Use of such models allows identifying features associated with heat that are amenable to change, like NDVI or land use. Using epidemiological models, cities may identify high-risk areas with low vegetation indexes, and implement policies to change the built environment. Furthermore, because it is possible to cover large and diversified territories, such an approach allows studying health risks in a variety of landscapes and urban configurations. With satellite, land use and meteorological data available in most populated settings, such models could be calibrated elsewhere to improve our understanding of the relation between heat events, land use, and acute heat-related health outcomes.