The results suggest an association between non-Hodgkin lymphoma (NHL) mortality risk and residence within 10 km of a refinery. Estimated RRs showed an increased risk of around 10% in the exposed municipalities for both men and women. In the exploratory analysis covering the whole country (8098 municipalities) these overall RRs were statistically significant, whereas after controlling for potential confounders through matching (528 matched municipalities) they were very close to statistical significance. Two regions, the Canary Islands and Galicia, showed statistically significant increases in risk with both models; these risks, of around 40%, were higher than the risk associated with the exposure variable. The analysis of the residuals showed no change in the distribution of mortality risk with increasing distance from the municipality of residence to the refinery.
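To make "very close to statistical significance" concrete, the sketch below converts a log-linear (Poisson) coefficient and its standard error into an RR with a Wald 95% confidence interval. The coefficient corresponds to the roughly 10% excess reported above, but both standard errors are hypothetical values chosen only to show how the same point estimate can fall on either side of significance; they are not estimates from this study.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% quantile of the standard normal


def rate_ratio_with_ci(beta: float, se: float, z: float = Z_95):
    """Convert a log-scale coefficient and its SE into an RR with a Wald CI."""
    rr = math.exp(beta)
    lower = math.exp(beta - z * se)
    upper = math.exp(beta + z * se)
    return rr, lower, upper


def is_significant(lower: float, upper: float) -> bool:
    """A two-sided 5% test rejects RR = 1 exactly when the 95% CI excludes 1."""
    return lower > 1.0 or upper < 1.0


if __name__ == "__main__":
    beta = math.log(1.10)  # about a 10% excess risk, as in the text
    for se in (0.04, 0.06):  # hypothetical standard errors, for illustration only
        rr, lo, hi = rate_ratio_with_ci(beta, se)
        print(f"SE={se}: RR={rr:.2f} (95% CI {lo:.2f}-{hi:.2f}), "
              f"significant={is_significant(lo, hi)}")
```

A modest loss of precision, such as the smaller sample left after matching, is enough to move the lower limit just below 1 while leaving the RR itself unchanged.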
The main strength of this analysis was the control of potential confounding through a matched analysis. Comparing the spatial model with the matched analysis, the RRs and 95% CIs for men are almost identical in both models, whereas for women the matched analysis yielded lower RRs that were not statistically significant. The regional RRs also differed between the models; however, men in the Canary Islands and Galicia still showed very high and statistically significant risks. These results could suggest that the matched analysis eliminated part of the confounding that affected women but not men.
Another important contribution was the joint study of the 10 refineries in a single analysis. Studying each refinery individually would have required individual-level data on each case, which were not available for our study. In studies of environmental factors that may be associated with health outcomes, a large data set is an advantage in most situations; nevertheless, a naive analysis of such data can produce biased results due to confounding. In this study we first fitted a spatial model to all Spanish municipalities as an exploratory analysis. This analysis provided an initial assessment of the association between NHL mortality and residence in the vicinity of refineries; however, we could not be confident that the spatial model had fully controlled for the large heterogeneity among municipalities in population, socio-demographic characteristics and level of industrialization.
We therefore performed a matched analysis on a sample of the municipalities. If the matching is accurate, accounting for it in the analysis will eliminate confounding by the matching variables. Consequently, we matched geographical areas with the aim of eliminating potential confounders such as socio-demographic status and level of industrialization. Some of the refineries were located in medium-sized cities and others close to big cities, but none in small towns (about 50% of Spanish municipalities have fewer than 5000 inhabitants); and some of the refineries were located in highly industrialized regions, implying more sources of industrial pollution. The final number of selected areas provided a reasonable sample size; choosing too many matched unexposed areas could have introduced residual confounding into the analysis because of lower comparability in terms of the matching factors. We matched areas rather than municipalities to account for the continuous nature of pollutant emissions, which cross artificial administrative boundaries.
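The trade-off described above, where adding more unexposed areas per exposed area reduces comparability, can be illustrated with a simple nearest-neighbour match. This section does not specify the matching algorithm, so the sketch below is an assumption: it matches on two hypothetical area-level covariates (log population and an industrialization index), standardized over the pooled areas, with Euclidean distance and matching with replacement. All area names and values are invented.

```python
import math
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class Area:
    name: str
    log_population: float    # hypothetical covariate: log of resident population
    industrial_index: float  # hypothetical covariate, e.g. share of industrial employment


COVARIATES = ("log_population", "industrial_index")


def match_controls(exposed, unexposed, k):
    """For each exposed area, return its k nearest unexposed areas (with replacement).

    Distance is Euclidean on covariates standardized over the pooled areas.
    """
    pool = list(exposed) + list(unexposed)
    scales = []
    for attr in COVARIATES:
        values = [getattr(a, attr) for a in pool]
        sd = statistics.pstdev(values) or 1.0  # guard against a constant covariate
        scales.append((attr, statistics.fmean(values), sd))

    def z(area):
        return [(getattr(area, attr) - mean) / sd for attr, mean, sd in scales]

    matches = {}
    for e in exposed:
        ranked = sorted((math.dist(z(e), z(u)), u.name) for u in unexposed)
        matches[e.name] = [(name, d) for d, name in ranked[:k]]
    return matches


def mean_match_distance(matches):
    """Average covariate distance between exposed areas and their matched controls."""
    return statistics.fmean(d for pairs in matches.values() for _, d in pairs)


if __name__ == "__main__":
    exposed = [Area("E1", 11.0, 0.30), Area("E2", 12.5, 0.45)]
    unexposed = [
        Area("U1", 11.1, 0.28), Area("U2", 12.4, 0.47), Area("U3", 8.0, 0.05),
        Area("U4", 10.5, 0.35), Area("U5", 13.0, 0.10),
    ]
    for k in (1, 3):
        print(k, round(mean_match_distance(match_controls(exposed, unexposed, k)), 2))
```

Because each additional control is at least as far from its exposed area as the previous one, the average covariate distance cannot decrease as k grows, which is the loss of comparability that motivates keeping the number of matched unexposed areas moderate.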
The main objective of this analysis was to study the links between NHL mortality and exposure to refineries; however, we could not ignore the strong regional variation in NHL mortality shown in previous studies conducted in Spain [16, 25]. Omitting regional information would have generated biased results, and therefore misleading conclusions, given that two of the refineries are located in regions with higher NHL risk, the Canary Islands and Galicia.
We used a mixed-effects Poisson model to analyse the matched data because it allows for extra-Poisson variation resulting from unmeasured confounders and misclassification. Previous studies have suggested conditional Poisson regression models for matched data; however, these do not allow for extra-Poisson variation.
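Why a random effect accommodates extra-Poisson variation can be seen in a small simulation. The sketch assumes a normal random effect u with standard deviation sigma on the log scale, so that counts are Poisson with mean E·exp(u) (a Poisson-lognormal mixture). Its marginal variance is mean + mean²·(exp(sigma²) − 1), which exceeds the mean whenever sigma > 0, whereas a plain Poisson model forces the variance to equal the mean. The expected count and sigma are hypothetical, and the code only illustrates the variance structure; it does not fit either model discussed in the text.

```python
import math
import random
import statistics


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw from Poisson(lam) with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_counts(expected: float, sigma: float, n: int, seed: int = 1) -> list[int]:
    """Counts from a Poisson model with a normal random effect on the log scale.

    sigma = 0 gives a plain Poisson model; sigma > 0 mimics unmeasured
    area-level heterogeneity such as residual confounding.
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n):
        u = rng.gauss(0.0, sigma)  # hypothetical area-level random effect
        counts.append(sample_poisson(expected * math.exp(u), rng))
    return counts


def theoretical_moments(expected: float, sigma: float) -> tuple[float, float]:
    """Marginal mean and variance of the Poisson-lognormal mixture."""
    mean = expected * math.exp(sigma ** 2 / 2)
    var = mean + mean ** 2 * (math.exp(sigma ** 2) - 1)
    return mean, var


if __name__ == "__main__":
    for sigma in (0.0, 0.5):
        counts = simulate_counts(expected=5.0, sigma=sigma, n=20_000)
        ratio = statistics.variance(counts) / statistics.fmean(counts)
        print(f"sigma={sigma}: simulated variance/mean = {ratio:.2f}")
```

With sigma = 0 the variance-to-mean ratio stays close to 1; with sigma = 0.5 it is well above 1. A model without a random effect (or another overdispersion term) would understate the uncertainty in such data.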
To the best of our knowledge, this is one of the first point-source modelling studies to use matched geographical areas. We have already mentioned a study conducted in the UK in which a highly industrialized region was matched with a similar region without industry according to its socio-economic characteristics; in our study, however, we matched several exposed areas with multiple non-exposed areas and used more data sources in the matching strategy. Nevertheless, the limitations of our approach stem mainly from the nature and definition of the available data, including the ecological nature of the socio-demographic data and the lack of information on specific industrial emissions.
The study used mortality data from the official registers. Unfortunately, there is at present no nationwide cancer register in Spain. The absence of incidence data is an important limitation in the study of potential risk factors. The lack of information about non-lethal cancer cases may bias the analysis; however, according to Gomez-Perez et al., in Spain the relative effects of morbidity associated with tumours with lower survival rates are well represented by death certificates. Furthermore, we believe that differences in survival rates or quality of care between regions are at most small, owing to the universal health system established in Spain in 1986. According to EUROCARE-4, the overall five-year survival rate for NHL in Spain is 51.9%.
Another delicate decision was the use of distance as a proxy for exposure, which may introduce bias; for an extended discussion of this topic see, for example, [9, 30]. In particular, using isotropic distance instead of a more general metric may bias the results; however, such bias would tend to restrict the ability to detect positive results, shifting the estimates towards the null hypothesis, rather than produce spurious associations. Another important decision in the definition of the exposure variable was the maximum distance of 10 km. Previous studies on refineries and petrochemical plants used shorter distances, but given the aggregated nature of our data, the spatial distribution of residential areas in Spain (usually concentrated around town centres, with large empty areas between towns) and the large areas occupied by refinery plants, a smaller buffer would have yielded very few exposed individuals.
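A minimal sketch of an isotropic 10 km exposure definition follows. It assumes that each municipality is represented by a single reference point (for example a centroid or town centre) and uses great-circle (haversine) distance; how the study actually located municipalities and refineries is not described in this section, and the coordinates in the test are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def exposure_indicator(municipality, refineries, buffer_km: float = 10.0) -> int:
    """Return 1 if the municipality's reference point lies within buffer_km of any refinery.

    The distance is isotropic: it ignores wind direction, terrain and stack height,
    which is the simplification discussed in the text.
    """
    lat, lon = municipality
    return int(any(haversine_km(lat, lon, r_lat, r_lon) <= buffer_km
                   for r_lat, r_lon in refineries))


if __name__ == "__main__":
    refineries = [(40.0, -3.0)]  # hypothetical refinery location
    for town in [(40.05, -3.0), (40.10, -3.0)]:  # hypothetical municipality points
        print(town, exposure_indicator(town, refineries))
```

Under this definition every resident of a municipality inherits the same binary exposure, which is the aggregation assumption discussed in the next paragraph.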
Another limitation is the use of aggregated data for the exposure variable, which entailed important assumptions. We assumed that the registered place of residence determines the estimated exposure; hence no allowance is made for either long-term movements between different addresses or short-term movements between home and work. Instead, we considered the whole municipal population to be exposed to the same type and amount of pollutants. These assumptions could introduce misclassification in addition to the ecological bias intrinsic to ecological studies (the ecological fallacy); nevertheless, using small areas (municipalities) as units reduces the risk of ecological bias and misclassification.
The lack of a risk gradient with distance is consistent with previous studies in the UK. In this study we consider the place of residence as the place of outdoor exposure to refinery emissions. We do not consider occupational exposures, indoor exposures or other outdoor exposures to substances that could be related to an increase in risk. All these factors may contribute to non-differential exposure misclassification, which can bias the results and could also hinder the detection of a distance effect on NHL mortality risk.
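The distance check reported in this study was based on model residuals, whose details are not given in this section. A simpler descriptive analogue, sketched below under that caveat, pools observed (O) and expected (E) deaths by distance band and compares the O/E ratios: a flat pattern across bands is what "no gradient" looks like. The band edges and all counts are hypothetical.

```python
from collections import defaultdict


def smr_by_band(records, band_edges_km):
    """Pool observed and expected deaths by distance band and return O/E per band.

    records: iterable of (distance_km, observed, expected), one per municipality.
    band_edges_km: increasing edges, e.g. [0, 5, 10]; the last band includes its
    upper edge, and records outside the outermost edges are ignored.
    """
    bands = list(zip(band_edges_km[:-1], band_edges_km[1:]))
    totals = defaultdict(lambda: [0.0, 0.0])  # band -> [sum O, sum E]
    for distance, observed, expected in records:
        for i, (lo, hi) in enumerate(bands):
            is_last = i == len(bands) - 1
            if lo <= distance < hi or (is_last and distance == hi):
                totals[(lo, hi)][0] += observed
                totals[(lo, hi)][1] += expected
                break
    return {band: o / e for band, (o, e) in totals.items() if e > 0}


if __name__ == "__main__":
    # Hypothetical municipalities: (distance to refinery in km, observed, expected)
    records = [(1.0, 12, 10.0), (3.0, 10, 10.0), (6.0, 22, 20.0), (9.5, 11, 10.0)]
    for band, smr in sorted(smr_by_band(records, [0, 5, 10]).items()):
        print(band, round(smr, 2))
```

Non-differential misclassification of the kind described above would tend to flatten such band-specific ratios, making a true gradient harder to see.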
The aetiology of NHL is poorly understood. The best-described risk factor for NHL is immune deficiency. Some theories have associated NHL with the HIV epidemic, although the introduction of highly active antiretroviral therapy (HAART) does not appear to have affected the rising trend in NHL. However, these specific infections account for a very small proportion of total NHL incidence. In addition to immune deficiency and infection, other immune-related conditions such as rheumatoid arthritis, systemic lupus erythematosus, Sjögren's syndrome, psoriasis and coeliac disease are increasingly being recognised as related to NHL risk. A variety of other exposures are less strongly related to NHL risk.
From the chemical exposure point of view, some studies have linked lymphomas to substances such as agricultural chemicals, pesticides and dioxins released by incinerators. In addition, a number of occupational exposure studies reported higher NHL incidence and mortality among workers exposed to industrial solvents [38, 39]. Two recent meta-analyses of cohort and case-control studies of NHL, benzene and refinery work provided evidence that benzene is associated with NHL [40, 41]. Benzene is a known human carcinogen and has been shown to produce chromosomal and genetic changes important to NHL induction. Benzene has also been linked to lymphomas in several animal studies, including the 1986 US National Toxicology Program carcinogenicity bioassay of benzene [42–45].
Previous studies of NHL and environmental exposure to industrial pollution have produced conflicting results. De Roos et al. did not find an association between living near industries and an increase in NHL. However, two case-control studies conducted in the US and Canada [8, 12] and a study in Spain suggested an association between residence in the proximity of industry and an increase in NHL. According to the IARC, the substances associated with NHL are tetrachloroethylene, classified as probably carcinogenic to humans (Group 2A), and ethylene oxide, classified as carcinogenic to humans (Group 1).
Previous studies that specifically analysed the possible effects of residence near petrochemical plants suggested an association, but neither of them was conclusive. De Roos et al. evaluated the risk of NHL associated with residence within 2 miles of industries. Their study included 94 cases and 76 controls living within 2 miles of refineries. The risk of follicular lymphoma showed an association that was not statistically significant (OR = 1.1, CI: 0.7-1.9). In an earlier case-control study, Linos et al. found an increase in NHL risk among those living near petrochemical industry; however, this was not statistically significant. That study included 14 cases and 18 controls within 3.2 km of the facilities (OR = 1.5, CI: 0.7-3.2). Our results agree with previous studies in showing an increased risk, but our estimated RRs were generally closer to statistical significance. In our study the number of cases in the exposed areas was 1134 (589 men and 545 women), while the number of cases in non-exposed areas was 3933 (2038 men and 1895 women). Although case-control studies and ecological studies are not directly comparable, the more statistically conclusive results of this study could be due to the larger number of cases in both the exposed and non-exposed categories.
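The link between case numbers and precision can be made explicit with a crude approximation. For two independent Poisson counts with fixed expected values, the delta method gives Var[log RR] ≈ 1/O₁ + 1/O₀, so the confidence interval narrows roughly with the square root of the counts. The sketch below applies this to the case totals reported above and to a hypothetical small study (15 exposed and 20 unexposed cases); it ignores covariates, matching and extra-Poisson variation, so it is not the model used in this study, and case-control ORs have a different variance formula.

```python
import math


def approx_se_log_rate_ratio(cases_exposed: int, cases_unexposed: int) -> float:
    """Approximate SE of log(RR) for two independent Poisson counts.

    Delta method: Var[log O] is about 1/O, and the variances of the two
    independent log counts add. Covariates and overdispersion are ignored.
    """
    return math.sqrt(1.0 / cases_exposed + 1.0 / cases_unexposed)


def ci_width_factor(se: float, z: float = 1.96) -> float:
    """Multiplicative half-width of a 95% Wald CI: the CI is RR divided/multiplied by this."""
    return math.exp(z * se)


if __name__ == "__main__":
    se_study = approx_se_log_rate_ratio(1134, 3933)  # case totals reported in the text
    se_small = approx_se_log_rate_ratio(15, 20)      # hypothetical small study
    print(f"study: SE={se_study:.3f}, CI factor={ci_width_factor(se_study):.3f}")
    print(f"small: SE={se_small:.3f}, CI factor={ci_width_factor(se_small):.3f}")
```

With thousands of cases the interval around an RR of 1.10 is only a few percent wide on each side, whereas a few dozen cases give an interval spanning roughly a factor of two, which is consistent with earlier studies reporting similar point estimates without statistical significance.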
In our results, the largest contribution to mortality risk came from its spatial distribution, as expected from previous studies [16, 25]. The excess risk in the Canary Islands and Galicia and the low risk in Madrid had already been shown in the atlas of municipal cancer mortality in Spain. This regional variation has also been shown in analyses of cancer incidence. The network of regional and local cancer registers (REDECAN), which gathers incidence data, covers 26.5% of the total population. A recent study based on these data examined the evolution of NHL incidence over the last decades. Four of the 13 registers in the network are located in regions that contain refineries. The results of that study showed increased risks for the Canary Islands, Tarragona (Catalonia) and the Basque Country, while Murcia showed a risk below one. Unfortunately, the unknown aetiology of NHL hinders the formulation of hypotheses to explain this regional variation.
Our analyses showed similar RRs for the exposure variable in men and women, which suggests that environmental risk factors contribute to variation in NHL mortality risk. An examination of the information in the E-PRTR for 2007 showed that all the refineries but one reported emissions above the thresholds determining inclusion in the registry for the following heavy metals: chromium, lead, nickel and zinc. Furthermore, 8 facilities reported emissions of arsenic, 7 of cadmium, 6 of benzene and 4 of dioxins and furans (PCDD+PCDF). All the above-mentioned compounds except lead are classified by the IARC as carcinogenic to humans (Group 1); lead is classified as probably carcinogenic to humans (Group 2A). In addition, two facilities also reported emissions of naphthalene and polychlorinated biphenyls, and another reported emissions of vanadium. These three compounds are classified by the IARC as possibly carcinogenic to humans (Group 2B). None of the refineries reported emissions of the chlorinated solvent tetrachloroethylene or of ethylene oxide. However, refineries can be associated with exposure to many different chemical agents, so this analysis alone does not provide direct evidence that any single agent is responsible for the observed increase.