Methodological approaches to the study of cancer risk in the vicinity of pollution sources: the experience of a population-based case–control study of childhood cancer

Background Environmental exposures are related to the risk of some types of cancer, and children are the most vulnerable group of people. This study presents the methodological approaches used in our group's papers on the risk of childhood cancer in the vicinity of pollution sources (industrial and urban sites). A population-based case–control study of incident childhood cancers in Spain and their relationship with residential proximity to industrial and urban areas was designed. Two methodological approaches using mixed multiple unconditional logistic regression models to estimate odds ratios (ORs) and 95% confidence intervals (95% CIs) were developed: (a) “near vs. far” analysis, which assessed possible excess risks of cancer in children living near (“near”) versus far (“far”) from industrial and urban areas; and (b) “risk gradient” analysis, which assessed the risk gradient in the vicinity of industries. For each of the two approaches, three strategies of analysis were implemented: “joint”, “stratified”, and “individualized” analysis. Incident cases (1996–2011) were obtained from the Spanish Registry of Childhood Cancer. Results Applying this methodology, associations have been suggested between proximity (≤ 2 km) to specific industrial and urban zones and risk (OR; 95% CI) of leukemias (1.31; 1.04–1.65 for industrial areas, and 1.28; 1.00–1.53 for urban areas), neuroblastoma (2.12; 1.18–3.83 for both industrial and urban areas), and renal (2.02; 1.16–3.52 for industrial areas) and bone (4.02; 1.73–9.34 for urban areas) tumors. Conclusions The two methodological approaches proved to be very useful and flexible tools to analyze the excess risk of childhood cancers in the vicinity of industrial and urban areas; they can be extrapolated and generalized to other cancers and chronic diseases, and adapted to other types of pollution sources.


Background
Environmental exposures are related to the risk of some types of cancer [1], and children are the most vulnerable group of people because they are far more sensitive than adults to toxic chemicals in the environment [2,3]. Moreover, the causes of many childhood cancers are largely unknown, so epidemiologic research is needed as a tool for identifying associations between proximity to environmental exposures and the frequency of these cancers. To this end, our group has carried out the largest population-based case-control study of incident childhood cancer in Spain, with the purpose of analyzing the risk of various types of cancer in the proximity of environmental exposures (industrial installations, urban areas, road traffic, and agricultural crops) [4][5][6][7][8][9][10][11][12].

Results
Spanish industrial installations included in the European Pollutant Release and Transfer Register (E-PRTR) were considered in this paper. Table 1 lists the industrial groups, together with their E-PRTR categories, the number of industrial installations, and the amounts (in kg) released by these plants in 2009, by groups of carcinogens [according to the International Agency for Research on Cancer (IARC)] and groups of toxic substances. The specific pollutants released to air and water, by category of industrial group, are listed in detail in Table 2.

First methodological approach: "Near vs. far" analyses
As a first example of this methodology, Table 3 shows the odds ratios (ORs) and 95% confidence intervals (95% CIs) of the childhood cancers studied in our papers for the analysis of industrial and urban areas as a whole (analysis 1.a), for industrial distances between 2 and 5 km. Statistically significant excess risks were found in children close to: (a) industrial facilities for leukemias […]
Table 4 shows the ORs of the childhood cancers with statistically significant results and a number of cases and controls ≥ 5, for the "near vs. far" analysis by category of industrial group (analysis 1.b) and an industrial distance of ≤ 2.5 km. The following positive associations between certain cancers and residential proximity to specific industrial groups were found: (a) 'Production and processing of metals', 'Galvanization', 'Surface treatment of metals and plastic', 'Glass and mineral fibers', and 'Hazardous waste' ⇔ leukemias and renal tumors; (b) 'Organic chemical industry' and 'Urban wastewater treatment plants' ⇔ renal and bone tumors; (c) 'Pharmaceutical products' ⇔ leukemias and bone tumors; (d) 'Surface treatment using organic solvents' ⇔ leukemias; (e) 'Ceramic' and 'Food and beverage sector' ⇔ renal tumors; (f) 'Mining' ⇔ neuroblastoma; and (g) 'Cement and lime' ⇔ bone tumors.
As an example of the "near vs. far" analysis by category of pollutants (carcinogens and toxic substances) (analysis 1.c), Table 5 shows the ORs of leukemias and of renal and bone tumors for an industrial distance of ≤ 2.5 km. Statistically significant excess risks of leukemias were found in the environs of facilities releasing substances included in all IARC groups, whereas for bone tumors the excess risk was observed only near industries releasing Group 1 carcinogens. According to the categorization of 'Groups of toxic substances', statistically significant ORs of leukemias and of renal and bone tumors were found for all groups of toxic substances (except plasticizers for renal tumors and volatile organic compounds for bone tumors).
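The stratified exposure behind analysis 1.c can be sketched as follows, in Python (the study itself used R). This is a minimal illustration under stated assumptions: the facility identifiers, coordinates, and IARC group assignments in `FACILITIES` are invented, and a child is flagged as "near" a pollutant category when it lives within d of at least one facility releasing a pollutant of that category. How the "far" reference group is defined for each category is not spelled out here, so the sketch leaves it out.

```python
from math import hypot

# Hypothetical facilities: projected coordinates (metres, e.g. ED50/UTM zone 30)
# and the IARC groups of the pollutants each one reports.
FACILITIES = {
    "F1": {"xy": (440000.0, 4474000.0), "groups": {"1", "2A"}},
    "F2": {"xy": (452000.0, 4480000.0), "groups": {"2B"}},
}

def near_by_group(child_xy, facilities, d_m=2500.0):
    """Return {IARC group: bool}; True if the child lives within d_m metres
    of at least one facility releasing a pollutant of that group."""
    groups = set().union(*(f["groups"] for f in facilities.values()))
    flags = {g: False for g in sorted(groups)}
    for f in facilities.values():
        dist = hypot(child_xy[0] - f["xy"][0], child_xy[1] - f["xy"][1])
        if dist <= d_m:
            for g in f["groups"]:
                flags[g] = True
    return flags

print(near_by_group((441000.0, 4475000.0), FACILITIES))
# {'1': True, '2A': True, '2B': False}
```

The same pattern applies to the 'Groups of toxic substances' categories or to single pollutants (analysis 1.d) by changing what each facility's `groups` set contains.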
Finally, Table 6 shows the ORs of the childhood cancers with significant results and a number of cases and controls ≥ 5, for the "near vs. far" analysis by specific pollutant (analysis 1.d) and an industrial distance of ≤ 2.5 km. The highest ORs were found in the environs of industries releasing: […]

Second methodological approach: "Risk gradient" analyses
As an example of this methodology applied to renal tumors, statistically significant radial effects (a rise in OR with increasing proximity to industries, according to concentric rings) were detected in the vicinity of industrial installations, both overall (analysis 2.a) and by industrial group (analysis 2.b) (see Table 7): for all industries as a whole (p-trend = 0.007), and for the following industrial groups: 'Surface treatment of metals and plastic' (p-trend = 0.012), 'Urban wastewater treatment plants' (p-trend = 0.034), 'Food and beverage sector' (p-trend = 0.040), and 'Glass and mineral fibers' (p-trend = 0.046).
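The p-trend values above come from adjusted logistic regression models. As a rough, unadjusted analogue, the Python sketch below applies the Cochran-Armitage test for trend (equivalent to the score test for the slope of an unadjusted logistic model with the ring score as covariate) to case and control counts by concentric ring. The ring scores and counts are hypothetical; a higher score means a ring closer to the industries.

```python
from math import erfc, sqrt

def trend_test(rings):
    """Cochran-Armitage test for a linear trend in the case fraction.

    rings: list of (score, cases, controls); a higher score means a ring
    closer to the industries. Returns (z, two-sided p-value).
    """
    n_tot = sum(ca + co for _, ca, co in rings)
    p_hat = sum(ca for _, ca, _ in rings) / n_tot  # overall case fraction
    # Observed minus expected cases, weighted by ring score
    t = sum(x * (ca - (ca + co) * p_hat) for x, ca, co in rings)
    s1 = sum(x * (ca + co) for x, ca, co in rings)
    s2 = sum(x * x * (ca + co) for x, ca, co in rings)
    var_t = p_hat * (1 - p_hat) * (s2 - s1 * s1 / n_tot)
    z = t / sqrt(var_t)
    return z, erfc(abs(z) / sqrt(2))  # normal approximation

# Hypothetical counts: score 0 = "far" reference, 3 = innermost ring
rings = [(0, 100, 600), (1, 20, 100), (2, 25, 100), (3, 30, 90)]
z, p = trend_test(rings)
```

With these counts the case fraction rises from about 14% in the reference zone to 25% in the innermost ring, and `z` is positive (about 3.1), consistent with an OR that increases with proximity. Like the model-based test, it assumes a linear trend on the chosen scores.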

Discussion
In the present paper, we have described two methodological approaches used by our group to perform the statistical analyses in the study of childhood cancer risk in the vicinity of industrial and urban sites. These two approaches are complementary, and neither is preferable to the other: the "near vs. far" approach is often used as a first step in the study of cancer risk in the environs of pollution sources, whereas the second approach ("risk gradient" analysis) is often used to complement the results of the first, giving more detailed information about the behavior of the risk in different partitions of the "near" zone. Positive associations found in both approaches support and reinforce the hypothesis of a "real" excess risk in the vicinity of the pollution sources analyzed. However, the main limitations of these approaches are the choice of the radius in the "near vs. far" analysis and the categorization into concentric rings in the "risk gradient" analysis, although our industrial distances are in line with those used by other authors [13][14][15]. Another limitation is the assumption of a linear trend in the risk in the "risk gradient" analysis, which may not hold.
Among alternative approaches published by other authors, Barbone et al. [16] used a different strategy to define the "exposure" variable for the "near vs. far" analyses, based on deciles of the distribution of industrial and urban distances, in a case-control study of air pollution and lung cancer in Trieste (Italy). In that study, there was one urban nucleus and three industrial pollution sources: a shipyard, an iron foundry, and an incinerator.
Our group adapted their strategy in a similar case-control study of lung cancer risk and pollution in Asturias (Spain) [17,18], with 48 industrial facilities and 4 urban nuclei whose populations ranged from 24,735 to 263,547 inhabitants. However, when town sizes differ considerably, that methodology produces an irregular distribution of cases and controls among the zones around the towns, since all towns have the same radius for the "urban area" and only a few big cities include the majority of cases and controls. For this reason, we consider our methodology more appropriate for analyses involving many towns of very different sizes (see Fig. 2).
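For comparison, decile-based distance categories of the kind attributed to Barbone et al. [16] can be illustrated with Python's standard `statistics.quantiles`. This is only a sketch of what such categories look like, not a reconstruction of that study's exact exposure definition; the input distances are hypothetical.

```python
from statistics import quantiles

def decile_categories(distances_m):
    """Assign each distance to a decile of the pooled distance distribution
    (1 = closest tenth, 10 = farthest tenth)."""
    cuts = quantiles(distances_m, n=10)  # 9 cut points
    return [1 + sum(d > c for c in cuts) for d in distances_m]

# Hypothetical distances (m) to the nearest source
cats = decile_categories([float(d) for d in range(100, 10100, 100)])
```

The cut points are computed from the pooled distances and are therefore the same around every source or town, whatever its size.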
The methodology used in the present paper can be extrapolated to other tumors (including in the general population) and/or to other countries with a National Registry of Cancer. In fact, the methodology has already been implemented in the 'MCC-Pollution' study (part of the 'MCC-Spain' project [19]), a population-based multicase-control study that analyzes the risk of high-incidence tumors in the Spanish general population associated with residential proximity to industrial facilities [20]. The diagram of Fig. 1 can also be generalized to other chronic diseases that could be related to environmental risk factors. In general, our results suggest possible associations between residential proximity to specific industrial and urban zones and the risk of some childhood cancers, especially leukemias, neuroblastoma, and renal and bone tumors. In relation to industrial sites, this excess risk was found in children living in the environs of several industrial types and of industries releasing specific carcinogens and toxic substances.
This methodology can be applied directly to other hazardous point sources and toxic hotspots, such as e-waste recycling sites and illegal hazardous dumps [21], and it can also be easily adapted when the pollution focus is not a point (e.g., an industry or urban nucleus) but a line (e.g., road traffic, a motorway, a polluted river) [12] or a polygon (e.g., crops treated with pesticides) [9]. To take the dispersion of air pollutants into account, the methodology also allows information about wind roses (which include the direction and speed of prevailing winds around specific monitoring points) to be used together with distance to refine the definition of proximity to pollution sources [17].
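One way to combine a wind rose with distance is sketched below in Python. It is a hypothetical weighting, not the procedure of [17], which is not detailed here: a child within d of a source receives the relative frequency with which the wind blows from the source toward the child's compass sector (8 sectors of 45°), and 0 otherwise. Wind roses usually report the direction the wind comes from, so the sketch converts each sector to its opposite. Coordinates are assumed to be projected in metres, with Y pointing north; all names and frequencies are illustrative.

```python
from math import atan2, degrees, hypot

SECTORS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]

def sector_of(src_xy, child_xy):
    """Compass sector (45-degree bins) of the child as seen from the source."""
    dx = child_xy[0] - src_xy[0]  # east offset (m)
    dy = child_xy[1] - src_xy[1]  # north offset (m)
    bearing = degrees(atan2(dx, dy)) % 360.0  # 0 = north, clockwise
    return SECTORS[int(((bearing + 22.5) % 360.0) // 45.0)]

def toward_from_rose(wind_from):
    """Convert frequencies of wind FROM each sector into wind TOWARD each sector."""
    return {SECTORS[(SECTORS.index(s) + 4) % 8]: f for s, f in wind_from.items()}

def wind_weighted_exposure(src_xy, child_xy, wind_from, d_m=2500.0):
    """0 beyond d_m; otherwise the fraction of time the wind blows toward the child."""
    if hypot(child_xy[0] - src_xy[0], child_xy[1] - src_xy[1]) > d_m:
        return 0.0
    return toward_from_rose(wind_from).get(sector_of(src_xy, child_xy), 0.0)

rose = {"SW": 0.30, "N": 0.20, "W": 0.50}  # hypothetical wind-from frequencies
print(wind_weighted_exposure((0.0, 0.0), (1000.0, 1000.0), rose))  # child to the NE
# 0.3
```

Weighting by downwind frequency keeps the distance criterion intact while distinguishing, among children at similar distances, those more often downwind of the source.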
To replicate this methodology in other countries, regarding the location of subjects (cases and controls) and pollution sources (industries and towns), the children's domiciles (and geographic coordinates) for cases and controls should be provided by the respective National Registry of Childhood Tumors and National Statistics Institute (see Fig. 1), under collaboration agreements, because these are usually very sensitive data (see Availability of data and material section). In the case of industries, all information about industrial plants, including geographic coordinates, is publicly available. In the case of towns, the geographic coordinates of the towns' centroids are publicly available in the Spanish Census. In addition, the tools used in the geocoding strategies for all these elements (cases, controls, industries, and towns) are open access (see Methods section). The methodology requires geographic coordinates for all subjects and sources in order to be applied correctly in the different analyses.
Epidemiological studies of childhood cancer in relation to proximity to pollution sources have gained importance in recent years [22][23][24][25][26][27], and industrial registers of toxic substances such as the E-PRTR provide a tool for monitoring and surveillance of the harmful effects of these industrial pollutants, some of them carcinogenic, on human health. In this respect, our experience has been positive, because our study is providing epidemiological clues that residing in the vicinity of certain industrial and urban areas may be a risk factor for some types of childhood cancer.
With regard to childhood leukemias and the pollution sources analyzed in our previous papers, our findings on proximity to industrial groups (see Table 4) are consistent with other studies reporting excess risk in the environs of the metal industry (which includes 'Production and processing of metals', 'Galvanization', and 'Surface treatment of metals and plastic') [28,29] and of installations for the manufacture of 'Glass and mineral fibers' [28], although other authors did not find associations with proximity to incinerators ('Hazardous waste') [15]. In relation to specific carcinogens and groups of pollutants, some authors found a possible increased risk of some types of childhood leukemia in children living within 3 km of industrial dichloromethane releases (OR 1.64; 95% CI 1.15-2.32) [30], very similar to our result for this pollutant at 2.5 km (OR 1.65; 95% CI 1.11-2.45). Other authors have also found associations between benzene exposure and the risk of childhood acute lymphocytic leukemia [31][32][33], in line with our results (see Table 6). Finally, our findings on proximity to urban areas (see Table 3), as a proxy of urban pollution, are consistent with other papers [12,34,35].
With respect to proximity to environmental exposures and childhood renal tumors, the few studies focusing on residential proximity to environmental pollution sources did not find associations with hazardous waste sites [36] or major roadways [27]. However, some authors have found associations between prenatal exposure to polycyclic aromatic hydrocarbons during the third trimester and risk of Wilms tumor (the main histologic type of childhood renal tumors) [37], which could be related to our findings for this type of pollutant (see Table 6).
As for neuroblastoma and environmental exposures, Heck et al. [38] did not find associations between exposure to traffic pollution and neuroblastoma. In our study of this cancer, the excess risks found in urban areas were not statistically significant (see Table 3). However, the same authors found increased risks of neuroblastoma associated with higher maternal exposure to chromium and polycyclic aromatic hydrocarbons within a radius of 2.5 km, very similar to the non-statistically significant excess risks found in our study (data not shown).
Few studies have addressed childhood bone tumors in relation to proximity to industrial areas. Pan et al. [39] found higher bone tumor mortality in the environs of petrochemical industries, whereas Wulff et al. [40] found an excess risk of bone cancer near a smelter. Our results for 'Organic chemical industry' and 'Production and processing of metals' showed high excess risks (see Table 4). With respect to childhood bone tumors and proximity to urban areas, most studies in the literature found significant excess risks in children living in urban zones [41][42][43], in line with our findings (see Table 3). However, other authors did not find associations between proximity to urban zones and risk of childhood bone cancer [44].
As for future perspectives, research is still needed on air pollution, especially in industrial and urban zones, and childhood cancer, to guide policies for reducing emissions of toxic and carcinogenic substances and protecting public health. Direct epidemiologic observation of exposed children to evaluate the magnitude of air pollution, and large-scale epidemiologic studies of environmental exposures and childhood cancer, are needed [45]. Moreover, surveillance systems for residential and occupational exposures, and for clusters of childhood cancer, should be implemented to prevent childhood cancer risk [46]. Finally, identification and control of environmental risk factors that may cause cancer in children is the single most effective strategy for cancer prevention [23]. As Nelson et al. [47] point out, reducing environmental hazards associated with residential exposures could substantially reduce the human burden of childhood cancer and result in significant annual and lifetime savings.

Conclusions
The methodological approaches used by our group have proved to be very useful and flexible tools to analyze the excess risk of childhood cancers in the vicinity of industrial and urban areas, which can be extrapolated and generalized to other cancers and chronic diseases, and adapted to other types of pollution sources.

Methods
A population-based case-control study of incident childhood cancers in Spain and their relationship with residential proximity to environmental pollution sources (in this case, industrial and urban areas) was designed. The study diagram is shown in Fig. 1: the first part depicts the steps concerning the study subjects, data collection, and definition of the exposure, whereas the second part represents the strategies of statistical analysis used in our papers [4][5][6][7][8][10].

Study subjects/data collection/definition of exposure
Step 1 Cases, controls, industries, and towns were selected as follows: […]
Step 2 The geographic coordinates of cases, controls, industries, and towns were geocoded and validated, as follows: (A) Geocoding strategy for cases and controls: each child's last domicile was geocoded using Google Maps JavaScript V3 [53]. The obtained latitude and longitude coordinates were projected into ETRS89/Universal Transverse Mercator (UTM) zone 30N (EPSG:25830) coordinates using QGIS software [54], and subsequently converted into ED50/UTM zone 30 (EPSG:23030) coordinates using the R software [55]. The coordinates were then validated, and those for which the addresses and coordinates matched were retained. For this validation, the inverse method was applied: the home addresses corresponding to the obtained coordinates were retrieved and compared (street number and name, postal code, and city/town name) with the original addresses. Lastly, in the final ED50/UTM zone 30 coordinates of the children's domiciles, the last digit of each coordinate in the pair (X, Y) was assigned randomly in order to preserve confidentiality. With respect to the cases, 87% of their domiciles were successfully validated. The remaining 13% of cases were fairly uniformly distributed across the different autonomous regions; therefore, we considered that our data were not biased in this respect. In relation to the controls, only 2% of their addresses initially could not be validated. Given this small number of failures, we selected additional controls to replace them, and geocoded and validated this last group, ending up with six controls with valid coordinates for each case.
(B) Geocoding strategy for industries: the original geographic location of each industrial facility included in the E-PRTR (longitude/latitude) was converted into ED50/UTM zone 30 coordinates using the R software [55], and subsequently validated following the methodology used by our group to validate the EPER [56], the industrial register that the E-PRTR replaced in 2007. However, because many of the industrial locations contained errors, every single address was thoroughly checked to ensure that the industrial plant was located exactly where it should be. The following tools were used: (1) the Spanish Agricultural Plot Geographic Information System (SIGPAC) Viewer [which includes topographic maps showing the names of industrial plants, and orthophotos (digitized aerial images)] [57]; (2) Google Earth (with the Street View application); (3) the "Yellow pages" web page (which allows searching for companies and addresses) [58]; (4) the Google Maps server [59]; and (5) the web pages of the industrial companies.
(C) Geocoding strategy for towns: the municipal centroids (not the polygon centroids) of the towns in which the children resided were used. In Spain, these municipal centroids are located in the centers of the most populated areas, where the main church and/or the town hall tend to be located. Every single municipal centroid was meticulously checked as in the geocoding strategy for industries, using the Google Maps server [59], Google Earth, and the SIGPAC viewer [57].
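The confidentiality step in (A), randomizing the last digit of each coordinate, could look like the following Python sketch. It assumes integer-metre UTM coordinates and a uniformly drawn digit; the function name and the seed are illustrative, not taken from the study.

```python
import random

def mask_last_digit(coord, rng):
    """Replace the units digit of a UTM coordinate (metres, rounded to an
    integer) with a random digit 0-9."""
    c = int(round(coord))
    return c - c % 10 + rng.randrange(10)

rng = random.Random(2011)  # fixed seed only to make the example reproducible
x, y = 440123.4, 4474567.8  # hypothetical domicile coordinates
masked = (mask_last_digit(x, rng), mask_last_digit(y, rng))
```

Replacing only the units digit keeps each masked coordinate within the same 10 m block as the rounded original, a displacement that is small relative to the kilometre-scale distances used in the analyses.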
Step 3 Sociodemographic variables were selected for all children as potential confounders. These variables were obtained from the 2001 Spanish Census [52] at the census-tract level (because they were not available at the individual level) and included: (a) percentage of illiteracy; (b) percentage of unemployment; and (c) socioeconomic status (based on the occupation of the head of the family), which ranged from 0.46 to 1.57, with lower values corresponding to worse socioeconomic status and higher values to better socioeconomic status.
Step 4 Euclidean distances between all children and industries (industrial distances) and towns (urban distances) were calculated using the R software [55].
Step 5 Finally, the "exposure" variable (in our case, proximity to industries, according to several industrial distances 'd', and proximity to urban areas, according to the size of the municipality) was determined. Figure 2 shows an example of exposure areas around industrial and urban sites for an industrial distance of 2.5 km.
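Steps 4 and 5 can be sketched together in Python (the study used R): the distance from a child to the nearest industry is the Euclidean distance in projected coordinates, which is meaningful because UTM coordinates are in metres, and the exposure variable is then derived for one or several industrial distances d. The four-level coding ("industrial", "urban", "industrial-urban", "far") and the `in_urban_area` flag are assumptions for illustration; the municipality-size rule that defines urban areas is not reproduced here.

```python
from math import hypot

def min_distance(child_xy, industries):
    """Smallest Euclidean distance (m) from a child's domicile to any industry.
    Coordinates are assumed projected in metres (e.g. ED50/UTM zone 30)."""
    cx, cy = child_xy
    return min(hypot(cx - x, cy - y) for x, y in industries)

def exposure_category(d_industry_m, in_urban_area, d_m=2500.0):
    """Hypothetical four-level coding for the joint 'near vs. far' analysis."""
    near_industry = d_industry_m <= d_m
    if near_industry and in_urban_area:
        return "industrial-urban"
    if near_industry:
        return "industrial"
    if in_urban_area:
        return "urban"
    return "far"  # reference category

industries = [(440000.0, 4474000.0), (452000.0, 4480000.0)]  # hypothetical
d = min_distance((441000.0, 4475500.0), industries)
for radius in (1000.0, 2000.0, 2500.0):  # sensitivity to the choice of d
    print(radius, exposure_category(d, in_urban_area=False, d_m=radius))
```

Re-running the classification for several radii, as in the loop, is a simple way to examine how sensitive the results are to the choice of d.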

Statistical analysis (strategies)
Two methodological approaches using mixed multiple unconditional logistic regression models to estimate ORs were developed, using the R software [55]. For each of the two approaches, three strategies of analysis (see Fig. 1) were implemented: (a) "Joint" analysis, where the risk of childhood cancer in the vicinity of all industries and towns as a whole was studied; (b) "Stratified" analysis, where the excess risk in the environs of industrial areas was stratified according to categories of industrial groups (activities) included in the E-PRTR, categories of pollutants (industries releasing groups of known and suspected carcinogens and other toxic chemical substances), and specific pollutants; and (c) "Individualized" analysis, where the excess risk in the environs of individually selected industrial plants was analyzed.
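The adjusted ORs are estimated with mixed logistic regression models. As a simple crude check, a "near vs. far" OR can be read from a 2×2 table of cases and controls, with a Woolf (log-based) 95% CI. The Python sketch below ignores confounders and random effects, and the counts in the example are hypothetical.

```python
from math import exp, log, sqrt

def crude_or(cases_near, controls_near, cases_far, controls_far, z=1.96):
    """Crude OR (near vs. far) with a Woolf 95% confidence interval."""
    or_ = (cases_near / controls_near) / (cases_far / controls_far)
    # Standard error of log(OR) from the four cell counts
    se = sqrt(1 / cases_near + 1 / controls_near + 1 / cases_far + 1 / controls_far)
    return or_, exp(log(or_) - z * se), exp(log(or_) + z * se)

or_, lo, hi = crude_or(60, 300, 200, 1300)
```

For these counts the crude OR is 1.3 with a 95% CI of about 0.95 to 1.78. The Woolf interval is undefined when any cell is zero.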
Potential excess risks of cancer in children living near ("near") versus far ("far") from industrial and urban areas were assessed by comparing the ratio of cases to controls in zones close to industrial/urban areas with the ratio of cases to controls in zones far from these pollution sources (OR near vs. far), adjusting for potential confounders. Five "near vs. far" analyses were performed (see Fig. 1