Defining Socially-Based Spatial Boundaries in the Region of Peel, Ontario, Canada
© Drackley et al; licensee BioMed Central Ltd. 2011
Received: 19 January 2011
Accepted: 21 May 2011
Published: 21 May 2011
The purpose of the project was to delineate a series of contiguous neighbourhood-based "Data Zones" within the Region of Peel (Ontario) for the purpose of health data analysis and dissemination. Zones were to be built on Census Tracts (N = 205) and obey a series of requirements defined by the Region of Peel. This paper explores a method that combines statistical analysis with ground-truthing, consultation, and the use of a decision tree.
Census Tract data for Peel were derived from the 2006 Canadian Census Master file.
Following correlation analysis to reduce the data set, Principal Component Analysis was applied to reduce its complexity and derive an index. The Getis-Ord Gi* statistic was then applied to look for statistically significant clusters of like Census Tracts. A detailed decision tree guided the amalgamation of the remaining zones, and ground-truthing with Peel staff verified the resulting zones.
A total of 15 Data Zones that are similar with respect to socioeconomic and sociodemographic attributes and that met criteria defined by Peel were derived for the region.
The approach used in this analysis, which was bolstered by a series of checks and balances throughout the process, gives statistical validity to the defined zones and resulted in a robust series of Data Zones for use by Peel Public Health. We conclude by offering insight into alternative uses of the methodology, and limitations.
Independent of individual characteristics, it is recognized that an individual's immediate environment possesses both material and social characteristics that are linked to health status as well as health-seeking behaviours [1–3]. That is, health reflects both individual characteristics and the characteristics of the neighbourhood, which constrains and enables individual health. For example, neighbourhoods may provide important information and support with regard to health practices and behaviours, but may also be associated with poor health in cases where crime is higher or the physical environment is poorer. Concurrently, there is a common need for health status and related data to be represented at the 'neighbourhood' scale, whether it is for the provision of social welfare programs, planning, or health care delivery.
Geographers have long been concerned with defining neighbourhoods and places, and examples of techniques to define neighbourhoods abound in the academic literature [see, for example: 2, 3, 5-10]. Weden et al., for example, discuss the evolution and theoretical foundations of neighbourhood classification, including links to public health issues, starting with the Chicago School. However, there are many approaches to defining zones, ranging from simple cases that are based on existing or historical neighbourhoods, school catchment zones, and communities, to more complex approaches including hierarchical clustering and scale-space approaches [see, for example, 11-13]. But even the so-called 'simple' cases can have fuzzy boundaries that are not agreed upon by residents and authorities alike, and new suburban communities may not self-identify as a cohesive neighbourhood, meaning that the definition of areas has been approached differently depending on the application. Most use various measures, such as poverty or educational attainment, that are derived from statistical organizations (such as Statistics Canada or the US Census Bureau), and represent a proxy for health outcomes. For example, the City of Toronto's 'Major and Minor' Health Planning Areas used the proportion of the population living in low income at the Census Tract level. In Scotland, the identification of data zones was based on the Townsend Deprivation Index. The City of Ottawa, Canada, analyzed physical and demographic characteristics of neighbourhoods through the so-called 'wombling technique' that analytically grouped areas based on statistical similarities, with the results providing an approximation of neighbourhoods [8, 14]. They then went on to use a combination of ground-truthing, spatial analytical techniques, and GIS to define neighbourhoods. However, the wombling technique itself may be subject to validation inconsistencies based on the starting point of the analysis.
Similarly, the use of simple, additive structures or the reliance on one particular population attribute to identify similar areas has also been criticized.
Despite attention and numerous papers on the topic, there is no one ideal (or recognized) way to define neighbourhoods and their spatial boundaries, and a lack of consensus remains as to the empirical definition of neighbourhoods. Often, however, zones are constructed to reflect or identify differences in health across space [i.e., 1-3, 6-8]. But health is defined by more than just personal health and access to health care services. For example, the Determinants of Health framework [15, 16], which represents a synthesis of the public health and social science literature, includes issues such as lifestyle options (e.g., drinking, smoking, physical activity), nutrition, housing, work, education, and income, as well as mechanisms related to societal power, social identity, social status, and control over life circumstances that are influential in the distribution of health. The framework suggests that these various place-based effects influence health at the neighbourhood scale [4, 17]. Since they can be used to help contextualize and define neighbourhoods empirically, as opposed to more intuitive or theoretical conceptualizations, these place-based effects have formed the core of multiple papers on neighbourhood definition.
Multivariate techniques, geographic information systems (GIS), and spatial analytical (SA) techniques further enable understanding of neighbourhoods and their geography. For instance, GIS enables the visualization of neighbourhoods, while spatial analysis and cluster detection techniques such as the Getis-Ord Gi* statistic provide a statistically robust way to identify areas that share similar characteristics by identifying clusters of census tracts with values higher in magnitude than might be expected by random chance. If such statistical techniques are coupled with expert opinion and a clear decision process on boundary placement, approaches that use a mix of techniques may provide better area-based definitions. Ultimately, these neighbourhoods can be used to further understand health (or other) differences across space, and the relationship between place and health.
The question at hand is how to appropriately define aggregate neighbourhoods ('Data Zones') in the Region of Peel, Ontario. The project was initiated by Peel Public Health, who contacted the research team in mid-2009. The overall purpose of the project was to delineate a series of contiguous Data Zones within the Region for the purpose of health data dissemination. The use of the term 'Data Zones', as opposed to neighbourhoods, was preferred, since neighbourhoods typically have some degree of social identification associated with them and are frequently geographically smaller than the areas that would ultimately be identified in this project. The desired outcome, as requested by Peel Public Health, was to accomplish the following three goals:
• Develop a methodology for defining Data Zones within the Region of Peel while accounting for sociodemographic and socioeconomic effects;
• Use the Data Zones to describe selected health issues and outcomes across space;
• Analyze and report findings, such as the differences in health outcomes between spatial areas.
The resulting Data Zones are not intended to facilitate the delivery of services, but to identify relationships between inequalities in neighbourhoods and health disparities, with Peel Public Health using the zones as a communications vehicle; for reporting to people who have an interest in certain geographic areas; for planning purposes at the strategic level; and for following relevant trends over time.
The research team was therefore charged with developing a methodology to delineate internally homogeneous Data Zones using geographic data from the 2006 Census and relevant software, with Census Tracts as the existing boundaries from which to build the zones. The purpose of this paper is therefore to illustrate a multivariate-structured technique for the derivation of a series of Data Zones in the Region of Peel, Ontario. Following the selection of variables used to characterize and contextualize Census Tracts relative to health outcomes, GIS and spatial analysis techniques were used to map and construct Data Zones within the Region using the Getis-Ord Gi* statistic. The Gi* statistic identifies 'hot-spots' or statistically significant clusters of similar Census Tracts, providing a statistically robust definition of neighbourhoods. The delineation of Data Zones is further facilitated by a structured decision tree approach, 'ground-truthing' with staff from Peel, and the overlay of existing neighbourhoods, roads, and other physical landforms to ensure appropriate representation and delineation of the zones. As such, the methodology to define zones is a heuristic approach, rather than an optimization method of the kind utilized in other studies, but one that provides a robust way to define zones.
Given its proximity to Toronto, employment opportunities, and accessibility (home to Pearson International airport and served by seven '400' multi-lane, limited access highways), the region's population has grown rapidly. Between 2001 and 2006, the region grew by 17%, adding slightly more than 170,000 people to the population. Nearly 50% of Peel's population was born outside Canada (immigrants), with approximately 120,000 arriving between 2001 and 2006 alone. Large immigrant or visible minority groups (based on 2006 data) include South Asians (272,760), Filipino (42,900), Chinese (54,285), and Blacks (95,565). Other immigrant groups include South East Asian, West Asian, Latin American, Japanese, and Korean communities. Much of this new population is housed in new, low density suburban style development. Approximately 46% of the region's population report a non-English/non-French mother tongue. The median after tax income in Peel (2005) was greater than that of the overall province ($Cdn62,181 versus $Cdn52,117, respectively), and the region has a generally well-educated population, with 34% of the population aged 25 to 34 having a certificate, diploma, or degree.
Peel has a rich geography that is defined through multiple existing neighbourhoods or service planning boundaries, including older communities that continue to retain their identity and electoral boundaries. Existing planning and service delivery areas include Peel's 'Family of Schools' areas (used by the Peel District School Board), Forward Sortation Areas (used by Canada Post and as a basis for the Social Planning Council of Peel's 'Portraits of Peel'), Local Health Integration Networks, and Community Health Centre boundaries. Additionally, the Region is divided into a number of statistical zones, including Census Tracts (N = 205), which are small, relatively stable geographic areas that typically have a population of 2,500 to 8,000, and dissemination areas (400-700 people), both of which are defined by Statistics Canada.
The task in this work was to express these varied geographies and summarize the diverse sociodemographic and socioeconomic profiles of the Region. In the first instance, Census Tracts were used as the building blocks for the Data Zones given stated preferences by Peel Public Health and ease of data availability at this scale. In the second instance, and following a review of the relevant literature [i.e., 8] and requests by Peel Public Health, a set of variables that the research team and Peel staff felt expressed Peel's diversity was initially considered for inclusion in the analysis. Variables requested for consideration by Peel included: % with no knowledge of English or French; % aged 25+ years who completed less than high school; % recent immigrants; and % low-income population, all of which are used in comparable studies. All variables were derived from the 2006 Census and based on the 20% Master data file from Statistics Canada. In addition, the research team suggested a number of other variables linked to population health outcomes, including % unemployed, % visible minority, % labour force aged 15+, and average number of persons in a household. Other variables initially considered in the analysis included alternate measures of income (i.e., median income and after tax income).
Variables included in Principal Components Analysis:
• % Owner households spending 30% or more of household income on major payments
• % Households in need of major repairs
• % Aged 20+ with no High School
• % Unemployed (Prior to May 16th, 2006)
• % Low Income (Before tax, 2005)
• % No Knowledge of English or French
• % Separated or Divorced
• % Recent immigrants (immigrated to Canada between 2001 and Census Day, May 16, 2006; Census, 2006)
• % Lone Female Parent Family
In defining Data Zones within the Region, Peel Public Health requested that the following issues be considered:
• Approximately 12-14 Data Zones were to be defined, with populations of approximately 80,000 to 100,000. The exception to this request was in the northern portion of the Region of Peel (the community of Caledon), which is predominantly rural and therefore has a smaller population density. Peel reserved the right to redefine the population threshold for zones following the initial analysis;
• Data zones were to be contiguous, follow Census Tract boundaries, and avoid cases where zones were encircled by other zones;
• Data zones were to follow boundaries that correspond to areas of interest for other purposes;
• Data zones were to focus more on the composition of the local population when defining neighbourhood boundaries, rather than neighbourhood context [i.e., 13, 31, 32];
• Where plausible, it was requested that the Data Zones respect natural and human-made boundaries, such as rivers and highways. Several such barriers exist within the Region of Peel, including rail lines, the Credit River which extends northwest through Mississauga, and limited-access highways including highways 401, 403, 407, 410, and the Queen Elizabeth Way (QEW) which dissect the region. In most cases, census tracts already follow these boundaries. In cases where they do not, census tracts must still form the boundary.
In practice, it was not possible to satisfy all these criteria, and compromise was necessary. Most commonly (and as noted below), population constraints were waived in consultation with Peel staff given anticipated future growth trends.
Differences in perceptions and definitions imply that neighbourhoods mean different things to different people (see Luginaah et al. for a review). Although there is disagreement in the literature concerning the best way to capture the concept of a neighbourhood, Census Tracts provide one option. At the same time, the use of Census Tracts has frequently been criticized because their statistically defined areas impose boundaries that may not necessarily be related to other social processes or perceptions of what a neighbourhood includes, reducing the power of a neighbourhood as a meaningful concept [37–40]. On the other hand, other studies argue that Census Tracts are good proxies of neighbourhoods [1, 3] as compared to socially constructed areas, which are often loosely defined and lack the ability to link to other statistical data. Indeed, the comparison of several neighbourhood units of analysis suggests that Census Tracts are good proxies for natural neighbourhood boundaries in studies of neighbourhood effects on health. Moreover, defining neighbourhoods by using Census Tracts (or groups of Census Tracts) offers a number of advantages, including direct linkage to statistical measures provided by Statistics Canada.
Following the initial selection of the variables, principal component analysis (PCA) with a varimax orthogonal rotation was used to summarize variables and build indices, a practice commonly used to consolidate information along main dimensions and that has been widely used in defining zones similar to the aims of this work [i.e., 8, 41, 42, 43]. While other zoning exercises have constructed an index based directly on the weighted variables, indices constructed in this way may be misleading by missing inter-relationships between variables, and/or fail to account for a more complete set of potential indicators. The central idea of PCA is to reduce the dimensionality of a data set which consists of a large number of interrelated variables, while retaining as much as possible of the variation present in the data set, allowing the determination of which tracts could be combined to form relatively homogeneous areas. PCA allows for the extraction of components that reflect the pattern of the inter-correlations of the variables, while searching for commonalities. Only factors that each contributed more than 10% of the variation were retained for further analysis.
While PCA assists with the identification of the sources of variation, it does not help in understanding the spatial patterning of the components. Following PCA, therefore, the next step was to create the boundaries for the zones based on the PCA scores assigned to each Census Tract for each factor created by PCA. For this purpose, a Getis-Ord Gi* hot-spot analysis was run on the resulting sets of factor scores. The statistic works by looking at each tract within the context of neighbouring tracts: if a tract's value is high (low), and the values for the neighbouring tracts are also high (low), it is a part of a so-called 'hot spot'. For each PCA factor, the Gi* statistic identifies the association between a Census Tract and its neighbours up to a specified distance, or in terms of nearest neighbours where the CT shares a boundary. The Gi* statistic is well-suited to identify the existence of pockets or clusters of areas (tracts) with values higher in magnitude than might be expected by random chance and their statistical significance; to assess assumptions of stationarity (i.e., that spatial relationships are the same at all places in the study area); and to determine distances beyond which no discernible spatial association exists. Importantly, the Gi* statistic identifies clusters that can be used to statistically delineate zones. The output of the Gi* function is a z-score for each feature, with the z-score representing the statistical significance of clustering for a specified distance, and the higher (or lower) the z-score, the stronger the association. A z-score near zero indicates no apparent concentration.
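The PCA step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names (`pca_retained_loadings`, `varimax`) are hypothetical, the input is synthetic data standing in for the 205 Census Tracts and nine variables, and the varimax routine is the commonly used SVD-based iteration. The only rule taken from the text is that components explaining more than 10% of the variation are retained.

```python
import numpy as np

def pca_retained_loadings(X, min_share=0.10):
    """PCA on the correlation matrix of X (rows = tracts, columns = variables).

    Returns all eigenvalues, their share of total variance, and the loadings
    of the components whose share exceeds min_share.
    """
    corr = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)        # ascending order
    order = np.argsort(eigvals)[::-1]              # re-sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    share = eigvals / eigvals.sum()
    keep = share > min_share
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    return eigvals, share, loadings

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation of a loadings matrix (SVD-based iteration)."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        target = rotated**3 - rotated @ np.diag((rotated**2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        new_criterion = s.sum()
        if new_criterion - criterion < tol:
            break
        criterion = new_criterion
    return loadings @ rotation

# Synthetic stand-in data (an assumption): two latent dimensions drive nine variables.
rng = np.random.default_rng(0)
factors = rng.normal(size=(205, 2))
pattern = np.zeros((2, 9))
pattern[0, :5] = 1.0
pattern[1, 5:] = 1.0
X = factors @ pattern + 0.5 * rng.normal(size=(205, 9))

eigvals, share, loadings = pca_retained_loadings(X)
rotated = varimax(loadings)
```

Because the rotation is orthogonal, each variable's communality (the row sum of squared loadings) is unchanged; only the distribution of loadings across the retained components shifts, which is what makes the rotated components easier to label.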
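To make the z-score output concrete, the sketch below computes the standard Getis-Ord Gi* statistic for a small set of areas. It is an illustration under stated assumptions: binary weights are used (1 for the tract itself and for each listed neighbour, 0 otherwise), the neighbour lists and values are hypothetical, and the function name `getis_ord_gi_star` is not taken from the paper or from any GIS package.

```python
import math

def getis_ord_gi_star(values, neighbours):
    """Gi* z-scores with binary weights; each tract counts as its own neighbour.

    values     : list of numbers, one per tract (e.g. a component score)
    neighbours : list of lists; neighbours[i] holds the indices adjacent to tract i
    Assumes the values are not all identical and that no tract neighbours every
    other tract, since either case makes the denominator zero.
    """
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    z_scores = []
    for i in range(n):
        members = set(neighbours[i]) | {i}      # the "star": include tract i itself
        w_sum = len(members)                    # sum of weights = sum of squared weights
        local_sum = sum(values[j] for j in members)
        denom = s * math.sqrt((n * w_sum - w_sum ** 2) / (n - 1))
        z_scores.append((local_sum - mean * w_sum) / denom)
    return z_scores

# Hypothetical example: ten tracts in a row, with high values at one end.
values = [9, 8, 9, 1, 2, 1, 2, 1, 2, 1]
neighbours = [[j for j in (i - 1, i + 1) if 0 <= j < len(values)]
              for i in range(len(values))]
z = getis_ord_gi_star(values, neighbours)
```

In this toy layout the tracts at the high-valued end receive large positive z-scores, while tracts surrounded by low values receive negative ones. A fixed distance band, as used in the analysis, would change only how the neighbour lists are built.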
Table: Final Component Eigenvalues (values not preserved in this copy).
Table: Varimax Rotated Variable Rotation Correlations (Structure) (values not preserved in this copy). Variables listed: % Separated or Divorced; % Households in need of major repairs; % No English or French; % No High School; % Low Income (Before tax, 2005); % owners spending 30% or more; % Recent Immigrant; % Single Mothers.
Once the Gi* was computed and mapped for both PCA components, Data Zones could be delineated. As a first step, groups of Census Tracts that were statistically significant for either of the mapped components (low socioeconomic status and single renter) became the building blocks for a Data Zone. That is, there is statistical robustness based on the Gi* statistic for grouping these zones based on their similarity.
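The first step above, turning significant tracts into building blocks, could look something like the following sketch. Several details are assumptions rather than the paper's stated procedure: significance is taken as |z| >= 1.96, a tract is labelled by its strongest significant component and by sign (hot or cold), and adjacent tracts with the same label are grouped by a breadth-first search. The tract identifiers and z-scores are hypothetical.

```python
from collections import deque

def cluster_label(z_scores, threshold=1.96):
    """Return (component index, 'hot' or 'cold') for the strongest significant z-score, else None."""
    idx = max(range(len(z_scores)), key=lambda k: abs(z_scores[k]))
    if abs(z_scores[idx]) < threshold:
        return None
    return (idx, "hot" if z_scores[idx] > 0 else "cold")

def seed_clusters(z_by_tract, adjacency, threshold=1.96):
    """Group adjacent tracts that share the same significant label."""
    labels = {t: cluster_label(z, threshold) for t, z in z_by_tract.items()}
    seen, clusters = set(), []
    for start in sorted(labels):
        if labels[start] is None or start in seen:
            continue
        seen.add(start)
        queue, group = deque([start]), []
        while queue:
            tract = queue.popleft()
            group.append(tract)
            for nb in adjacency.get(tract, ()):
                if nb not in seen and labels.get(nb) == labels[start]:
                    seen.add(nb)
                    queue.append(nb)
        clusters.append((labels[start], sorted(group)))
    return clusters

# Hypothetical tracts A-F in a row, with z-scores for two components.
z_by_tract = {"A": (2.5, 0.1), "B": (2.1, -0.3), "C": (0.4, 0.2),
              "D": (-0.2, -2.4), "E": (0.1, -2.0), "F": (0.3, 0.5)}
adjacency = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"],
             "D": ["C", "E"], "E": ["D", "F"], "F": ["E"]}
clusters = seed_clusters(z_by_tract, adjacency)
```

Tracts C and F are not significant on either component, so they are left out of the seeds and would fall to the second stream of the decision tree.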
The general logic of the decision tree was based on two general streams. In cases where the Gi* analysis identified hot-spots, these clusters were compared with known hard boundaries such as roads or other features, and checked to ensure that they met other criteria such as population size. For the remaining portions of the Region that needed to be defined (i.e., those areas that were not defined as clusters by the Gi* statistic), we first turned to the DMTI Neighbourhood and Community Boundaries file, a "continually updated" set of neighbourhood boundaries as determined by "amalgamating and integrating information from municipal data sources". These DMTI-based boundaries were overlaid with the initial zones, allowing Data Zone boundaries to be initially constructed based on known neighbourhoods, while referencing population counts for each potential zone and ensuring that the constructed zones remained contiguous.
Once the initial set of contiguous zones was generated, a physical approach was used to refine zonal boundaries through two 'ground-truthing' methods. First, we referenced known boundaries, including physical features such as highways and streams, to determine if more 'natural' boundaries separating zones might be warranted, echoing Pickett and Pearl's call for meaningful neighbourhoods that are based on natural boundaries. In this case, we assumed that such barriers differentiate areas through the physical division of space, such as separating neighbourhoods so that there is reduced interaction, or by separating places with different socioeconomic and sociodemographic profiles. Consequently, natural boundaries serve both functional purposes such as transport or recreation, as well as creating barriers between different groups [e.g., 47, 48]. Roads and highways were obtained from DMTI's Route Logistics file (2008), which contains highways and roads for all of Canada, albeit clipped to the boundaries of the Region of Peel. Visible land features were obtained from the Satellite Streetview Orthophoto dataset created by the 60cm resolution Quickbird Satellite and released by DMTI Spatial. The result is a spatial file with the different overlays (zones, neighbourhoods, roads, physical landforms), along with the zones delineated by the Gi* statistic, neighbourhood boundaries, and other "hard boundaries" (i.e., transportation) in Peel. Comparison of these boundaries identified any anomalies through consideration of both land features and physical boundaries. Throughout, total population counts for each potential zone were verified.
The resulting 'shape' of the derived Data Zones was not an issue in the analysis: the imposition of the various constraints - statistical significance from the Gi* statistic, number of derived zones, known boundaries such as roads or physical features, and population counts - meant that any attempt to constrain the shape of the Data Zones was less meaningful. A total of 13 zones were identified at this stage of the analysis.
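Two of the checks mentioned in this step, contiguity and population count, lend themselves to a short sketch. The population range of 80,000 to 100,000 comes from the criteria requested by Peel; everything else (function names, tract identifiers, adjacency, and population figures) is hypothetical and only meant to show the shape of such a check, not the decision tree actually used.

```python
from collections import deque

def is_contiguous(tracts, adjacency):
    """True if every tract in the candidate zone can be reached from any other
    by moving only through adjacent tracts that belong to the zone."""
    tracts = set(tracts)
    if not tracts:
        return False
    start = next(iter(tracts))
    seen, queue = {start}, deque([start])
    while queue:
        current = queue.popleft()
        for nb in adjacency.get(current, ()):
            if nb in tracts and nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return seen == tracts

def check_zone(tracts, adjacency, population, min_pop=80_000, max_pop=100_000):
    """Summarize whether a candidate zone meets the contiguity and population criteria."""
    total = sum(population[t] for t in tracts)
    return {
        "population": total,
        "contiguous": is_contiguous(tracts, adjacency),
        "within_population_range": min_pop <= total <= max_pop,
    }

# Hypothetical tracts: T1-T2-T3 form a chain; T4 touches only T3.
adjacency = {"T1": ["T2"], "T2": ["T1", "T3"], "T3": ["T2", "T4"], "T4": ["T3"]}
population = {"T1": 30_000, "T2": 28_000, "T3": 27_000, "T4": 6_000}

zone_ok = check_zone(["T1", "T2", "T3"], adjacency, population)
zone_split = check_zone(["T1", "T4"], adjacency, population)
```

A failed check need not reject a zone outright; as the text notes, population constraints were sometimes waived in consultation with Peel staff, so the output is better read as a flag for discussion.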
Second, we presented the results to Peel Public Health for their expert input on the defined boundaries. Peel staff, including GIS technicians, planners, and public health officials participated in two round-table discussions where interim results were presented. Through their more detailed knowledge of current and future population trends, socioeconomic profiles, and development within the region, participants critically analyzed the methodology and outcomes, and commented on potential anomalies or disagreements with the resulting divisions. These exercises resulted in the division of Caledon to create Data Zone 13 (West Brampton) and 15 (Bolton) at the request of Peel staff. In the first instance (zone 13), Peel's Official Plan notes the short-term housing and commercial development of the West Brampton area, with rapid population growth expected within a five-year window. Although the area was still largely rural (as of 2010) and therefore more similar to Caledon, the imminent population growth and development meant that Peel staff felt it was more suitable to present it as a separate zone rather than amalgamate with Caledon as suggested by the statistical analysis, enabling future flexibility with the zones. In the second case, the community of Bolton (Data Zone 15) was separated from the northeast portion of Brampton, again reflecting the uniqueness of the Bolton area (relative to the rural areas immediately around Bolton), and the potential for substantial short-term population growth, even though its 2006 population count (22,719) also falls below the threshold originally suggested for the definition of the zones. 
While counter to the initial constraints (namely that population thresholds for the two new Data Zones were less than the minimum size initially requested by Peel, meaning the population of the zones was not equitably distributed across each zone) and the clustering results, Peel staff felt that these modifications better provided for the future growth of Peel's population and more consistent zones over the longer-term. In addition to consultation with Peel staff, Peel also used the final derived Data Zones to produce maps of various health outcomes as an internal check of their validity.
Through a series of mixed methods, a set of Data Zones was delineated for the Region of Peel, Ontario, based on existing Census Tracts. It is hoped that as further census and health outcome data become available, and given that Peel's population continues to grow and become more diverse, the delineated zones can be verified and refined for future analyses.
The approach used in this paper is flexible and bolstered by a series of checks and balances throughout the process, including the use of statistically defined clusters of like Census Tracts through the use of the Gi* statistic, giving statistical validity to the defined zones. In addition, the use of a formal 'decision tree' to assist in the determination of zones, along with the recognition of local community boundaries, physical land features such as major roads or landscapes, and the knowledge of local health experts, resulted in a robust set of Data Zones for use by Public Health in the Region of Peel. Consequently, the methodology to define zones illustrated in this paper draws upon a number of inputs, with the end result a more robust and meaningful set of zones.
The methodology has a number of advantages, enabling it to be applied elsewhere and in different contexts. First, the method can be adjusted based on the desired output such as size/number of zones, their constituent building blocks, or even the inclusion (exclusion) of the initial statistical steps such as PCA. For example, the PCA could be removed as a step. Instead, input for the Gi* analysis could be based on other existing inputs (e.g., an individual variable such as low income status) or indices, such as the UK's Townsend Index of Deprivation [9, 51]. Following the Gi* analysis, which would identify clusters of like areas based on these alternate inputs, statistically similar zones could once again be identified through the use of the decision tree, consultation, and expert opinion. Second, the approach is applicable to both research and practical applications such as health surveillance. Third, the approach can be scaled up or down to other geographical contexts. Fourth, the consultative process and use of ancillary data reduce concerns that the zones are only representative of the statistical process and the building blocks (Census Tracts) that underlie the zones. In essence, the proposed methodology increased participation in the analysis, and ultimately improved the definition of the resulting Data Zones to reflect local knowledge.
At the same time, the practice of using aggregated spatial data as the basis for creating larger areal units is a technique associated with potential errors, biases, and oversights - regardless of the context or application. First, the creation of socially-based spatial aggregations can be used to misrepresent those living within an area, either intentionally as in gerrymandering political districts to subdivide sizable voting populations, or unintentionally through irresponsible analysis. Caution must therefore be exercised in the use of expert opinion. Consequently, the decision tree is an important component of the work, providing a platform from which to evaluate changes to the set of zones.
Second, although Census Tracts were requested by Peel Public Health to be the building blocks for the analysis, there are reliability issues with using such a large spatial area as the building block for even larger Data Zones. It is recognized within spatial science that as an aggregated area increases in size, the recognized variance of the characteristics of the population within the area declines. By generalizing the characteristics of a population with some kind of areal unit, potentially important variances within the defined zones are hidden. By using Census Tracts as opposed to smaller dissemination areas (for which the same census information is available), important variations in the population composition of the Region may be overlooked. The modifiable areal unit problem typifies this [52, 53], reminding researchers that "the areal units (zonal objects) used in many geographical studies are arbitrary, modifiable, and subject to the whims and fancies of whoever is doing, or did, the aggregating" [53, p. 102]. Because of this, whenever attempting to subdivide an area based on the assumed similarities of those living there, care must be taken to ensure that the generalized areas most accurately represent the people living within their borders, maximizing the differences between units, while minimizing the differences within them. Similarly, the use of a fixed distance band with the Gi* statistic, while useful in the urban portions of Peel, may be somewhat less relevant in the rural (northern) portion, again potentially altering the definition of the Data Zones. In other words, it is important to realize that the processes that create population clusters are unlikely to operate at only one geographic scale, but are instead shaped by complex interactions. Consequently, further work may look at the strengths and weaknesses of the proposed methodology.
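The point about hidden variance can be illustrated with a small worked example. The sketch below splits the total variance of tract-level values into a between-zone part (what remains visible once only zone averages are reported) and a within-zone part (what aggregation hides). The zone names and values are hypothetical, tracts are weighted equally for simplicity, and the function name `variance_decomposition` is an assumption for illustration.

```python
from statistics import fmean

def variance_decomposition(zones):
    """Split the population variance of tract values into between-zone and
    within-zone parts. zones maps a zone name to its list of tract values."""
    all_values = [v for vals in zones.values() for v in vals]
    n = len(all_values)
    grand_mean = fmean(all_values)
    total = sum((v - grand_mean) ** 2 for v in all_values) / n
    between = sum(len(vals) * (fmean(vals) - grand_mean) ** 2
                  for vals in zones.values()) / n
    within = sum((v - fmean(vals)) ** 2
                 for vals in zones.values() for v in vals) / n
    return total, between, within

# Hypothetical tract-level percentages grouped into two candidate zones.
zones = {"Zone 1": [10, 12, 14], "Zone 2": [20, 30, 40]}
total, between, within = variance_decomposition(zones)
```

The two parts always sum to the total, so a zoning that raises the between-zone share necessarily lowers the within-zone share. That is the sense in which a good set of zones maximizes differences between units while minimizing differences within them.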
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.