Ideally, contextual analysis allows for consideration of both attributes that are generalizable across multiple settings, and geographically referenced relationships – influences that occur in context with each other. However, a tension exists between geographic variation analysis, which identifies the location and nature of the variation, and non-spatial analysis, which may identify characteristics of environments or individuals associated with variation, but does so without spatially specific models.
Spatial variation in disease characteristics occurs, and multiple statistical methods have been developed to determine whether patterns of variation occur by chance alone, or whether variation is unlikely to have happened at random. One type of variation analysis is cluster detection analysis, which specifically examines geographic clustering – spatial groups of outcomes that are statistically unlikely to occur by chance alone, given the overall distribution of the outcome of interest across the entire space being examined. Examples might be the occurrence of the disease itself [2, 3], or distributions of factors of interest, such as characteristics of the disease, intermediate events such as extent of the disease at time of diagnosis and receipt of certain treatments, or outcomes such as mortality related to the disease. However, if clustering of an outcome is identified and determined to occur non-randomly, there is still little information on which to act, because the reasons for these clusters remain hidden.
Conversely, conventional non-spatial analysis methods may be used to identify important influences on individual or area-level disease variation. For example, hierarchical or multilevel regression can be used to simultaneously examine individual and area-level characteristics which are associated with variation in disease incidence, characteristics, or outcome. However, these methods usually consider areas as discrete, even when they are contiguous, without examining relationships between the largest units of analysis. If areas in analyses are geographically related, after building multilevel models, it is still necessary to examine the data for spatial dependence, and to determine whether the model fully accounts for geographic patterns, or whether there is remaining unexplained variation that is spatially dependent – including, but not limited to, geographic clustering.
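The check for residual spatial dependence described above can be illustrated with Moran's I, a widely used global measure of spatial autocorrelation. The text does not say which statistic is meant, so this is only a minimal sketch under stated assumptions: the function name `morans_i`, the binary contiguity weights, and the toy residuals are all hypothetical and chosen for illustration.

```python
def morans_i(values, neighbors):
    """Global Moran's I for area-level values under binary contiguity weights.

    values: dict mapping area id -> value (for example, a model residual).
    neighbors: dict mapping area id -> set of adjacent area ids.
               ASSUMPTION: adjacency is symmetric; w_ij = 1 if adjacent, else 0.
    """
    n = len(values)
    mean = sum(values.values()) / n
    dev = {area: v - mean for area, v in values.items()}

    # Sum of cross-products of deviations over all neighboring pairs.
    cross = sum(dev[i] * dev[j] for i in values for j in neighbors.get(i, ()))
    # Sum of all weights (each adjacency is counted in both directions).
    s0 = sum(len(neighbors.get(i, ())) for i in values)
    denom = sum(d * d for d in dev.values())

    if s0 == 0 or denom == 0:
        raise ValueError("Moran's I is undefined without neighbors or variation")
    return (n / s0) * (cross / denom)


# Hypothetical example: four areas arranged in a line, 0 - 1 - 2 - 3.
chain = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
clustered = {0: 1.0, 1: 1.0, 2: 5.0, 3: 5.0}    # similar values sit together
alternating = {0: 1.0, 1: 5.0, 2: 1.0, 3: 5.0}  # neighbors are dissimilar

i_clustered = morans_i(clustered, chain)      # positive (1/3 for this layout)
i_alternating = morans_i(alternating, chain)  # negative (-1 for this layout)
```

Positive values indicate that neighboring areas tend to have similar residuals, which would suggest spatially structured variation that the model has not explained. Under no spatial autocorrelation the expected value is -1/(n - 1), and in practice significance is usually assessed by permutation rather than by inspecting the statistic alone.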
The study of disease patterns in prostate cancer, for example, can be informed by geographic analyses. Prostate cancer is a disease with strong geographic variation, both internationally and within individual countries or regions. Like most cancers, prostate cancer typically develops over a long period of time. Both age of onset and disease course vary enormously, but autopsy studies have demonstrated that most men will develop some degree of prostate cell abnormality in older age. It is likely that many factors contribute to its development, from inherited genetic risk, to lifestyle patterns in diet, use of substances such as tobacco and alcohol, exercise, and body size and composition, to environmental exposures to a range of protective and detrimental agents. Furthermore, although much is still unknown about prostate cancer etiology and development, there is sufficient information to argue that prostate cancer is most likely caused by a complex combination of factors, rather than a single explanatory risk. Beyond simple incidence, outcomes such as stage at diagnosis, tumor biology and histologic grade, receipt of standard-of-care treatment, and high quality survivorship are also geographically patterned.
When considering the utility of a geographic approach to prostate cancer influences, it may be useful to think of three broad categories of factors. First, there are factors which may be, at first consideration, purely non-geographic in influence. An example of this might be the influence of the biological characteristics of the cancer on the disease course, such as the relationship between histologic grade of tumor and the stage or extent of disease at diagnosis. This relationship is considered important, and tumor characteristics such as grade are almost always included when modeling outcomes. Yet we can consider this influence to be relatively non-geographic, because we might speculate that this relationship does not change under local geographic influences.
Other factors, such as age, might be considered to be pseudo-geographic in influence. The age distribution of the male population would vary across almost any geographic area under consideration, and there is also a strong age-disease relationship in prostate cancer, with the risk of the disease increasing with age. However, the age-disease relationship is not likely to be primarily driven by geography. Adjusting for the distribution of age within a population of interest is often desirable, in order to remove the confounding caused by age, and simulate the geographic variation we would expect to see if we had populations with identical age distributions.
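One standard way to carry out the age adjustment described above is direct standardization, in which each area's age-specific rates are weighted by a common reference age distribution. The sketch below is illustrative only: the function name, the two age groups, and all counts are hypothetical and are not taken from the study data.

```python
def directly_standardized_rate(cases_by_age, pop_by_age, standard_pop_by_age):
    """Weight each age-specific rate by the standard population's age share."""
    total_standard = sum(standard_pop_by_age.values())
    rate = 0.0
    for age_group, std_pop in standard_pop_by_age.items():
        age_specific_rate = cases_by_age[age_group] / pop_by_age[age_group]
        rate += age_specific_rate * (std_pop / total_standard)
    return rate


# Hypothetical areas with identical age-specific rates (1% and 5%)
# but very different age structures.
standard = {"50-64": 2000, "65+": 1000}

area_a_pop = {"50-64": 1000, "65+": 1000}
area_a_cases = {"50-64": 10, "65+": 50}

area_b_pop = {"50-64": 3000, "65+": 500}
area_b_cases = {"50-64": 30, "65+": 25}

crude_a = sum(area_a_cases.values()) / sum(area_a_pop.values())
crude_b = sum(area_b_cases.values()) / sum(area_b_pop.values())

adjusted_a = directly_standardized_rate(area_a_cases, area_a_pop, standard)
adjusted_b = directly_standardized_rate(area_b_cases, area_b_pop, standard)
```

In this toy example the crude rates differ (area A has the older population and therefore the higher crude rate), while the standardized rates coincide, because the only difference between the two areas is their age structure.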
A third and more complex category comprises influences for which geographic context is critical to their causal pathway, so that these variables may be only partially understood outside of their geography. Examples might be individual social or behavioral characteristics such as ethnicity or race, income, insurance or education, occupation, diet, or body size.
For example, the consistently greater risk for prostate cancer among men of African ancestry compared to all other ethnic groups in the world suggests fundamental biologic causes that supersede geographic influences. However, substantial geographic variation within the US African-American population, as well as international variation between African, Afro-Caribbean, and US men of African ancestry, suggests complex multigenerational social and geographic influences.
Even influences that we may confidently classify as so fundamental as to be geographically immutable, such as the relationship between tumor biology and disease progression, could be influenced by geographic variation in access to care or medical practices, dietary, occupational, or environmental agents, or individual variation in behaviors such as tobacco use, exercise, or body size. Therefore, the extent to which any factor's influence on a cancer outcome varies by context or location offers tremendous insight into the mechanisms of influence.
The purpose of this research was to combine cluster detection analysis techniques with multilevel modeling of area-level influences on disease patterns, in order to examine the relationship between social-environmental influences and spatial patterning. We used data from the Maryland Cancer Registry on incident cases of prostate cancer occurring in Maryland from 1992 to 1997, and examined variation in two disease characteristics which contribute significantly to overall disease burden: histologic grade of tumor, and stage of disease at time of diagnosis. The use of geographic analysis of prostate cancer outcomes of interest, in combination with modeling of known risk factors, may prove useful in understanding how much of the strong geographic patterning in prostate cancer can be explained by individual and area-level influences, and how much remains, as yet, unexplained.
For each of our two outcomes of interest, higher tumor grade and later stage of disease at diagnosis, we first modeled the "crude" or unadjusted variation in these outcomes across the entire State. This was done by calculating a block group-specific expected number of cases with each outcome, based simply on the number of cases within the block group and the overall rate of the outcome across the State, and then taking the ratio of observed to expected cases with the given outcome at the block group level. We then used estimates from multivariate models to refine our estimates of the expected number of higher grade or later stage cases, and recalculated, at the block group level, the ratio of observed to expected cases with the outcome of interest. Throughout each set of three analyses, the observed number of cases remained the same, and the expected number (the denominator) varied with each adjustment. Therefore, if an independent variable in a regression model was positively associated with excess risk for the outcome of interest, it increased the regression-estimated expected number of such cases, and thus decreased the observed-to-expected ratio in areas where it was observed. Factors which were negatively associated with risk for the outcome, when adjusted for, reduced the number of such cases expected, and, in turn, increased the observed-to-expected ratio. The methods used are explained in greater detail in the methods section.
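The observed-to-expected calculation described above can be sketched in a few lines. This is a simplified illustration, not the study's implementation: the function names, the toy block groups "A" and "B", and the predicted probabilities are hypothetical, and the adjusted expected count is assumed here to be the sum of per-case predicted probabilities from a fitted regression model, which the text does not specify.

```python
from collections import defaultdict


def observed_counts(cases):
    """Count cases with the outcome per block group.

    cases: list of (block_group_id, has_outcome) pairs, one per incident case.
    """
    observed = defaultdict(int)
    for block_group, has_outcome in cases:
        observed[block_group] += int(has_outcome)
    return dict(observed)


def crude_expected(cases):
    """Expected count = cases in the block group x statewide outcome proportion."""
    statewide_rate = sum(1 for _, y in cases if y) / len(cases)
    n_cases = defaultdict(int)
    for block_group, _ in cases:
        n_cases[block_group] += 1
    return {bg: n * statewide_rate for bg, n in n_cases.items()}


def adjusted_expected(cases, predicted_probs):
    """Expected count = sum of model-predicted probabilities per block group.

    ASSUMPTION: predicted_probs[i] is the regression-estimated probability of
    the outcome for cases[i]; the model itself is not shown here.
    """
    expected = defaultdict(float)
    for (block_group, _), p in zip(cases, predicted_probs):
        expected[block_group] += p
    return dict(expected)


def oe_ratios(observed, expected):
    """Observed-to-expected ratio per block group (skips zero expected counts)."""
    return {bg: observed.get(bg, 0) / e for bg, e in expected.items() if e > 0}


# Hypothetical data: 4 cases in block group A (3 with the outcome),
# 6 cases in block group B (1 with the outcome).
cases = [("A", True)] * 3 + [("A", False)] + [("B", True)] + [("B", False)] * 5
# Hypothetical model output: cases in A carry risk factors, cases in B do not.
probs = [0.6] * 4 + [0.25] * 6

observed = observed_counts(cases)
crude = oe_ratios(observed, crude_expected(cases))
adjusted = oe_ratios(observed, adjusted_expected(cases, probs))
```

In this toy data the observed counts are fixed while the denominators change: the ratio for block group A falls from about 1.88 to 1.25 once its higher predicted risk raises the expected count, and the ratio for block group B rises from about 0.42 to about 0.67 because its expected count is reduced.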