The spatial structure of chronic morbidity: evidence from UK census returns
International Journal of Health Geographics volume 15, Article number: 30 (2016)
Abstract
Background
Disease prevalence models have been widely used to estimate health, lifestyle and disability characteristics for small geographical units when other data are not available. Yet, knowledge is often lacking about how to make informed decisions around the specification of such models, especially regarding spatial assumptions placed on their covariance structure. This paper is concerned with understanding processes of spatial dependency in unexplained variation in chronic morbidity.
Methods
2011 UK census data on limiting long-term illness (LLTI) is used to look at the spatial structure in chronic morbidity across England and Wales. The variance and spatial clustering of the odds of LLTI across local authority districts (LADs) and middle layer super output areas are measured across 40 demographic cross-classifications. A series of adjacency matrices based on distance, contiguity and migration flows are tested to examine the spatial structure in LLTI. Odds are then modelled using a logistic mixed model to examine the association with district-level covariates and their predictive power.
Results
The odds of chronic illness are more dispersed than local age characteristics, mortality, hospitalisation rates and chance alone would suggest. Of all adjacency matrices, the three-nearest neighbour method is identified as the best fitting. Migration flows can also be used to construct spatial weights matrices which uncover non-negligible autocorrelation. Once the most important characteristics observable at the LAD level are taken into account, substantial spatial autocorrelation remains which can be modelled explicitly to improve disease prevalence predictions.
Conclusions
Systematic investigation of spatial structures and dependency is important to develop model-based estimation tools in chronic disease mapping. Spatial structures reflecting migration interactions are easy to develop and capture autocorrelation in LLTI. Patterns of spatial dependency in the geographical distribution of LLTI are not comparable across ethnic groups. Ethnic stratification of local health information is needed and there is potential to further address complexity in prevalence models by improving access to disaggregated data.
Background
The spatial distribution of chronic morbidity at a subnational level attracts considerable policy interest with relevance for health inequalities, health care planning, and resource allocation. Yet, information on the spatial distribution of morbidity is typically scarce, with researchers often reverting to data on mortality or on health service use. Intelligence on the small area population prevalence of morbidity has tended to focus on cancer incidence and mortality [1, 2], cancer risk factors and screening uptake [3], the prevalence of long-term conditions [4, 5], and healthy lifestyles and behaviours [6, 7]. There has also been interest in measuring geographical variations in health needs [8, 9] and under-diagnosis of long-term conditions [10].
The challenges involved in developing small area measures of morbidity have led to a range of techniques known as small area estimation. Model-based approaches to small area estimation rely on the premise that a chosen statistical model accurately predicts the odds of illness for the entire population. They raise a series of challenges in terms of validity. In the absence of systematic procedures guaranteeing optimal model specification and selection, there is a risk that this modelling process will be ill-informed, introducing bias in the resulting estimates. Reviews have argued that assumptions around the treatment of spatial effects introduce a particular element of subjectivity [11, 12, p. 87].
The objective of this paper is to assess spatial dependence between small geographical areas for chronic morbidity. We analyse the geographical distribution of limiting long-term illness (LLTI) across England and Wales, focusing on the spatial structure in morbidity both with and without controls for confounders (mortality and hospitalisation rates). We consider global and local autocorrelation statistics for three types of dependence structures: contiguity, k-nearest neighbours and a novel approach building a spatial interaction matrix using origin-destination migration flows. Our analyses are stratified by ethnicity to isolate differences in the spatial structure of morbidity across different population subgroups. This choice reflects existing interest in monitoring health inequities across ethnic groups. It is currently unclear from the literature how homogeneous the spatial structure of morbidity is across ethnic groups, especially given the complex interaction with existing processes of residential segregation.
The following background section gives a review of existing knowledge on spatial aspects of health determinants, to inform model selection. Aims and methods are then outlined, with a particular emphasis on concepts used to describe spatial structures. A results section then presents both descriptive statistics and model-based analyses of the geographical distribution of LLTI, introducing mortality, hospital admissions and adjacency matrices as predictors of this structure. The paper then concludes by identifying implications for the routine prediction of morbidity prevalence for different geographical units.
Existing knowledge on the spatial structure of chronic morbidity
Much of what is known on the distribution of chronic diseases comes from data on validated self-reported health statuses. LLTI has emerged as a very strong predictor both of chronic morbidity and mortality [13–15]. It has also proved instrumental in measuring health inequalities both across socioeconomic categories and space [16–18]. LLTI has been recorded since 1991 in UK decennial censuses in the form of a question asking whether respondents’ day-to-day activities were reduced by a health problem or disability. This information has supported important research into the determinants of health care needs of different populations in different places [19, 20].
The literature provides some information regarding ecological determinants of chronic morbidity and their spatial structure. Analyses have shown that, even once population age and essential demographic confounders are controlled for, adjusted morbidity levels correlate significantly with local socioeconomic characteristics [17], and the remaining between-area heterogeneity is spatially structured [21]. To examine these ‘place effects’, Bentham et al. [22], Martin et al. [23], Senior et al. [16], Shouls et al. [21, 24], Congdon [4, 25, 26] and Stafford et al. [27] have all investigated the association of LLTI prevalence with both individual-level characteristics and area-level contextual variables. Their work has shown that local mortality, unemployment, household overcrowding, ethnic diversity, social renting, and proportions of workers employed in mining and other heavy industries all correlate strongly with standardised ratios of LLTI. These confounders often prove to be similar in places that are near to each other (for instance across urban areas), pointing to distinctive underpinning spatial structures.
A variety of processes have been hypothesised to explain this apparent clustering of long-term conditions across places. On the one hand, it is the case with many health outcomes that a residual spatial pattern can persist even once observable risk factors or confounders are taken into account [28]. On the other hand, research has argued that population migration not only determines the dispersion of communicable disease, but also provides one of the factors driving the spatial clustering of chronic morbidity. The literature has in particular examined ‘health selective’ residential migrations as life course processes of selection [29]. Boyle et al. [30] have produced evidence that Scottish migrants tend to be healthier than non-migrants, and that healthy migrants are likely to travel longer distances. Further evidence supporting the theory of a ‘sorting’ effect of migrations on health has been presented by Norman et al. [31], emphasising the existence of a strong flow of healthy migrants aged 20–59 years towards areas with lower levels of material deprivation. A review by Smith & Easterlow [32] argues that the influence of residential mobility processes on geographical inequalities in morbidity and mortality remains little understood, with mixed results depending on the geographical level of analysis and the health outcomes under consideration. Despite the absence of clear evidence, claims for the health sorting effects of migration point to a need to consider how migration data might capture some of the spatial structure in morbidity in a way that proximity may not.
All the above evidence has implications for disease prevalence models. In its most elementary form, model-based small area estimation fits a model predicting the probability of having a given illness as a function of age, sex, and other individual characteristics. This model is then applied to local population estimates and auxiliary data known for every individual residing in a catchment area in order to produce a local prevalence estimate. This amounts to interpolating prevalence levels known at the national level to local populations using a combination of:

(a) fixed individual-level risk confounders
(b) spatially varying area-level confounders
(c) residual unobserved risk (between-area residual heterogeneity in prevalence)
This last component (c) is essential and explains the popularity of multilevel health models in recent decades [33], being one of the preconditions to the model’s unbiasedness. Residuals capture local departures from the overall average which signal, for instance, excess morbidity. This random component avoids assuming for instance that all persons aged 16–24 years have the same prevalence across all areas. This component is difficult to estimate because sample data will typically be small, often well under a few dozen cases. More importantly, the underpinning method assumes that these residuals are independent from one another and often ignores the fact that spatial dependence may persist. Recognising underlying spatial structure makes it possible to borrow information from other areas in order to estimate these components in a more efficient manner (see for instance simulation results by Pratesi & Salvati [34]).
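How components (a) to (c) combine into a local prevalence estimate can be sketched as follows. This is a minimal illustration in Python, not the estimation procedure used in this paper: the function names, the two age groups and all log-odds values are hypothetical and chosen only to show the mechanics.

```python
import math

def expit(log_odds):
    """Inverse logit: convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))

def area_prevalence(groups, area_effect, area_residual=0.0):
    """Population-weighted prevalence for a single area.

    groups: list of (population_count, individual_log_odds), component (a)
    area_effect: log-odds shift from area-level covariates, component (b)
    area_residual: unobserved area-specific log-odds shift, component (c)
    """
    total = sum(n for n, _ in groups)
    expected_cases = sum(
        n * expit(log_odds + area_effect + area_residual)
        for n, log_odds in groups
    )
    return expected_cases / total

# Hypothetical area: two demographic groups with made-up log-odds.
groups = [(8000, -3.0), (2000, -1.0)]
without_residual = area_prevalence(groups, area_effect=0.1)
with_residual = area_prevalence(groups, area_effect=0.1, area_residual=0.3)
print(round(without_residual, 4), round(with_residual, 4))
```

Setting `area_residual` to zero corresponds to ignoring component (c); a positive residual raises the estimate for an area with excess morbidity that the covariates do not explain.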
More research is needed to understand spatial dependence. Spatial structures have previously been described as the result of ‘the operation of processes in which spatial relationships enter explicitly into the way the process behaves’ [35, p. 24]. They are often understood as functions of distance or spatial adjacency (neighbours). The science of spatial autocorrelation has largely been dominated by Tobler’s First Law of Geography, summarised as ‘everything is related to everything else, but near things are more related than distant things’ [36]. Contiguity methods, such as Queen, Rook or Bishop, and the k-nearest neighbours method have traditionally been favoured. Although this standard approach is appealing, there are many more ways in which spatial interaction could be defined. In particular, origin/destination migration flow statistics constitute additional evidence of processes of spatial interaction and therefore between-area dependence. Although using such flow metrics to produce spatial weights has been envisaged before [37, p. 271], they have, to the best of our knowledge, not been applied to empirical investigation to date.
Internationally, most research has tended to demonstrate that there is global spatial autocorrelation in many health outcomes even after age standardisation [38]. This autocorrelation is a sign of spatial similarity in unobserved risk factors [28]. Yet, it remains unclear whether these spatial patterns are homogeneous once we disaggregate by demographic subgroup, and add explicit spatially varying area-level confounders.
This justifies looking further into spatial structures themselves, to inform non-communicable disease mapping methods with a particular focus on the type of constraints placed on the treatment of residual between-place heterogeneity. On the basis of this background we propose to examine the spatial structures of LLTI in a more systematic way, investigating (a) what structures can be uncovered in terms of dispersion, autocorrelation, and contextual effects, (b) whether they are the same across different subgroups (age and ethnicity) and (c) whether they persist once good area-level covariates are introduced. We aim to address a current gap in knowledge regarding the spatial structure of morbidity in England and Wales, but also to reconsider the specification of disease prevalence models.
Methods
Data source
We use 2011 census data on LLTI for England and Wales [39]. Although the quality issues concerning self-assessed health information are well documented [40], a key advantage of using census data lies in the absence of sample size restriction. The 2011 census achieved a high person coverage rate of 93 % for England and Wales [41], and thus constitutes a unique source of information to establish prior knowledge on the spatial structure of illness. Census data provide sufficient statistical power to examine model-fitting hypotheses which usually cannot be tested with survey data. This is especially true for small population subgroups such as older people and ethnic minorities, whose representation in health surveys is too weak in comparison to the amount of interest they attract. This reduces risks of model overfitting when using a large number of parameters. With the census coverage survey’s adjustments for non-response [42], the final sample size used for this analysis is n = 56,075,912.
We examine private households’ returns for question no. 23:
‘Are your day-to-day activities limited because of a health problem or disability which has lasted, or is expected to last, at least 12 months? Include problems related to old age’.
Respondents were able to answer ‘Yes, limited a lot’, ‘Yes, limited a little’, or ‘No’. Throughout this paper ‘LLTI’ refers to strong activity limitations (‘limited a lot’), which have been found to have a better rate of agreement in the post-enumeration Census Quality Survey [40].
The choice of indicator is justified by two main reasons. First, LLTI has become a central indicator to measure inequalities in health and health needs, to the point of being included in most UK household surveys. It underpins indicators such as the Slope Index of Inequality in health, the disability-free life expectancy, as well as gender and ethnicity gaps in health. These have been used for a number of years to inform health service policy aiming to reduce health inequalities [43]. Several of the Office for National Statistics’ products estimate these indicators for local authorities [44, 45], and efforts have been made to publish them for smaller units [20]. Second, although self-reported, the LLTI health status correlates with important indicators of chronic conditions. In addition to being a good predictor of health service use [19], it is also a strong predictor of diagnoses as defined in the International Classification of Diseases [46], although evidence suggests that LLTI tends to underestimate morbidity compared to clinical records or the more demanding SF-36 tool [14].
Statistical methods
This paper aims to address gaps in knowledge regarding the spatial structure of chronic morbidity and provide evidence relevant to building small area estimation models. We explore spatial heterogeneity in the odds of LLTI at scales at which predictions are feasible for small ethnic groups: local authority districts (LADs), areas with populations ranging from 34,000 to 1.1 million inhabitants; and middle layer super output areas (MSOAs), census geographical units averaging 7700 residents. Standard descriptive statistics are used to characterise the spatial structure in odds: variance and autocorrelation. A series of models then analyse this structure conditionally on contextual data (mortality, hospitalisations), using a typical logistic binomial parameterisation:
\[ y_{id} \sim \text{Binomial}(n_{id}, \pi_{id}), \qquad \text{logit}(\pi_{id}) = \mu_{id} + \upsilon_d, \qquad \mu_{id} = \varvec{x}_{id}^{\top} \varvec{\beta} \]
where \(y_{id}\) is the number of individuals belonging to a cross-classification i of gender (1, 2), age group (‘0–15’, ‘16–49’, ‘50–64’, ‘65+’), and ethnic group (‘White’, ‘Mixed’, ‘Asian’, ‘Black’, ‘Other’) reporting an LLTI in a given area d. \(n_{id}\) denotes the total number of residents of private households at risk for this same cross-classification, \(\pi_{id}\) the corresponding probability of reporting an LLTI, \(\mu _{id}\) the conditional mean log-odds of having an LLTI (fixed part of the model), \(\varvec{\beta }\) a column vector of fixed effect coefficients, and \(\varvec{x}_{id}\) a vector of covariates known for all individuals: age, sex and ethnicity dummy variables, as well as area-level characteristics tested in this paper. Random intercepts \(\upsilon _d\) are realisations of a random variable \(\varvec{\upsilon }\) of mean zero and variance \(\sigma ^2\). We add 0.5 to both the numerator and the denominator of odds to produce ‘empirical logits’, addressing bias arising from the presence of null denominators [47, 48].
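The empirical logit adjustment can be illustrated with a short sketch. It is written in Python for illustration rather than in the R environment used for the analysis, and the function name and example counts are hypothetical.

```python
import math

def empirical_logit(cases, at_risk):
    """Log-odds with 0.5 added to the numerator and denominator of the odds.

    The unadjusted odds are cases / (at_risk - cases), which are zero or
    undefined when a cell has no cases or only cases.
    """
    return math.log((cases + 0.5) / (at_risk - cases + 0.5))

# A cell with no reported cases still yields a finite log-odds value.
no_cases = empirical_logit(0, 40)
# A cell where every resident reports an LLTI is also finite.
all_cases = empirical_logit(40, 40)
print(no_cases, all_cases)
```

The adjustment is symmetric: a cell with no cases and a cell with only cases receive log-odds of equal size and opposite sign.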
Models are estimated using Laplace approximation with the R package lme4 [49, 50]. We use classical model selection techniques: likelihood ratio tests, the Akaike Information Criterion (AIC) and regression coefficient significance. During model selection, attention was also paid to \(\sigma ^2\), the variance of random effects \(\varvec{\upsilon }\), which reflects the between-area dispersion in prevalence that is not attributed to differences in covariates included in the fixed part. The reason why \(\sigma ^2\) is used as a decision factor is that it plays a considerable part in the efficiency of estimation [51]. Approximations of the mean squared error of prediction developed by Prasad and Rao [52] and extended to log-linear models [53, 54] show that the main determinant of prediction error is the size of \(\sigma ^2\) compared to the within-group variance. By attempting to reduce \(\sigma ^2\) as much as possible, we focus on improving the predictive power of the fixed part. This is important when conducting small area estimation in real-world conditions because residuals \(\upsilon _d\) will often be estimated with very small sample sizes and are therefore subject to substantial error. A strong fixed part \(\mu _{id}\) is likely to produce better predictions overall.
Defining ‘spatial structures’
Global spatial autocorrelation is measured using the Moran’s I statistic, with a random permutation test for significance testing [55]. Local autocorrelation of regression residuals is also examined using a local indicator of spatial autocorrelation (LISA) [56] and the Moran scatterplot [57]. These are used to detect significant leverage of one set of neighbours on the global (average) level of autocorrelation, thereby signalling a cluster of high or low similarity.
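A minimal sketch of the global Moran's I statistic with a random permutation test is given below. It uses plain Python rather than the R tooling cited in this paper, and the six areas arranged in a line, their values and the helper names are hypothetical.

```python
import random

def line_weights(n):
    """Row-standardised weights for n areas arranged in a line (toy layout)."""
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < n]
        for j in neighbours:
            weights[i][j] = 1.0 / len(neighbours)
    return weights

def morans_i(values, weights):
    """Global Moran's I for a list of values and a square weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    return (n / s0) * cross / sum(d * d for d in dev)

def permutation_p_value(values, weights, n_perm=999, seed=0):
    """One-sided pseudo p-value: share of random relabellings with I >= observed."""
    rng = random.Random(seed)
    observed = morans_i(values, weights)
    shuffled = list(values)
    at_least = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(shuffled, weights) >= observed:
            at_least += 1
    return observed, (at_least + 1) / (n_perm + 1)

w = line_weights(6)
observed, p_value = permutation_p_value([1, 2, 3, 4, 5, 6], w)
print(observed, p_value)
```

Values that rise steadily along the line give a positive statistic, whereas a strictly alternating pattern gives a value of -1 under these weights.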
Four types of adjacency matrices were tested (see Table 1). L.A and M.A follow the standard approach and were generated using the spdep package [58, 59] and boundary shapefiles [60]. They are based on the Queen method: areas were coded as neighbours in the adjacency matrix if their digital boundaries shared at least one point or if two of their respective points were separated by less than 500 m. This ensures for instance that London boroughs separated by the River Thames are coded as neighbours. The final matrix was edited manually to attach islands to mainland neighbours and verify that no area was left without neighbours. L.B\(k\) and M.B\(k\) were produced using the k-nearest neighbours method for k values of 2–10, with a view to determining an optimal k. All matrices were row-standardised, a procedure that is traditionally used to ensure the positive-definiteness of correlation matrices in various conditional autoregressive models when spatial weight matrices are not symmetric [61].
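The k-nearest neighbours construction and row-standardisation can be sketched as follows. This is an illustration in plain Python rather than the spdep workflow described above; it assumes distances are measured between area centroids, which the text does not specify, and the coordinates are made up.

```python
import math

def knn_weights(coords, k):
    """Row-standardised k-nearest-neighbour weights from (x, y) coordinates."""
    n = len(coords)
    weights = [[0.0] * n for _ in range(n)]
    for i, (xi, yi) in enumerate(coords):
        # Rank all other areas by Euclidean distance to area i.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.hypot(coords[j][0] - xi, coords[j][1] - yi),
        )
        for j in others[:k]:
            weights[i][j] = 1.0 / k  # each row sums to one
    return weights

# Hypothetical centroids: four clustered areas and one remote area.
coords = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 0)]
weights = knn_weights(coords, k=2)
row_sums = [sum(row) for row in weights]
print(row_sums)
```

The remote area counts area 2 among its two nearest neighbours, but area 2 does not count the remote area among its own, so the matrix is not symmetric.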
For LADs, additional matrices L.C and L.D were built using migration flows as a proxy for spatial dependence. There are good reasons why areas further apart could be more closely related to each other given the UK’s urban and rural structure. Proximity is not the only reason why risk factors would be more alike in areas. Intra-national origin-destination migration data published by the Office for National Statistics [62] were used to construct spatial weights based on the intensity of flows (see R syntax in Additional file 1). For every LAD, we defined neighbours as the k areas from which the most migrants originate, based on the ratio of the total migrants they contributed relative to their respective population sizes. In other words, neighbours are not just those that send most migrants to a given district; they are the ones for which these migrants represent the highest proportion of their respective populations. This is to ensure a fair weighting across all LADs in the process of averaging odds of LLTI, and especially to ensure that the resulting neighbours would not systematically be the biggest LADs. If a district A sent a large number of migrants to district B, but this flow in fact represented a very modest volume relative to the entire population of A, it would seem excessive to use the odds of poor health of the entirety of district A as a smoothing reference for district B.
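The neighbour selection rule described above can be sketched in a few lines. This is not the R syntax from Additional file 1: it is a Python illustration with hypothetical district names, flows and populations, and it covers only the selection of neighbours, not the weighting applied afterwards.

```python
def migration_neighbours(flows, population, k):
    """For each destination, pick the k origins with the highest out-migration rate.

    flows[o][d]: number of migrants moving from origin o to destination d
    population[o]: resident population of origin o
    """
    neighbours = {}
    for dest in population:
        # Flow towards dest expressed as a share of each origin's population.
        rate = {
            origin: flows[origin].get(dest, 0) / population[origin]
            for origin in population
            if origin != dest
        }
        ranked = sorted(rate, key=rate.get, reverse=True)
        neighbours[dest] = ranked[:k]
    return neighbours

# Hypothetical districts: A is large, C is small.
population = {"A": 1_000_000, "B": 50_000, "C": 10_000}
flows = {
    "A": {"B": 500, "C": 300},
    "B": {"A": 400, "C": 50},
    "C": {"A": 100, "B": 200},
}
neighbours = migration_neighbours(flows, population, k=1)
print(neighbours)
```

In this toy example A sends more migrants to B than C does, yet C is selected as B's neighbour because its flow is a far larger share of its own population.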
Sensitivity analyses on a subset of LADs suggested that selecting neighbours who send the highest number of migrants or those who send migrant flows which represent the highest proportion of their total population did not alter the eventual list of neighbours substantially. Further analyses (see Additional file 2) were conducted to establish whether origins and destinations differed substantially depending on the age of migrants. Results showed that excluding younger migrants did not have a strong influence on the resulting matrices. However, we hypothesised that student migrations, which are only temporary, are likely to be less determinant of the structure of LLTI than other types of migrations taking place across life. Final spatial weight matrices were therefore generated exclusively based on flows for migrants aged 30 years and over.
Results
Descriptive characteristics
Overall across English and Welsh LADs, the mean odds of LLTI is \(9.23 \times 10^{-2}\) (equivalent to an 8.40 % mean prevalence) with a variance of \(7.01 \times 10^{-4}\), equivalent to a 28.7 % coefficient of variation. This masks huge differences across subgroups. Examining age, Table 2 suggests that the between-area variance in odds of LLTI among older groups is several hundred times that of younger groups. This implies that the level-2 variance is expected to be higher for older age groups. Much of this effect can be attributed to the higher prevalence of LLTIs among older populations; larger odds by definition have larger variances. Coefficients of variation reported in Table 3 confirm this; relative to the average of all odds across England and Wales, the dispersion is of the same order of magnitude across age and gender groups for White populations.
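As a check on the overall figures, reading the mean odds as 0.0923 and the variance as 0.000701, the coefficient of variation is the standard deviation of the odds divided by their mean:
\[ \mathrm{CV} = \frac{\sqrt{7.01 \times 10^{-4}}}{9.23 \times 10^{-2}} \approx \frac{2.65 \times 10^{-2}}{9.23 \times 10^{-2}} \approx 0.287 \]
which agrees with the 28.7 % reported.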
This pattern differs substantially across minority ethnic groups. In the case of ethnic minorities in general, it seems that between-area differences in prevalence are strong for younger groups; even age groups 0–15 exhibit high dispersion in the case of categories ‘Black’ and ‘Other’. We also find higher between-area variance estimates at the MSOA level for these groups: while for the White group, the between-MSOA variance in odds of LLTI is on average two to three times the between-LAD variance, for most other cross-classifications the variance is multiplied by a factor of five to ten.
For both LADs and MSOAs, the highest levels of autocorrelation are measured using the three-nearest neighbours matrix \(\cdot\).B3 (see Table 4). Similar measurements taken for higher values of k (up to 10 neighbours), not reported in the table, confirmed that increasing the number of neighbours only reduces Moran’s I estimates. Estimates for White populations show that odds for older age groups exhibit higher levels of spatial autocorrelation than younger groups. In other words, the spatial clustering of poor health is higher for older age groups. Around retirement age a final wave of intra-national migrations emphasises the clustering of people by health.
Interestingly, there is no evidence of the same pattern occurring for other ethnic groups. On the contrary, the older the individuals reporting an LLTI, the less they are found to cluster in areas. This implies that odds of poor health for ethnic minorities are not only more dispersed than those of White people; they are also less predictable or, in spatial terms, more random. None of the matrices tested in this investigation uncovered substantial spatial structure in the patterns of illness experienced by ethnic minorities, and these structures are very different from those of White populations. We hypothesise that such heterogeneity relates to the presence of stronger socioeconomic differences across space for ethnic minorities. In these circumstances, it is unlikely that borrowing strength from the structure exhibited by White populations would help make precise inferences about the health of other populations. There is more potential in using other information such as ethnic density data to reduce the variability in the model, as we show in the next section.
These descriptive estimates also provide indications regarding best fitting adjacency matrices. In the case of LADs, levels of autocorrelation measured using the ‘migration neighbourhoods’ L.C\(\cdot\) and L.D\(\cdot\) are lower than with more traditional matrices. Table 4 only reports results for the row-standardised, ranked neighbours matrix L.D3, because the specification of L.C\(k\) (binary weights) did not perform as well. In addition, sensitivity analyses found that the age categories included to generate those migration neighbourhood matrices did not have a strong influence on measures of spatial autocorrelation. More research on age-specific adjacency matrices could refine this observation.
We conclude from this exploratory work that levels of dispersion in odds of LLTIs, although comparable between sexes, are very dissimilar depending on age and ethnic groups. They may require separate treatment when it comes to their modelling and prediction. Descriptive estimates of autocorrelation provide a strong suggestion that the three-nearest neighbours method \(\cdot\).B\(\cdot\) is likely to be the most efficient since it captures the highest levels of homogeneity in odds of LLTI. This finding is consistent across all demographic cross-classifications.
Modelling with covariates: area classifications and data on ethnicity
We now examine the residual geographical variance in odds of LLTI once contextual information (area classification, ethnic density, mortality rates, and health service data) is introduced in a multivariate framework. We seek to establish whether this contextual information predicts the spatial structure in residuals \(\varvec{\upsilon }\), that is to say, shrinks their variance \(\sigma ^2\). In this section, we build a series of models predicting LLTI prevalence for LADs exclusively, since they are the level at which contextual data is most commonly available. We begin by introducing some disaggregation using the 2001 National Statistics area classification of English local authorities produced by cluster analysis [63]. This allows us to treat LADs differently according to the following typology:

Cities and Services; London Suburbs; London Cosmopolitan (reference category)

London Centre

Prospering UK

Coastal and Countryside

Mining and Manufacturing.
Welsh contextual data being unavailable for LADs, a coarser specification involving a single dummy variable reflecting higher odds of poor health in Wales is retained. To reflect hypotheses of an ethnic density effect in the literature [64], the census estimate of the proportion of the district population identifying as the same ethnicity is incorporated as a covariate for those ethnic groups where such an addition improves the model fit. This is the case for all groups but Black and White populations. With other groups, this covariate improves fit as measured by the AIC and substantially reduces the between-area variance (−12 % for Asian and Other, −7 % for Mixed). This forms specifications for a baseline model (M0) fitted separately on data for each of the five ethnic groups (see Table 5). Area-level residuals exhibit mild autocorrelation (Moran’s I between 0.3 and 0.6).
Local mortality and hospitalisation data
We continue with contextual information on mortality and hospitalisations which are expected to be associated with some of the unobserved risk factors modelled through random effects until now. We aim to test whether this information absorbs either betweenarea heterogeneity or its spatial structure.
Existing evidence [14, 19, 65] demonstrates that individuals are very likely to report an LLTI if they have had or are about to seek a medical diagnosis. In addition, there is a well-known association at the population level between self-reported poor health and local mortality rates [22]. Though non-linear, this association has been exploited for small area estimation using bivariate life table models [20] and relational logistic models [66]. The bivariate response model, relevant for the data at hand, gave a particularly poor fit and was immediately discarded. Instead, age-standardised mortality rates (SMRs) from death registrations [67] were transformed through Z-standardisation and used as a straightforward covariate. Models (M1) (see Table 6) result from best model selection among a range of specifications for each ethnic group separately. We compared sex-specific SMRs, overall SMRs, and interaction with gender dummies. In the case of Black populations, no association with mortality was found. Gains in terms of reduction of between-area variance in random intercepts are important, especially in the case of Mixed ethnic groups, where \(\sigma ^2\) is almost halved.
While mortality data does help predict local prevalence of LLTI, it arguably remains distantly related to chronic morbidity amongst the living. We compare its predictive power with that of indirectly standardised ratios of emergency admissions (SARs) for 2008–2013 [68] on the one hand, and elective admissions [69] on the other hand. The underlying hypothesis is that prevalence of LLTI and rates of hospitalisation share common determinants (socioeconomic characteristics, lifelong exposure to health determinants). For all ethnicities, rates of emergency admissions are found to be associated with larger regression coefficients and improvements in fit. They are thus selected as the preferred covariate. We then test interaction effects between (a) sex and age variables, (b) mortality and (c) emergency admissions, proceeding by backward elimination based on the best sets of covariates, leading us to the set of final models (M2) reported in Table 7.
Overall, for White populations, the new specifications reduce the AIC by over 9800, and the between-area residual variance by 45 %. The effect is less marked for ethnic minorities. The between-area residual variance remains stable for Black populations and is cut by about 20 % for other minorities. Emergency hospitalisations exhibit a strong association with morbidity rates and make the biggest improvement to the models. English areas with observed emergency admissions in excess of 10 % relative to the expected number of admissions (based on age-specific rates of admission for England overall) exhibit odds of LLTI for White persons aged 50–64 years on average 16 % higher compared to areas in line with England’s overall admissions rate. Models remain very different across ethnic groups and the association with admissions rates is weaker for non-White populations. Disaggregation of admissions statistics by ethnic group could yield stronger associations in future investigations.
Residuals of each of the final models were examined in detail. Table 8 shows that residuals correlate only very weakly across ethnic categories. This constitutes further evidence that the spatial structure is specific to each of those population groups. Autocorrelation statistics confirm that accounting for differences in mortality and hospitalisation rates does not reduce the spatial autocorrelation in residuals. It reduces the random variability across LADs substantially without offsetting the extent to which deviations of a district’s odds of LLTI from the mean correlate with the deviation measured in neighbouring LADs. From the viewpoint of predictive modelling, this is an advantage: introducing area-level covariates does not reduce the potential to borrow information from neighbouring areas using relevant autoregressive model specifications. In addition, area-level predictors did not lead to important outliers emerging which could signify local departures from the global association with mortality and hospitalisation rates. Aside from individuals from Other ethnic minorities, there is strong evidence both from normal Q–Q plots and Shapiro–Wilk tests that residuals follow a normal distribution.
Figures 1 and 2 present a series of maps of final model residuals (on the odds ratio scale), which illustrate by how much the fixed part of the model should be multiplied in order to reach the census estimate of odds of LLTI. Unshaded areas indicate predictions falling within \(\pm\)10 % of the census estimate. Red shades signal LADs where the fixed part of the model underestimates odds by more than 10 %, while blue shades signal LADs where it overestimates odds by over 10 %. With Moran’s I estimates close to 0.5, we conclude that half of the deviation between odds for a given district and the national mean is on average shared by its three nearest neighbours. Figures 3 and 4 examine the local contribution of each clique of LADs towards the global measure of spatial autocorrelation. LISAs are calculated, regressed against the model residuals and plotted for each ethnic group separately. Each of the bottom left quadrants signals statistically significant outliers in red, which can be regarded as area residuals exhibiting significantly higher or lower similarity with their three nearest neighbours than average, and which therefore have particular leverage on the global level of autocorrelation. Together with the maps, these plots make it apparent that the chosen modelling and spatial specifications leave important clusters of unexplained risk factors, which are dissimilar across ethnic groups. The Asian model in particular exhibits a lot of heterogeneity in the strength of spatial dependence between LADs, with very strong clusters emerging for instance in parts of Lancashire, Merseyside and Yorkshire, Nottingham and Leicester, as well as North East and South West London boroughs.
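The global and local autocorrelation statistics discussed above can be sketched in a few lines. The example below builds row-standardised three-nearest-neighbour weights from point coordinates and computes the standard global Moran's I and local Moran's I (LISA) formulas. It is a minimal illustration rather than the analysis pipeline used for the paper: the coordinates and residuals are hypothetical, centroids stand in for district geometries, and no significance testing is performed.

```python
from math import dist


def knn_weights(coords, k=3):
    """Row-standardised weights: each area's k nearest neighbours get weight 1/k."""
    n = len(coords)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: dist(coords[i], coords[j]))
        for j in others[:k]:
            weights[i][j] = 1.0 / k
    return weights


def morans_i(values, weights):
    """Global Moran's I: (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(zi * zi for zi in z)


def local_morans_i(values, weights):
    """Local Moran's I_i = z_i * sum_j w_ij z_j / m2, with m2 = sum_i z_i^2 / n."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(zi * zi for zi in z) / n
    return [z[i] * sum(weights[i][j] * z[j] for j in range(n)) / m2
            for i in range(n)]


# Hypothetical centroids forming two distant groups of four areas each.
coords = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10), (11, 10), (10, 11), (11, 11)]
# Hypothetical residuals (log-odds scale): positive in one group, negative in the other.
residuals = [0.20, 0.30, 0.25, 0.35, -0.30, -0.20, -0.25, -0.35]

w = knn_weights(coords, k=3)
global_i = morans_i(residuals, w)
local_i = local_morans_i(residuals, w)
```

With row-standardised weights, the global statistic equals the mean of the local statistics, which is what allows local values to be read as contributions to the global measure. Note that nearest-neighbour weights are not symmetric in general: an area can count another among its three nearest neighbours without the reverse being true.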
Discussion
Previous work on the 2011 census has highlighted the presence of a strong spatial structure in univariate morbidity statistics [70]. Analysis reported in this paper presents a deeper examination of multivariate aspects of this spatial dependence. Descriptive estimates suggest that the variability in odds of poor health across groups and places is larger than can be expected from just looking at crude prevalence estimates. For instance, area effects are often thought to correlate strongly across age groups, as reflected in random walk priors proposed by Congdon [25]. Our analysis looking at ethnicity provides strong evidence that patterns of spatial dependency in the odds of LLTI differ substantially across ethnic groups. The covariate-adjusted spatial structure of LLTI in White people only moderately correlates with that for Mixed ethnic groups. Structures of LLTI of all other groups correlate very weakly with each other. Descriptive estimates for ethnic minorities also reveal that levels of spatial autocorrelation are higher for young people, in contrast with the increased autocorrelation measured among older age groups in White people. Reasons for this difference are unclear and call for further research. One can hypothesise that cohort exposure is different for ethnic minorities and that older people reporting a minority ethnic identity have more diverse histories of exposure to risk factors, or are not affected by the same health-selecting processes of residential segregation.
The rationale for stratifying our analysis by ethnic group resides in the substantial interest in understanding variations in health care need across different population groups. Since the LLTI indicator has been used as a proxy for health care need [71], it is interesting to understand whether care needs of different ethnic groups are stable across different places. Our findings are in line with Finney’s work [72] and confirm that knowledge on patterns and determinants of local ethnic health gaps remains insufficient. Overall, disaggregation of ethnicities reveals more variation than would arise purely out of the combination of local age characteristics and chance. Our finding is also consistent with previous investigations by Shouls et al. [21, 24] relying on factor analysis to classify LADs with respect to known area-level aggregate health estimates. Our analysis of spatial autocorrelation patterns confirms that even when accounting for other common population health measurements such as rates of hospitalisation and mortality, which we assume capture important unobserved risk factors, the significant remaining between-area heterogeneity still exhibits strong, almost unaffected spatial patterns in a way that is specific to each ethnic group. This is a sign of very different health needs and has been identified as an important area of current research [73].
We can draw implications for predictive modelling. In addition to measuring disparities in health needs which are not already contained in mortality and hospitalisation statistics, the distinct spatial pattern of overdispersion in the final model confirms the importance of reviewing assumptions on random effects in multilevel health models. While the assumption of spatially independent residuals may be sufficient in many descriptive epidemiology studies, it introduces risks of substantial variation and clustering in the quality of small area prediction across space, especially in the presence of underpowered sample data. This has seldom been raised as a validity issue with disease prevalence prediction models [74] though the importance of testing for the existence of significant betweenarea heterogeneity was noted by Datta et al. [75].
This paper gives a practical illustration of the implications of assuming independence of random effects across areas. In our results, the degree to which the fixed part of the model underestimates or overestimates odds of LLTI is highly dependent on error in neighbouring areas. It implies that, in the absence of sufficient individual-level auxiliary data (e.g. from a census) or area-level predictors (e.g. statistics on health utilisation, social or occupational characteristics), there is a greater need to model these spatial structures not just through covariate adjustments, but also by incorporating spatial information explicitly into regression models. The literature has identified several routes for doing so [76]. A common stochastic approach is the use of spatial or conditional autocorrelation functions, by introducing spatial matrices into the model’s covariance structure [77]. A competing approach is the use of spatial trend surfaces (polynomial functions of the geographic coordinates) [35], or eigenvectors of a Euclidean distance matrix [78–80], as regressors in a standard generalised linear model.
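To make the trend surface idea concrete, the sketch below generates the polynomial terms of a pair of coordinates up to a chosen degree. These terms would then enter a generalised linear model as ordinary regressors. The function name, the rescaled coordinates and the choice of a second-degree surface are illustrative assumptions and are not taken from the cited literature.

```python
def trend_surface_terms(x, y, degree=2):
    """Return the monomials x^a * y^b with 1 <= a + b <= degree, keyed by (a, b)."""
    terms = {}
    for total in range(1, degree + 1):
        for a in range(total, -1, -1):
            b = total - a
            terms[(a, b)] = (x ** a) * (y ** b)
    return terms


# Hypothetical district centroid, with coordinates rescaled to roughly [-1, 1].
terms = trend_surface_terms(0.5, -0.2)
```

A second-degree surface adds five regressors per area (x, y, x², xy and y²). Rescaling the coordinates before taking powers is a common precaution against numerical problems when the raw values are large, as with national grid references.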
All these techniques presuppose that the structure underpinning spatial processes in the data is well understood. In addition to reviewing existing structures defined around both adjacency and proximity (k-nearest neighbours method), the purpose of this paper was to test the relevance of an alternative definition of spatial structure based on residential migration. Levels of autocorrelation reached with this method indicate that while it is not the best fitting method for LLTI, it does capture a non-negligible spatial interaction.
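One possible way of turning migration flows into a spatial weights matrix is sketched below. It should be read as an assumption-laden illustration, not as the construction used for the results above: flows in both directions are summed to obtain a symmetric measure of interaction, the diagonal is set to zero, and each row is rescaled to sum to one. The flow counts are hypothetical.

```python
def migration_weights(flows):
    """Build row-standardised spatial weights from an origin-destination matrix.

    flows[i][j] is the number of moves from area i to area j (hypothetical input).
    """
    n = len(flows)
    # Assumption: interaction is symmetric, so moves in both directions are added.
    gross = [[0.0 if i == j else flows[i][j] + flows[j][i] for j in range(n)]
             for i in range(n)]
    weights = []
    for row in gross:
        total = sum(row)
        # Areas without any recorded moves keep a row of zeros.
        weights.append([v / total if total > 0 else 0.0 for v in row])
    return weights


# Hypothetical annual moves between three districts.
flows = [
    [0, 120, 30],
    [80, 0, 10],
    [20, 40, 0],
]
w = migration_weights(flows)
```

Unlike contiguity or nearest-neighbour schemes, every pair of areas exchanging migrants receives a non-zero weight, so the resulting matrix is typically much denser. Row standardisation also breaks the symmetry of the gross flows: in this example the first district gives the third a weight of 0.2, while the third gives the first a weight of 0.5.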
Incorporating spatial information explicitly into regression models requires good prior knowledge regarding the study outcome. The main benefit of using a large population source, such as the census in this paper, is to be able to conduct additional tests on local levels of autocorrelation. In our case, substantial local clusters were apparent in the residuals for the Asian model, suggesting that the range of covariates used was particularly inappropriate to predict local odds, even once global autocorrelation was taken into account. This constitutes further evidence of the need to better understand the spatial structure of chronic conditions.
Our study indicates that small area estimation remains a data intensive task. It remains difficult to predict LLTI with simple models without introducing socioeconomic information on local populations from a source such as the census [81]. Looking at between-area heterogeneity (Figs. 1, 2), it is apparent that geographical inequalities remain which prove difficult to predict. With these models, about 46.8 % of LADs require an adjustment of the fixed part of the model by at least \(\pm\)10 % for White populations while 15.8 % require an adjustment of at least \(\pm\)20 % in order to reach the actual odds of LLTI. The latter figure is 9.5 % for Mixed ethnic groups, 15.8 % for Other ethnic groups, 33.6 % for Black populations and as high as 40.2 % for Asian populations. A reasonably large sample of data is required for every area of interest in order to reach a precise predictor of random parameters \(\varvec{\upsilon }\). This has implications for power calculations to obtain good quality empirical best predictors in the presence of such residual variability. Moreover, there are also questions around the properties of synthetic estimators, which only make use of the fixed part of the prevalence model, in situations where the between-area residual variance \(\sigma ^2\) is not negligible. Such estimators currently underpin the majority of UK disease prevalence models. These issues point to the importance of tests for heterogeneity recently examined by Datta et al. [75] and Molina et al. [82].
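The shares quoted above can be computed from area-level residuals along the following lines. The sketch assumes that residuals are expressed on the log-odds scale, so that their exponential is the factor by which the fixed part of the model must be multiplied, and that an adjustment "of at least 10 %" means a factor of at least 1.1 or at most 0.9. Both the residual values and this reading of the threshold are assumptions made for illustration.

```python
from math import exp


def share_needing_adjustment(residuals_logit, threshold):
    """Share of areas whose residual odds ratio exp(u) deviates from 1 by at least threshold."""
    flagged = 0
    for u in residuals_logit:
        odds_ratio = exp(u)
        if odds_ratio >= 1.0 + threshold or odds_ratio <= 1.0 - threshold:
            flagged += 1
    return flagged / len(residuals_logit)


# Hypothetical area-level residuals on the log-odds scale.
residuals = [-0.30, -0.12, -0.04, 0.00, 0.03, 0.08, 0.15, 0.25]
share_10 = share_needing_adjustment(residuals, 0.10)
share_20 = share_needing_adjustment(residuals, 0.20)
```

In this toy example half of the areas require an adjustment of at least 10 % and a quarter an adjustment of at least 20 %. A synthetic estimator, which ignores the residual term altogether, would be off by at least those margins in the flagged areas.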
While models can help produce estimates for small populations, hypothesis testing can prove limiting. The range of possible small area model specifications is virtually limitless. Shortcomings are likely to arise especially in cases where models demonstrate similar levels of unexplained variance and spatial clustering in their random part. This highlights the importance of large-scale studies such as censuses in providing reliable auxiliary information for small groups.
Conclusions
The key contributions of this paper relate to (1) new descriptions of the spatial structure in LLTI both in terms of dispersion and autocorrelation and (2) implications for predictive modelling and small area estimation.
With regard to the first point, we present greater disaggregation than previous investigations in this area [21, 24, 70] and emphasise the importance of ethnicity and alternative conceptualisations of ‘spatial structure’. We provide a systematic analysis of best-fitting spatial structures and give an applied example of a new method to build adjacency matrices using migration data. Further research could examine the predictive power of disaggregating migration interaction according to demographic characteristics (age and ethnicity being strong determinants in spatial terms). It would also be worth considering spatial interaction beyond the notion of symmetry, by examining hypotheses where A being a ‘neighbour’ of B does not imply the reciprocal. Alternative approaches have proposed treating spatial weights as random parameters to be estimated rather than as fixed data [83]. This may reduce subjectivity in model specification, arguably at a certain computational and precision cost.
Our second contribution concerns the applied relevance of the paper to predictive modelling, particularly around planning efficient small area estimation strategies. In the UK, there is sustained interest in information for small geographical areas [84], in a context where local population health surveys have almost entirely disappeared due to rising fieldwork costs and falling response rates [85]. Persistent, and often widening, health inequalities are a concern internationally [86–88], with improvements in small area public health monitoring among major policy recommendations to tackle such problems [89].
Subnational monitoring of morbidity levels raises particular statistical challenges. Our results show that geographical variability in the odds of LLTI is greater than expected not only from sampling error and differences in local populations’ age distributions, but also in relation to levels of mortality and healthcare utilisation. Odds of LLTI also exhibit a larger between-area variance for ethnic minorities compared to White populations.
From a methodological viewpoint, we acknowledge limitations commonly encountered in disease mapping. In addition to data limitations themselves (insufficient disaggregation of age bands, quality issues usually expected from hospital data), this research relies on complex models. Only taking into account main fixed effects of model (M2), the number of candidate models is \(2^{12}\). When taking into account the different types of hospital and mortality covariates, the number of possibilities rises to millions. Specifications with complex random effects and spatial autocorrelation structures could in addition be considered, raising this number even higher. Overall, this concern, well-identified in predictive modelling [90–92], represents a challenge in transparency and reproducibility of public health information.
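The first of these counts follows from the fact that each of the 12 main fixed effects can be either included or excluded. The short sketch below computes it and cross-checks the formula by explicit enumeration on a smaller case. It deliberately ignores interactions and any constraint on which terms may appear together, so it does not attempt to reproduce the larger counts mentioned above.

```python
from itertools import combinations


def count_candidate_models(n_effects):
    """Number of models obtained by including or excluding each of n_effects terms."""
    return 2 ** n_effects


n_models = count_candidate_models(12)

# Cross-check the formula by listing every subset of four terms (sizes 0 to 4).
small = sum(1 for r in range(5) for _ in combinations(range(4), r))
```

With 12 main effects this gives 4096 candidate specifications before any choice of covariate type, interaction or covariance structure is considered.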
Overall, our results emphasise the importance of detailed contextual information on population characteristics and spatial structures in the production of working models that can be trusted to hold for the whole population. Modelling techniques can be applied which make use of the spatial clustering illustrated in this paper to improve prediction. Yet, like empirical prediction, these require access to good-quality survey data with individual geographical identifiers (for instance postcode sectors) for all of the targeted small areas. In the future, geographic masking [93] may offer safer alternatives in situations where geographical identifiers are too disclosive to be released. This study also highlights the importance of local health care statistics to improve the predictive capability of models. Further disaggregation of these data sources by ethnic group and groups of medical conditions at the local level is likely to help improve future disease prevalence models.
Abbreviations
AIC: Akaike Information Criterion
LAD: local authority district
LISA: local indicator of spatial autocorrelation
LLTI: limiting long-term illness
MSOA: middle layer super output area
Q–Q plot: quantile–quantile plot
SMR: directly age-standardised mortality rate
SAR: indirectly standardised emergency admission ratio
References
 1.
Clayton D, Kaldor J. Empirical Bayes estimates of age-standardized relative risks for use in disease mapping. Biometrics. 1987;43(3):671–81. doi:10.2307/2532003.
 2.
Manton KG, Woodbury MA, Stallard E, Riggan WB, Creason JP, Pellom AC. Empirical Bayes procedures for stabilizing maps of U.S. cancer mortality rates. J Am Stat Assoc. 1989;84(407):637–50. doi:10.1080/01621459.1989.10478816.
 3.
Raghunathan TE, Xie D, Schenker N, Parsons VL, Davis WW, Dodd KW, Feuer EJ. Combining information from two surveys to estimate county-level prevalence rates of cancer risk factors and screening. J Am Stat Assoc. 2007;102(478):474–86. doi:10.1198/016214506000001293.
 4.
Congdon P. Estimating CHD prevalence by small area: integrating information from health surveys and area mortality. Health Place. 2008;14(1):59–75. doi:10.1016/j.healthplace.2007.04.003.
 5.
Nacul LC, Soljak M, Meade T. Model for estimating the population prevalence of chronic obstructive pulmonary disease: cross sectional data from the Health Survey for England. Popul Health Metr. 2007;5(1):8. doi:10.1186/1478795458.
 6.
Malec D, Davis WW, Cao X. Modelbased small area estimates of overweight prevalence using sample selection adjustment. Stat Med. 1999;18(23):3189–200. doi:10.1002/(SICI)10970258(19991215)18:23<3189::AIDSIM309>3.0.CO;2C.
 7.
Kroll LE, Lampert T. Regionalisierung von Gesundheitsindikatoren. Bundesgesundheitsblatt - Gesundheitsforschung - Gesundheitsschutz. 2012;55(1):129–40. doi:10.1007/s0010301114031.
 8.
Carr-Hill RA, Sheldon TA, Smith P, Martin S, Peacock S, Hardman G. Allocating resources to health authorities: development of method for small area analysis of use of inpatient services. BMJ. 1994;309(6961):1046–9. doi:10.1136/bmj.309.6961.1046.
 9.
Gibson A, Asthana S, Brigham P, Moon G, Dicker J. Geographies of need and the new NHS: methodological issues in the definition and measurement of the health needs of local populations. Health Place. 2002;8(1):47–60. doi:10.1016/S13538292(01)000351.
 10.
Soljak M, Samarasundera E, Indulkar T, Walford H, Majeed A. Variations in cardiovascular disease underdiagnosis in England: national cross-sectional spatial analysis. BMC Cardiovasc Disord. 2011;11(1):12. doi:10.1186/147122611112.
 11.
Marshall RJ. A review of methods for the statistical analysis of spatial patterns of disease. J R Stat Soc Ser A Stat Soc. 1991;154(3):421–41.
 12.
Rao JN, Molina I. Small area estimation. 2nd ed. Hoboken: Wiley; 2015.
 13.
Cohen G, Forbes J, Garraway M. Interpreting self reported limiting long term illness. BMJ. 1995;311(7007):722–4. doi:10.1136/bmj.311.7007.722.
 14.
Jordan K, Ong BN, Croft P. Researching limiting long-term illness. Soc Sci Med. 2000;50(3):397–405. doi:10.1016/S02779536(99)00297X.
 15.
Manor O, Matthews S, Power C. Self-rated health and limiting longstanding illness: interrelationships with morbidity in early adulthood. Int J Epidemiol. 2001;30(3):600–7. doi:10.1093/ije/30.3.600.
 16.
Senior ML. Area variations in self-perceived limiting long term illness in Britain, 1991: Is the Welsh experience exceptional? Reg Stud. 1998;32(3):265–80. doi:10.1080/713693455.
 17.
Borooah VK. Occupational class and the probability of long-term limiting illness. Soc Sci Med. 1999;49:253–66. doi:10.1016/S02779536(99)00101X.
 18.
Macintyre S, Der G, Norrie J. Are there socioeconomic differences in responses to a commonly used self report measure of chronic illness? Int J Epidemiol. 2005;34(6):1284–90. doi:10.1093/ije/dyi200.
 19.
Sutton M, Carr-Hill R, Gravelle H, Rice N. Do measures of self-reported morbidity bias the estimation of the determinants of health care utilisation? Soc Sci Med. 1999;49(7):867–78. doi:10.1016/S02779536(99)001690.
 20.
Congdon P. A life table approach to small area health need profiling. Stat Model. 2002;2(1):63–88.
 21.
Shouls S, Congdon P, Curtis S. Modelling inequality in reported long term illness in the UK: combining individual and area characteristics. J Epidemiol Commun Health. 1996;50(3):366–76. doi:10.1136/jech.50.3.366.
 22.
Bentham G, Eimermann J, Haynes R, Lovett A, Brainard J. Limiting long-term illness and its associations with mortality and indicators of social deprivation. J Epidemiol Commun Health. 1995;49(Suppl 2):57–64. doi:10.1136/jech.49.Suppl_2.S57.
 23.
Martin S, Sheldon TA, Smith P. Interpreting the new illness question in the UK census for health research on small areas. J Epidemiol Commun Health. 1995;49(6):634–41. doi:10.1136/jech.49.6.634.
 24.
Shouls S, Congdon P, Curtis S. Geographic variation in illness and mortality: the development of a relevant area typology for SAR districts. Health Place. 1996;2(3):139–55. doi:10.1016/13538292(96)000020.
 25.
Congdon P. The impact of area context on long term illness and premature mortality: an illustration of multilevel analysis. Reg Stud. 1995;29:327–44. doi:10.1080/00343409512331349003.
 26.
Congdon P. Estimating diabetes prevalence by small area in England. J Public Health. 2006;28(1):71–81. doi:10.1093/pubmed/fdi068.
 27.
Stafford M, Becares L, Nazroo J. Objective and perceived ethnic density and health: findings from a United Kingdom general population survey. Am J Epidemiol. 2009;170(4):484–93. doi:10.1093/aje/kwp160.
 28.
Wakefield JC, Kelsall JE, Morris SE. Clustering, cluster detection, and spatial variation in risk. In: Elliott P, Wakefield JC, Best NG, Briggs DJ, editors. Spatial epidemiology: methods and applications; 2000. p. 128–52.
 29.
Jones K, Duncan C. Individuals and their ecologies: analysing the geography of chronic illness within a multilevel modelling framework. Health Place. 1995;1(1):27–40. doi:10.1016/13538292(95)000046.
 30.
Boyle P, Norman P, Rees P. Does migration exaggerate the relationship between deprivation and limiting long-term illness? A Scottish analysis. Soc Sci Med. 2002;55(1):21–31. doi:10.1016/S02779536(01)002179.
 31.
Norman P, Boyle P, Rees P. Selective migration, health and deprivation: a longitudinal analysis. Soc Sci Med. 2005;60(12):2755–71. doi:10.1016/j.socscimed.2004.11.008.
 32.
Smith SJ, Easterlow D. The strange geography of health inequalities. Trans Inst Br Geogr. 2005;30(2):173–90. doi:10.1111/j.14755661.2005.00159.x.
 33.
Subramanian SV, Jones K, Duncan C. Multilevel methods for public health research. In: Neighborhoods and health. Oxford: Oxford University Press. 2003. p. 65–111. doi:10.1093/acprof:oso/9780195138382.003.0004.
 34.
Pratesi M, Salvati N. Small area estimation: the EBLUP estimator based on spatially correlated random area effects. Stat Methods Appl. 2008;17(1):113–41. doi:10.1007/s1026000700619.
 35.
Adams J, White M. Removing the health domain from the index of multiple deprivation 2004: effect on measured inequalities in census measure of health. J Public Health. 2006;28(4):379–83. doi:10.1093/pubmed/fdl061.
 36.
Tobler WR. A computer movie simulating urban growth in the Detroit region. Econ Geogr. 1970;46:234–40. doi:10.2307/143141.
 37.
Bivand RS, Pebesma E, Gómez-Rubio V. Modelling areal data. In: Applied spatial data analysis with R. New York: Springer. 2013. p. 263–318. doi:10.1007/9781461476184_9.
 38.
Lorant V, Thomas I, Deliège D, Tonglet R. Deprivation and mortality: the implications of spatial autocorrelation for health resources allocation. Soc Sci Med. 2001;53(12):1711–9. doi:10.1016/S02779536(00)004561.
 39.
Office for National Statistics. 2011 census: aggregate (England and Wales) bulk data download (release 3). Nomis: Office for National Statistics & University of Durham; 2013.
 40.
Office for National Statistics. 2011 census: methods and quality report: 2011 census quality survey. 2014. www.ons.gov.uk/ons/guidemethod/census/2011/censusdata/2011censususerguide/qualityandmethods/quality/qualitymeasures/assessingaccuracyofanswers/2011censusqualitysurveyreport.pdf. Accessed 3 Sept 2015.
 41.
Office for National Statistics. 2011 census: local authority response, return and coverage rates. Published 11 December 2012. http://www.ons.gov.uk/ons/guidemethod/census/2011/censusdata/2011censusdata/2011firstrelease/firstreleasequalityassuranceandmethodologypapers/censusresponserates.xls. Accessed 10 Feb 2014.
 42.
Office for National Statistics. 2011 census: methods and quality report: the 2011 census coverage assessment and adjustment process 2012. http://www.ons.gov.uk/ons/guidemethod/census/2011/censusdata/2011censusdata/2011firstrelease/firstreleasequalityassuranceandmethodologypapers/coverageassessmentandadjustmentprocess.pdf. Accessed 30 Nov 2015.
 43.
UK Department of Health. Improving outcomes and supporting transparency.—part 2: summary technical specifications of public health indicators. Updated November 2013. London: HMSO; 1975. http://www.gov.uk/government/publications/healthyliveshealthypeopleimprovingoutcomesandsupportingtransparency. Accessed 9 May 2014.
 44.
Office for National Statistics. Inequality in disability-free life expectancy by area deprivation: England, 2003–2006 and 2007–2010. Released 25 July 2013. http://www.ons.gov.uk/ons/rel/disabilityandhealthmeasurement/subnationalhealthexpectancies/inequalityindisabilityfreelifeexpectancybyareadeprivationengland200306and200710/index.html. Accessed 30 Sept 2013.
 45.
Office for National Statistics. Disability-free life expectancy by upper tier local authority: England 2008–2010. Released 1 May 2014. http://www.ons.gov.uk/ons/rel/disabilityandhealthmeasurement/subnationalhealthexpectancies/disabilityfreelifeexpectancysubnationalestimatesforengland200810/index.html. Accessed 1 May 2014.
 46.
Office for Population Censuses and Surveys Social Survey Division. The General Household Survey 1972. An interdepartmental survey sponsored by the Central Statistical Office. London: HMSO; 1975.
 47.
Agresti A. Categorical data analysis. New York: Wiley; 2002.
 48.
Gart JJ, Zweifel JR. On the bias of various estimators of the logit and its variance with application to quantal bioassay. Biometrika. 1967;181–187. doi:10.2307/2333861.
 49.
R Core Team. R: a language and environment for statistical computing. Vienna: R Foundation for Statistical Computing; 2014. http://www.R-project.org/.
 50.
Bates D, Mächler M, Bolker B, Walker S. Fitting linear mixed-effects models using lme4. J Stat Softw. 2015;67(1):1–48. doi:10.18637/jss.v067.i01.
 51.
Longford NT. Sample size calculation for small-area estimation. Surv Methodol. 2006;32(1):87–96.
 52.
Prasad NGN, Rao JNK. The estimation of the mean squared error of small-area estimators. J Am Stat Assoc. 1990;85(409):163. doi:10.2307/2289539.
 53.
Jiang J, Lahiri P. Empirical best prediction for small area inference with binary data. Ann Inst Stat Math. 2001;53(2):217–43. doi:10.1023/A:1012410420337.
 54.
González-Manteiga W, Lombardía MJ, Molina I, Morales D, Santamaría L. Estimation of the mean squared error of predictors of small area linear parameters under a logistic mixed model. Comput Stat Data Anal. 2007;51(5):2720–33. doi:10.1016/j.csda.2006.01.012.
 55.
Cliff AD, Ord JK. Spatial processes: models and applications. London: Pion; 1981. p. 63–5.
 56.
Anselin L. Local indicators of spatial association—LISA. Geogr Anal. 1995;27(2):93–115. doi:10.1111/j.15384632.1995.tb00338.x.
 57.
Anselin L. The Moran scatterplot as an ESDA tool to assess instability in spatial association. In: Fischer M, Scholten HJ, Unwin DJ, editors. Spatial analytical perspectives on GIS. London: Taylor & Francis; 1996. p. 111–25.
 58.
Bivand R, Hauke J, Kossowski T. Computing the Jacobian in Gaussian spatial autoregressive models: an illustrated comparison of available methods. Geogr Anal. 2013;45(2):150–79. doi:10.1111/gean.12008.
 59.
Bivand R, Piras G. Comparing implementations of estimation methods for spatial econometrics. J Stat Softw. 2015;63(18):1–36. doi:10.18637/jss.v063.i18.
 60.
Office for National Statistics. 2011 census: digitised boundary data (England and Wales). UK Data Service Census Support 2011. http://census.ukdataservice.ac.uk/getdata/boundarydata.aspx. Accessed 1 July 2013.
 61.
Banerjee S, Gelfand AE, Carlin BP. Hierarchical modeling and analysis for spatial data. Boca Raton: Chapman & Hall/CRC; 2004.
 62.
Office for National Statistics. Internal migration by local authorities in England and Wales—Research series, years ending June 2009 to June 2011.—Reference Tables.—14 November 2013. http://www.ons.gov.uk/ons/rel/migration1/internalmigrationbylocalauthoritiesinenglandandwales/researchseriesyearsendingjune2009tojune2011/index.html. Accessed 15 Nov 2013.
 63.
Office for National Statistics. 2001 National Statistics area classification for local authority districts in England 2010. http://www.ons.gov.uk/ons/guidemethod/geography/products/areaclassifications/nsareaclassifications/index/datasets/localauthorities/ladatafile.xls. Accessed 12 Dec 2013.
 64.
Pickett KE, Wilkinson RG. People like us: ethnic group density effects on health. Ethn Health. 2008;13(4):321–34. doi:10.1080/13557850701882928.
 65.
Jordan K, Ong BN, Croft P. Previous consultation and self reported health status as predictors of future demand for primary care. J Epidemiol Commun Health. 2003;57:109–13. doi:10.1136/jech.57.2.109.
 66.
Marshall A. Developing a methodology for the local estimation and projection of limiting long term illness and disability. A thesis submitted to the University of Manchester for the degree of Doctor of Philosophy in the Faculty of Humanities. EtHOS ID: uk.bl.ethos.511749, 2009.
 67.
Office for National Statistics. Mortality statistics: deaths registered in England and Wales by area of usual residence, 2011. http://www.ons.gov.uk/ons/rel/vsob1/deathsregisteredareausualresidence/2011/index.html. Accessed 16 June 2013.
 68.
Public Health England. Emergency hospital admissions (all causes), indirectly age standardised ratio, all ages, persons. April 2008 to March 2013 (inclusive) 2014. http://www.localhealth.org.uk/Spreadsheets/Emergency%20hospital%20admissions%20for%20all%20causes_July2014.xls. Accessed 9 Aug 2016.
 69.
Public Health England. Elective hospital admissions (all causes), indirectly age standardised ratio, all ages, persons. April 2008 to March 2013 (inclusive) 2014. http://www.localhealth.org.uk/Spreadsheets/Elective%20hospital%20admissions%20for%20all%20causes_July2014.xls. Accessed 9 Aug 2016.
 70.
Lloyd CD. Assessing the spatial structure of population variables in England and Wales. Trans Inst Br Geogr. 2015;40(1):28–43. doi:10.1111/tran.12061.
 71.
Vallejo-Torres L, Morris S, Carr-Hill R, Dixon P, Law M, Rice N, Sutton M. Can regional resource shares be based only on prevalence data? An empirical investigation of the proportionality assumption. Soc Sci Med. 2009;69(11):1634–42. doi:10.1016/j.socscimed.2009.09.020.
 72.
Finney N, Lymperopoulou K, Kapoor N, Marshall A, Sabater A, Simpson L. Local ethnic inequalities: ethnic differences in education, employment, health and housing in districts of England and Wales, 2001–2011. Technical report, Runnymede, London 2014. http://www.runnymedetrust.org/uploads/Inequalities%20reportfinal%20v2.pdf. Accessed 9 Aug 2016.
 73.
Nazroo JY. Ethnic inequalities in health: addressing a significant gap in current evidence and policy. In: “If You Could Change One Thing...”: Nine Local Actions to Reduce Health Inequalities. London: British Academy; 2014. p. 92–101.
 74.
Scarborough P, Allender S, Rayner M, Goldacre M. Validation of model-based estimates (synthetic estimates) of the prevalence of risk factors for coronary heart disease for wards in England. Health Place. 2009;15(2):596–605. doi:10.1016/j.healthplace.2008.10.003.
 75.
Datta GS, Hall P, Mandal A. Model selection by testing for the presence of small-area effects, and application to area-level data. J Am Stat Assoc. 2011;106(493):362–74. doi:10.1198/jasa.2011.tm10036.
 76.
Dale MRT, Fortin MJ. Spatial regressions. In: Spatial analysis: a guide for ecologists, 2nd edn. Cambridge: Cambridge University Press; 2014. p. 227–39.
 77.
Besag J. Spatial interaction and the statistical analysis of lattice systems. J R Stat Soc Ser B. 1974;36(2):192–236.
 78.
Borcard D, Legendre P. All-scale spatial analysis of ecological data by means of principal coordinates of neighbour matrices. Ecol Model. 2002;153(1–2):51–68. doi:10.1016/S03043800(01)005014.
 79.
Legendre P, Fortin MJ. Comparison of the Mantel test and alternative approaches for detecting complex multivariate relationships in the spatial analysis of genetic data. Mol Ecol Resour. 2010;10(5):831–44. doi:10.1111/j.17550998.2010.02866.x.
 80.
Voutilainen A, Sherwood PR. Explaining state-level differences in brain cancer mortality in the United States by population ancestries: a spatial approach. J Epidemiol Res. 2015. doi:10.5430/jer.v1n1p12.
 81.
Darlington F, Norman P, Ballas D, Exeter DJ. Exploring ethnic inequalities in health: evidence from the Health Survey for England, 1998–2011. Diversity & Equality in Health and Care; 2015.
 82.
Molina I, Rao JNK, Datta GS. Small area estimation under a Fay–Herriot model with preliminary testing for the presence of random area effects. Surv Methodol. 2015;41(1):1–19.
 83.
Lee D, Mitchell R. Locally adaptive spatial smoothing using conditional autoregressive models. J R Stat Soc Ser C Appl Stat. 2013;62(4):593–608. doi:10.1111/rssc.12009.
 84.
Office for National Statistics. The 2021 Census initial view on content for England and Wales. You said: a summary of the results. November 2015. Technical report, 2015. http://consultations.ons.gov.uk/census/2021censustopicsconsultation/user_uploads/yousaid_final_report_19.11.15.pdf. Accessed 9 Aug 2016.
 85.
McCluskey S, Topping A. Increasing response rates to lifestyle surveys: a review of methodology and ‘good practice’; 2009. http://eprints.hud.ac.uk/6878/.
 86.
Mackenbach JP. Widening socioeconomic inequalities in mortality in six Western European countries. Int J Epidemiol. 2003;32(5):830–7. doi:10.1093/ije/dyg209.
 87.
Haut Conseil de la Santé Publique: Les Inégalités Sociales de Santé : Sortir de la Fatalité, 2009. http://www.hcsp.fr/explore.cgi/avisrapportsdomaine?clefr=113. Accessed 7 Dec 2015.
 88.
Bleich SN, Jarlenski MP, Bell CN, LaVeist TA. Health inequalities: trends, progress, and policy. Annu Rev Public Health. 2012;33(1):7–40. doi:10.1146/annurev-publhealth-031811-124658.
 89.
Commission on Social Determinants of Health. Closing the gap in a generation: health equity through action on the social determinants of health. Final Report of the Commission on Social Determinants of Health, 2008. http://www.who.int/social_determinants/thecommission/finalreport/en/. Accessed 7 Dec 2015.
 90.
Austin PC, Tu JV. Automated variable selection methods for logistic regression produced unstable models for predicting acute myocardial infarction mortality. J Clin Epidemiol. 2004;57(11):1138–46. doi:10.1016/j.jclinepi.2004.04.003.
 91.
Greenland S. Modeling and variable selection in epidemiologic analysis. Am J Public Health. 1989;79(3):340–9. doi:10.2105/AJPH.79.3.340.
 92.
Twigg L, Moon G, Jones K. Predicting small-area health-related behaviour: a comparison of smoking and drinking indicators. Soc Sci Med. 2000;50(7–8):1109–20. doi:10.1016/S0277-9536(99)00359-7.
 93.
Zandbergen PA. Ensuring confidentiality of geocoded health data: assessing geographic masking strategies for individual-level data. Adv Med. 2014;2014:1–14. doi:10.1155/2014/567049.
Authors' contributions
PDM designed and carried out the study. GM helped draft and approved the manuscript. Both authors read and approved the final manuscript.
Acknowledgements
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Availability of data and materials
The census data that support the findings of this study are available from Nomis, http://www.nomisweb.co.uk/census/2011. Census output is Crown copyright and is reproduced with the permission of the Controller of HMSO and the Queen’s Printer for Scotland.
The hospital data that support the findings of this study are available from Public Health England Local Health, http://localhealth.org.uk/.
Ethics approval and consent to participate
Approval of the Faculty of Social, Human and Mathematical Sciences Ethics Committee was received on 24 September 2013.
Funding
This work was supported by an Economic and Social Research Council Advanced Quantitative Methods doctoral studentship [Grant Reference Number 1223155].
Additional files
12942_2016_57_MOESM1_ESM.r
12942_2016_57_MOESM2_ESM.docx
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Keywords
 Spatial autocorrelation
 Spatial dependency
 Spatial interaction
 Spatial weights
 Neighbourhood matrices
 Disease mapping
 Chronic morbidity
 Limiting long-standing illness