A method for modelling GP practice level deprivation scores using GIS

Background A measure of general practice level socioeconomic deprivation can be used to explore the association between deprivation and other practice characteristics. An area-based categorisation is commonly chosen as the basis for such a deprivation measure. Ideally a practice population-weighted area-based deprivation score would be calculated using individual level spatially referenced data. However, these data are often unavailable. One approach is to link the practice postcode to an area-based deprivation score, but this method has limitations. This study aimed to develop a Geographical Information Systems (GIS) based model that could better predict a practice population-weighted deprivation score in the absence of patient level data than simple practice postcode linkage.
Results We calculated predicted practice level Index of Multiple Deprivation (IMD) 2004 deprivation scores using two methods that did not require patient level data. Firstly we linked the practice postcode to an IMD 2004 score, and secondly we used a GIS model derived using data from Rotherham, UK. We compared our two sets of predicted scores to "gold standard" practice population-weighted scores for practices in Doncaster, Havering and Warrington. Overall, the practice postcode linkage method overestimated "gold standard" IMD scores by 2.54 points (95% CI 0.94, 4.14), whereas our modelling method showed no such bias (mean difference 0.36, 95% CI -0.30, 1.02). The postcode-linked method systematically underestimated the gold standard score in less deprived areas, and overestimated it in more deprived areas. Our modelling method showed a small underestimation in scores at higher levels of deprivation in Havering, but showed no bias in Doncaster or Warrington. The postcode-linked method showed more variability when predicting scores than did the GIS modelling method.
Conclusion A GIS based model can be used to predict a practice population-weighted area-based deprivation measure in the absence of patient level data. Our modelled measure generally had better agreement with the population-weighted measure than did a postcode-linked measure. Our model may also avoid an underestimation of IMD scores in less deprived areas, and overestimation of scores in more deprived areas, seen when using postcode linked scores. The proposed method may be of use to researchers who do not have access to patient level spatially referenced data.


Background
A measure of socioeconomic deprivation assigned to the patient population registered with a general practice can be used to explore the association between deprivation and other practice characteristics, such as disease prevalence or quality of care. This then allows questions of equity to be addressed, for example, how are resources distributed in relation to the need for them? There are two commonly used methods for calculating practice level deprivation found in the literature, both of which are based on small-area deprivation measures such as the Townsend index [1] or the Index of Multiple Deprivation [2]. The first method for calculating practice level deprivation takes spatially referenced patient level data and calculates a mean score, weighted for the proportion of the practice population living within each small area (see for example [3][4][5]). For the purposes of this paper we shall refer to this method as the "gold standard", although we recognise that it has limitations. The second method uses only the location of the practice building itself and links the practice to the score assigned to the small area in which it resides (see for example [6][7][8]).
The difficulty with the gold standard practice population-weighted method is the need for patient level geographical data. Although these data are used routinely within NHS Primary Care Organisations, they are not easily accessible to researchers working in other parts of the NHS, or in the academic sector [9]. The postcode-linked method is more straightforward, in that it does not require such data, but makes the assumption that the deprivation score associated with the small area in which the practice resides provides a valid proxy for the socioeconomic deprivation experienced by the practice population as a whole. Given that the majority of a practice's registered patient population live in areas surrounding the practice that are likely to have deprivation scores different to that of the area in which the practice is located, this assumption is questionable. We have previously shown that the postcode-linked method will tend to underestimate an association between deprivation and another practice level variable such as mortality [10].
An alternative approach, which we propose, is to model the population-weighted deprivation measure based on an assumed spatial distribution of patients around a "typical" practice, rather than the true spatial distribution for which the data are unavailable. This study aimed to develop such a model using spatially referenced patient level data from Rotherham, UK, and then to test the model's ability to predict practice level population-weighted IMD 2004 scores in three other UK districts: Doncaster, Warrington and the London borough of Havering.

Model construction using Rotherham Primary Care Trust data
In January 2006 there were 39 general practices that contracted with Rotherham Primary Care Trust, with a total registered population of 253,417. Of these registered patients, 246,574 lived within the broadly coterminous Rotherham Local Authority area. One of the 39 Rotherham practices was a small specialist practice providing care for asylum seekers and homeless people and was removed from this analysis. Twenty-five of the remaining 38 practices had a single main surgery building, nine had one branch surgery and four had two branch surgeries. The total number of practice sites was therefore 55 and the total number of Rotherham-resident patients registered with these practices was 245,107.
We constructed 0.1 mile (0.16 km) width concentric ring buffer zones around the practice site postcode centroid for each of the 55 practice sites using geographical information systems (GIS) methods [11]. For every practice we then calculated the proportion of their registered patients whose postcode centroid lay within each of the 0.1 mile concentric ring buffers. Patients are registered at a practice rather than at an individual surgery site, so we made the assumption that patients would attend the nearest site if a practice had one or more branch surgeries. We then calculated the mean of the practice population proportions living within each concentric ring buffer for the 55 practice sites.
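The ring-buffer step described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the GIS workflow used in the study: it assumes the straight-line distance from each patient postcode centroid to the nearest practice site has already been computed in miles, and the function names `ring_proportions` and `mean_ring_proportions` are hypothetical.

```python
import math
from collections import Counter

RING_WIDTH_MILES = 0.1  # ring width stated in the text


def ring_proportions(distances_miles, ring_width=RING_WIDTH_MILES):
    """Return {ring_index: proportion of patients} for one practice site.

    Ring 0 covers [0, 0.1) miles, ring 1 covers [0.1, 0.2) miles, and so on.
    The input distances are assumed to be precomputed (hypothetical input).
    """
    if not distances_miles:
        raise ValueError("at least one patient distance is required")
    # The small epsilon guards against floating point error at ring edges.
    counts = Counter(
        int(math.floor(d / ring_width + 1e-9)) for d in distances_miles
    )
    total = len(distances_miles)
    return {ring: n / total for ring, n in sorted(counts.items())}


def mean_ring_proportions(per_site_proportions):
    """Average ring proportions across sites; a ring absent at a site counts as 0."""
    n_sites = len(per_site_proportions)
    if n_sites == 0:
        raise ValueError("at least one site is required")
    rings = sorted({r for site in per_site_proportions for r in site})
    return {
        r: sum(site.get(r, 0.0) for site in per_site_proportions) / n_sites
        for r in rings
    }
```

Averaging the per-site proportions, rather than pooling all patients, gives each site equal influence regardless of list size, which matches the "mean of the practice population proportions" wording above.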
Initial examination of the relationship between the proportion of the registered practice population per unit area and distance from the practice suggested that an exponential decay function would fit the data. This is in line with previously published literature (e.g. see [12] or [13]), and has a firm theoretical underpinning [14]. We therefore fitted a curve of the general form:
$$\frac{p}{A} \propto e^{-d} \qquad (1)$$
i.e. the proportion of the registered population (p) per unit area (A) is proportional to the exponential decay of distance from the practice ($e^{-d}$). Since the areas of concentric ring buffers of equal width are proportional to their distance from the central point (i.e. $A \propto d$) [14], equation 1 can be expressed as follows:
$$p = s_1 \, d \, e^{-s_2 d} \qquad (2)$$
where $s_1$ and $s_2$ are scaling constants.
We used SigmaPlot 9.0 to fit a general function of this form to the Rotherham practice population proportion versus distance data, and so determine the scaling constants [15]. The resulting function was then taken to describe the proportion of the total registered practice population (p) at distance (d) from the "average" Rotherham practice site.
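As a rough illustration of how the scaling constants might be estimated without SigmaPlot, the sketch below assumes the fitted function has the form p = s1 · d · exp(-s2 · d) and linearises it as ln(p/d) = ln(s1) - s2 · d, which can be fitted by ordinary least squares. This is a swapped-in technique, not the non-linear regression used in the study, and it weights the data points differently, so its estimates on real data would not match the published values exactly. The function name `fit_decay_loglinear` is hypothetical.

```python
import math


def fit_decay_loglinear(distances, proportions):
    """Estimate (s1, s2) in p = s1 * d * exp(-s2 * d).

    Uses least squares on ln(p / d) = ln(s1) - s2 * d. Points with d <= 0
    or p <= 0 are dropped because the logarithm is undefined for them.
    """
    pts = [
        (d, math.log(p / d))
        for d, p in zip(distances, proportions)
        if d > 0 and p > 0
    ]
    n = len(pts)
    if n < 2:
        raise ValueError("need at least two usable points")
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    if sxx == 0:
        raise ValueError("distances must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope
```

Fitting on the log scale gives more influence to the sparse outer rings than a direct non-linear fit would, so a non-linear least squares routine is likely the better choice when one is available.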

Using the model to predict deprivation scores in the absence of patient level data
We used the Rotherham derived practice population distance function to predict practice level Index of Multiple Deprivation (IMD) 2004 scores for practices for which we had no patient level data. We chose for our study the practices within the following three districts: Doncaster, Warrington and the London borough of Havering. These districts were selected because we had access to published or unpublished "gold standard" practice population-weighted IMD 2004 scores (i.e. scores calculated using patient level spatially referenced data) necessary to test our "modelled" scores. We considered the three districts separately in order to determine the performance of the model in areas with different social and spatial characteristics. Researchers may wish to calculate a deprivation measure for practices within a single district, and we wished to know whether there was likely to be a significant "district" effect. We modelled the practice population-weighted IMD 2004 deprivation score as follows. We first calculated the Euclidean distances between the practice building postcode centroid and the centroids of all the LSOAs within the PCT using the Spider Graph function in MapInfo 8.0 [11]. We then estimated the proportion of the practice population living in each LSOA by applying the Rotherham derived distance function. Finally, we calculated the mean of the LSOA IMD 2004 scores, weighted by the modelled practice population proportion multiplied by the LSOA population. LSOA populations were obtained from the 2001 Census via Casweb [18].
Where a practice had a branch surgery in addition to the main surgery we calculated an IMD 2004 score for each practice site using the same method as above. Since patients in the UK are registered with a practice, rather than a surgery site, there is no way of obtaining data for the distribution of patients between surgery sites. Similar studies to ours that have examined health care accessibility have assigned equal weighting to main and branch surgeries [19,20]. We therefore made the assumption that equal numbers of patients would attend each site, and the overall practice deprivation score was taken as the simple average of the scores for the practice sites. We explored the performance of the model under a range of different main-branch surgery population weightings in a simple sensitivity analysis.
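The three steps above (distance, modelled proportion, weighted mean) can be sketched as follows. This is a sketch under stated assumptions, not the MapInfo procedure itself: it assumes the distance function has the form p = s1 · d · exp(-s2 · d) with the constants reported later in the text, that centroid coordinates are supplied on a planar grid in miles, and that the function is evaluated directly at the practice-to-LSOA distance. The names `decay_weight` and `modelled_imd_score` are hypothetical.

```python
import math

# Fitted Rotherham constants as reported in the results section.
S1, S2 = 80.6822, 2.9844


def decay_weight(distance_miles, s1=S1, s2=S2):
    """Modelled share of the practice population at a given distance (assumed form)."""
    return s1 * distance_miles * math.exp(-s2 * distance_miles)


def modelled_imd_score(practice_xy, lsoas):
    """Weighted mean IMD score for one practice site.

    lsoas: iterable of (x_miles, y_miles, population, imd_score) tuples for
    the LSOA centroids within the PCT (hypothetical input layout).
    Weight = modelled proportion at the centroid distance * LSOA population.
    """
    numerator = 0.0
    denominator = 0.0
    for x, y, population, imd in lsoas:
        d = math.hypot(x - practice_xy[0], y - practice_xy[1])
        w = decay_weight(d) * population
        numerator += w * imd
        denominator += w
    if denominator == 0:
        # Happens only if every LSOA has zero population or zero distance.
        raise ValueError("all LSOA weights are zero")
    return numerator / denominator
```

Because the result is a weighted mean, the value of s1 cancels out and only s2 affects the predicted score. Under this assumed form an LSOA whose centroid coincides with the practice receives zero weight, which is one reason the exact way the function is applied matters.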
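The combination of site-level scores into a practice-level score, including the alternative main-branch weightings used in the sensitivity analysis, might look like the sketch below. The function name `practice_score` and the convention that any weight not given to the main surgery is split equally among the branches are assumptions made for illustration.

```python
def practice_score(site_scores, main_weight=None):
    """Combine per-site IMD scores into one practice-level score.

    site_scores[0] is the main surgery; the remaining entries are branches.
    With main_weight=None every site is weighted equally, as assumed in the
    text. Otherwise the main surgery takes main_weight and the remainder is
    shared equally among the branches (an illustrative assumption).
    """
    n = len(site_scores)
    if n == 0:
        raise ValueError("at least one site score is required")
    if n == 1:
        return site_scores[0]
    if main_weight is None:
        main_weight = 1.0 / n
    if not 0.0 <= main_weight <= 1.0:
        raise ValueError("main_weight must lie between 0 and 1")
    branch_weight = (1.0 - main_weight) / (n - 1)
    return main_weight * site_scores[0] + branch_weight * sum(site_scores[1:])
```

Stepping `main_weight` from an equal share up to 1.0 reproduces the range of weightings explored in a sensitivity analysis of this kind.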

Calculating simple practice postcode-linked scores
Our model will only be useful if it provides significant benefit over simply linking the practice postcode to an IMD 2004 score via the LSOA in which the practice resides. We calculated these scores as follows. For each of the practices in Doncaster, Havering and Warrington we used the All Fields Postcode Directory (now known as the National Statistics Postcode Directory) [21] to link the postcodes of the main surgery building and any branch surgeries to the LSOAs in which the postcode centroids were located, and hence to IMD 2004 scores. For practices with one or more branch surgeries we calculated the average of the IMD 2004 scores linked to the main and branch surgeries.

Measuring agreement between deprivation scores
We obtained practice population-weighted IMD 2004 score data for the three districts from published or unpublished sources. The data set for Doncaster was calculated as part of routine PCT work but is as yet unpublished; data for Warrington and Havering were obtained from electronically published sources [22,23]. Each data set consisted of the mean IMD 2004 score, weighted for the proportion of the registered practice population living in each LSOA. We refer to these data as the "gold standard" practice level deprivation scores.
We examined the agreement between the gold standard scores and the simple postcode-linked scores, and between the gold standard scores and our modelled population-weighted scores using Bland and Altman's method for measuring agreement [24]. In this method the differences in pairs of scores (i.e. in our case, predicted score - gold standard score) are plotted against the means of the pairs of scores (i.e. (predicted score + gold standard score)/2). We calculated and plotted the mean of the differences in scores for each dataset and the 95% "limits of agreement" (i.e. mean of the differences ± 1.96 × standard deviation (SD) of the differences). We also calculated 95% confidence intervals for each of the above measures. The resulting chart illustrates the overall agreement between the two measures as well as highlighting any systematic bias in the predicted values.
If predicted scores are in perfect agreement with the gold standard scores the differences between the pairs will all be zero, and the plot will show a series of points along a horizontal straight line. The degree to which the differences deviate from zero gives an indication of the level of disagreement between the predicted and the gold standard scores. The mean of the differences between predicted and gold standard scores tells us whether, on average, the method for predicting scores over- or under-estimates the gold standard population-weighted score. The approximate normality of the data allows us to test the hypothesis of no bias using the relatively robust paired t-test [25].
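The quantities plotted in a Bland and Altman chart can be computed as in the sketch below. It covers the differences, the pair means, the mean difference and the 95% limits of agreement only; the confidence intervals mentioned above need t-distribution quantiles and are left out. The function name `bland_altman` and the dictionary layout are hypothetical.

```python
import statistics


def bland_altman(predicted, gold):
    """Return the Bland-Altman summary for paired scores.

    Differences are predicted - gold; limits of agreement are
    mean difference +/- 1.96 * sample SD of the differences.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if len(predicted) < 2:
        raise ValueError("need at least two pairs")
    diffs = [p - g for p, g in zip(predicted, gold)]
    means = [(p + g) / 2 for p, g in zip(predicted, gold)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    return {
        "differences": diffs,
        "means": means,
        "mean_difference": mean_diff,
        "sd": sd_diff,
        "lower_limit": mean_diff - 1.96 * sd_diff,
        "upper_limit": mean_diff + 1.96 * sd_diff,
    }
```

Plotting `differences` against `means`, with horizontal lines at `mean_difference`, `lower_limit` and `upper_limit`, gives the chart described in the text.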
Examining the relationship between the individual differences in pairs of scores and their means is also informative. If a positive relationship exists between the differences in scores and their means, this suggests that deprivation is being overestimated in more deprived areas relative to less deprived areas (likewise the opposite holds if a negative relationship exists). The presence or absence of a linear relationship can be examined by calculating the correlation coefficient [25].
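A minimal sketch of this check is given below: it computes the Pearson correlation between the differences in paired scores and their means. A clearly positive value would suggest overestimation in more deprived areas relative to less deprived ones, and a clearly negative value the opposite. The function names are hypothetical, and no significance test is included.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def difference_mean_correlation(predicted, gold):
    """Correlate (predicted - gold) with (predicted + gold) / 2."""
    diffs = [p - g for p, g in zip(predicted, gold)]
    means = [(p + g) / 2 for p, g in zip(predicted, gold)]
    return pearson_r(means, diffs)
```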
If two different methods for predicting scores are both unbiased then the method that leads to the lesser variability will be the more useful. We tested the hypothesis that the two methods (postcode linking and GIS modelling) had the same variability in predicting the "gold standard" scores using the Fligner-Killeen test for homogeneity of variances within R 2.5.0 [26]. This non-parametric method was chosen since the parametric F-test for homogeneity of variance is too sensitive to the deviations from normality seen in our data.
We compared the two different methods in an overall analysis combining practices from the three districts, and for the three districts separately. The district level analyses allowed us to determine the performance of the model for a relatively small number of practices within a defined area. We did this because researchers may wish to calculate a deprivation measure for practices within a single district, and we wished to know whether there was likely to be a significant "district" effect.

Model construction
The 90% effective catchment area (the area that encompasses 90% of the practice population) for the 55 practice sites ranged from 0.8 miles (1.28 km) to 3.3 miles (5.28 km) with a mean of 1.7 miles (2.72 km). Figure 1 shows the distribution of the mean population proportions by distance from all 55 Rotherham practice sites used to construct the model, along with the fitted exponential decay function curve.
The scaling parameters that allowed the best fit of the exponential decay model to the Rotherham data were $s_1$ = 80.6822 and $s_2$ = 2.9844. The adjusted R squared for goodness of fit was 0.98.
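To give a feel for what these constants imply, the sketch below evaluates the fitted curve, assuming it has the form p = s1 · d · exp(-s2 · d) with p expressed as a percentage per 0.1 mile ring and d in miles. Under that assumption, setting the derivative s1 · exp(-s2 · d) · (1 - s2 · d) to zero places the peak at d = 1 / s2. The function names are hypothetical.

```python
import math

# Constants reported in the text.
S1, S2 = 80.6822, 2.9844


def modelled_ring_percentage(d, s1=S1, s2=S2):
    """Modelled percentage of patients in the ring at distance d (assumed form)."""
    return s1 * d * math.exp(-s2 * d)


def peak_distance(s2=S2):
    """Distance at which the assumed curve is highest: d = 1 / s2."""
    return 1.0 / s2


if __name__ == "__main__":
    d_star = peak_distance()
    print(f"peak at {d_star:.3f} miles, "
          f"{modelled_ring_percentage(d_star):.2f}% of patients per ring")
```

With the reported constants the peak falls at about a third of a mile from the practice site, where the modelled ring holds just under 10% of the registered population.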

Measurement of agreement between deprivation scores
Population-weighted deprivation scores derived from patient level spatially referenced data were available for 46 practices in Doncaster [unpublished source], 53 practices in Havering [22] and 26 practices in Warrington [23]. We refer to these data as the "gold standard" population-weighted deprivation scores.
Scatter plots of the gold standard population-weighted scores against our simple practice postcode-linked measure, and of the gold standard scores against our modelled population-weighted scores are shown for each district in (Figures 2, 3 and 4). On each plot the dashed diagonal line represents the "line of perfect agreement", along which points would lie if the methods produced identical results.
The plots of differences in scores against the score means (as per the Bland and Altman method [24]) are shown in figures 5, 6, 7 and 8. The mean of the differences is shown, along with the 95% "limits of agreement". There was a systematic tendency for the postcode-linked method to underestimate the gold standard score in less deprived areas, and overestimate it in more deprived areas. This is apparent in the positive linear relationships shown in figures 5, 6, 7 and 8 (see table 2 for correlation coefficients). Our modelling method showed no such linear relationship when predicting scores in Doncaster and Warrington (figures 5 and 7), or in the combined analysis (figure 8). In Havering there was a moderate negative correlation between the differences in scores and their mean, suggesting that our method underestimated scores at higher levels of deprivation. However, the magnitudes of the deviations in Havering were very small.
The differences between the postcode-linked score predictions and gold standard scores were greater than the differences between the modelled and gold standard scores, as shown by the wider 95% limits of agreement in figures 5 to 8. For Doncaster and Havering, and for the combined analysis, this difference was significant (Fligner-Killeen test for homogeneity of variances p < 0.001, see Table 3).

Sensitivity analysis using different weightings for main and branch surgeries
The predicted scores in the above results were calculated using a simple equal weighted average for the scores predicted for the main and any branch surgeries. We examined the performance of the model (for those practices with branches) under a range of alternative main-branch surgery weightings in a sensitivity analysis. As the weighting given to the main surgery increased from an equal share towards 100%, the mean difference between the predicted scores and the "gold standard" scores deviated away from zero, suggesting an increasingly biased prediction. In a secondary analysis comparing results for single site versus multi-site practices across all three districts, the model performed no worse when predicting scores for practices with more than one site (mean difference between modelled scores and gold standard scores for single site practices: 0.25 IMD points, 95% CI -1.25 to 1.75; for multi-site practices: 0.39, 95% CI -0.35 to 1.14).

Main results
This study has shown how a fairly simple model of the distribution of registered patients around a general practice can be used to predict a practice population-weighted area-based measure of socioeconomic deprivation. This method appeared superior to the common practice of linking the practice postcode alone to the deprivation score assigned to the area in which the practice is located. There were some differences in the ability of the model to accurately predict scores between the three districts. One reason for this may be the existence of groups of practices that, although they are in close geographical proximity, attract patient populations from quite different socioeconomic or ethnic backgrounds. This is certainly the case for Doncaster and will reduce the predictive accuracy of the model in an unpredictable manner. There may also be in some districts practices that draw their patients from a particular subgroup of the population, for example, students or asylum seekers. A deprivation score for such a practice would be difficult to predict using any method that did not have access to data regarding the registered patient population.
Figure 1 The mean proportion of patients living within 0.1 mile concentric ring buffers by distance from practice (data from 55 Rotherham practice sites). Fitted curve represents equation 2 in text (adj R-square = 0.98). Axis labels: distance at midpoint of 0.1 mile concentric ring buffer (miles); mean proportion of population living within ring buffer area (as %).
Figure 2 Scatter plot of predicted IMD score (modelled and postcode linked) versus gold standard score for Doncaster practices.
Figure 3 Scatter plot of predicted IMD score (modelled and postcode linked) versus gold standard score for Havering practices.

Strengths
Our proposed modelling method may be particularly useful for researchers who need a measure of the socioeconomic deprivation experienced by a general practice population, but who do not have access to patient level spatially referenced data. The method was relatively straightforward to execute using basic GIS functions within a commonly used mapping package, MapInfo Professional 8.0 [11]. Although we obtained Census and postcode data using the academic sources Casweb [18] and Edina [21], equivalent data sets are available free within the NHS [27,28].

Limitations
The data used to construct the model were derived from 55 practice sites in Rotherham, and the model will therefore be most accurate when used to predict scores for practices whose geographical distribution of patients is similar to that found in Rotherham. The method makes a number of assumptions: firstly that registered patients will be equally distributed in all directions around a practice, which is unlikely to hold true in all cases, and especially not so at natural or administrative boundaries; secondly that the same spatial distribution of patients exists around urban and rural practices and around practices regardless of their size. Again, this is a simplification of a complex real-world situation. Thirdly we assumed that all patients lived within the PCT in which the practice was located. This will not always be the case, particularly at PCT boundaries. However, a PCT would have to make the same assumption in calculating a patient weighted score using individual level spatially referenced data. This is because PCTs routinely have access only to individual data for their resident population, rather than the whole population registered with their contracting GPs. Despite these limitations the model performed well at predicting scores in three other districts within the UK.
Figure 4 Scatter plot of predicted IMD score (modelled and postcode linked) versus gold standard score for Warrington practices.
Figure 5 Differences in IMD 2004 scores (predicted score - gold standard score) against their mean for Doncaster practices.
We assumed, where a practice had one or more branch surgeries, that equal numbers of patients attended each surgery site. This may or may not be true, and intuitively it may seem that more patients would attend a main practice than would attend a branch practice. The number of patients registered at a branch surgery is not routinely recorded outside of the practice, and we could find no literature from which to estimate an average distribution.
Perhaps surprisingly, we found in our sensitivity analysis that an assumption of an equal distribution of patients between each surgery site led to a less biased model prediction than did assuming that a greater proportion of patients attended the main site than any branch sites. Under the assumption that equal numbers of patients attended each surgery site we found that the model performed just as well when predicting deprivation scores for multi-site practices as it did for single site practices. It would be straightforward, however, for the model to be applied assuming a different weighting between main and branch surgeries if there were particular reasons (e.g. local knowledge of practice characteristics) to support this.
The Index of Multiple Deprivation 2004 used in this model is an area-based measure. It is derived from variables measured at the group level (for example the proportion of the population living in a household receiving the Income Support welfare benefit), and applies therefore to the group, rather than to any single individual. An individual living within the group may experience a quite different level of deprivation: "well off" people live in areas of high deprivation, and vice versa. Associations seen between area-based deprivation and other variables, such as mortality or disease prevalence, may or may not apply at the individual level, and assuming that they do is known as the "ecological fallacy". Despite this limitation, area level associations are important whether they reflect individual level associations or not, and indeed some variables, such as social cohesion, are meaningfully analysed only at a group level.
A second potential limitation of using the IMD 2004 when exploring associations between deprivation and practice level health related measures is the inclusion within IMD 2004 of a "Health Deprivation and Disability Domain". This could, in theory, lead to an inevitable correlation between the deprivation measure and the health outcome variable of interest, a problem known as "mathematical coupling". However, removing the health domain from IMD 2004 has been shown to make little practical difference [29]. An alternative course of action would be to use, say, the IMD 2004 income domain alone to construct a practice level measure of deprivation. Our model would be equally useful, since it is essentially a weighting method that can be applied to any variable linked to a small area.
Figure 8 Differences in IMD 2004 scores (predicted score - gold standard score) against their mean for all practices.

Conclusion
A simple GIS based model, based on an assumed exponential decay distribution of patients around a general practice, can be used to predict a practice population-weighted area-based deprivation measure. Our modelled measure had better agreement with the population-weighted measure than did a postcode-linked measure alone. Our model may also avoid a systematic underestimation of IMD score in less deprived areas, and a systematic overestimation of scores in more deprived areas, that was seen when using postcode-linked scores. This method may therefore be of use to researchers who do not have access to patient level spatially referenced data.