Although the patient data used in this study do not provide a complete profile of diabetes in Los Angeles County, the study area offers invaluable insight into a geography dominated by those carrying the highest disease burden, most of whom would be expected to attend the Los Angeles County-USC Medical Center. There are likely to be pockets of diabetes sufferers within this area who seek treatment elsewhere, especially those in more affluent circumstances. These should not affect the research question of the paper, which focuses on the most socially vulnerable population with diabetes. Indeed, the data are reduced even more stringently by analyzing only ED visits, thereby focusing on the most vulnerable patients and those generating the greatest "costs" for the health care system.
Data
The Los Angeles County-USC Medical Center has an electronic medical record system that archives the physician-designated International Classification of Diseases (ICD) code for each patient encounter. We assume that ICD codes are accurately recorded in the emergency room data subset analyzed in this paper, given the experience of the medical staff working there and the shared understanding that hospital record keeping should maintain a high quality [27]. In total, 30158 records in which a patient was labelled with diabetes were obtained from the Los Angeles County-USC Medical Center. Patients in the ED are evaluated with a limited history and physical examination. They are then labelled with an ICD code for diabetes if they have a pre-existing history of diabetes or laboratory confirmation (e.g., a random blood glucose of 200 mg/dL or greater). Diabetes patients who do not report a history of diabetes to the ED staff, or who do not present with a condition that warrants laboratory investigation, may therefore not be labelled as diabetic. Our study relies on the accuracy of this staff-dependent ICD coding for diabetes. Of the total, 20257 records were address-matched with high confidence; non-matches fell into the usual categories of spelling mistakes, inappropriate field entries, and non-spatially specific "homeless" records. As the purpose of this paper was to establish spatial patterns of diabetes characteristics rather than a complete record of the disease, only addresses matched with an extremely high degree of confidence were retained. For example, imputation approaches (such as assigning ZIP code centroids) were not used to compensate for spatial entry uncertainties [28].
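The retention rule described above (keep only confidently matched, spatially specific addresses and impute nothing) can be sketched as a simple filter. This is a minimal illustration, not the workflow actually used: the record layout, the `match_score` field, the homeless markers and the 95-point threshold are all hypothetical assumptions, since the geocoder and its scoring scale are not described in the text.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical geocoder output; field names are illustrative only.
@dataclass
class GeocodedRecord:
    record_id: str
    address: str
    match_score: Optional[float]  # None when the geocoder found no candidate
    x_km: Optional[float] = None
    y_km: Optional[float] = None


HOMELESS_MARKERS = {"homeless", "no fixed address"}  # assumed free-text markers
MIN_MATCH_SCORE = 95.0  # hypothetical cut-off for an "extremely high" confidence match


def retain_confident_matches(records):
    """Keep only records with a confident point-level address match.

    No imputation is attempted: unmatched or low-confidence records are
    dropped rather than being assigned, e.g., a ZIP code centroid.
    """
    kept = []
    for rec in records:
        if rec.address.strip().lower() in HOMELESS_MARKERS:
            continue  # not spatially specific
        if rec.match_score is None or rec.match_score < MIN_MATCH_SCORE:
            continue  # spelling mistakes, bad field entries, ambiguous matches
        kept.append(rec)
    return kept
```

Dropping rather than imputing keeps every retained point at its true residential location, at the cost of a smaller sample.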
As there is no expected geographic bias in the non-matched addresses (apart from the "homeless" category), and as this dataset would never capture the entire diabetes situation for the area, the successfully matched addresses can be considered a suitable sample for the proposed analysis.
These records contain basic demographic characteristics, including age, sex, and race, as well as diagnoses and place of contact. Because the data set contains all diabetes-related visits during 2009, many individuals appear in multiple records. Therefore, using the machine run identifier (the database method the hospital employs to identify a person's record), 8875 unique individuals were extracted. From these, we retained for analysis the subset of patients who had an "emergency department visit for the evaluation and management" of their diabetes-related problem. This subset of 3522 patients is used as the base layer for the subsequent spatial investigations of diabetes.
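The reduction from visit-level records to unique patients, and then to the ED cohort (30158 records, 8875 individuals, 3522 ED patients in the text), can be sketched as a group-and-filter step. This is a hypothetical illustration: visits are assumed to be dictionaries with keys `patient_id` (standing in for the machine run identifier) and `visit_type`; the hospital's real schema is not described.

```python
ED_VISIT_TYPE = "emergency department visit for the evaluation and management"


def build_ed_cohort(visits):
    """Group visits by patient, then keep patients with at least one ED visit.

    `visits` is a list of dicts with hypothetical keys 'patient_id' and
    'visit_type'. Returns (number of unique patients, ED cohort), where the
    cohort maps each qualifying patient id to that patient's ED visits.
    """
    by_patient = {}
    for visit in visits:
        by_patient.setdefault(visit["patient_id"], []).append(visit)

    ed_cohort = {
        pid: [v for v in recs if v["visit_type"] == ED_VISIT_TYPE]
        for pid, recs in by_patient.items()
        if any(v["visit_type"] == ED_VISIT_TYPE for v in recs)
    }
    return len(by_patient), ed_cohort
```

Each patient appears once in the cohort regardless of how many visits they made, so repeat attendees do not inflate the denominator.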
When a secondary dataset of this type is used to calculate disease rates, a suitable "denominator" has to be determined. As one purpose of this paper is to show how spatial patterns of associated health problems occur within one vulnerable cohort, the total number of unique individuals being studied (3522 patients) is used as the denominator. The benefit of this approach is that it minimizes bias arising from the way the dataset itself was generated. If certain neighbourhoods generate a disproportionate number of ED treatment-seeking individuals, then with a general-population denominator these neighbourhoods would also likely present as hotspots for the related conditions being analyzed. Using ED attendees as the denominator controls for this effect, so the analysis reveals spatial patterns within this cohort. The hotspots identified therefore apply only to the population being studied (i.e., those who have diabetes and receive treatment through the ED). Using this subset as a denominator surface, a series of smoothed disease rate surfaces is calculated with the DMAP spatial filtering software [29] for the following diagnostic categories (ICD codes listed):
1. Uncontrolled blood sugars, including both type 1 diabetes (250.01) and type 2 diabetes (250.02), in this ED population.
2. End-organ damage related to uncontrolled diabetes, in the following categories:
250.4 = kidney complications of diabetes
250.6 = neurologic complications of diabetes
250.8 and 250.9 = combined category for non-specified "other" complications of diabetes
To be counted in the numerator for any of the specified ICD analyses, a patient had to have one of the listed codes in the primary (first) diagnostic column, indicating that this condition was the main reason for the ED visit. Despite the differences in the underlying disease mechanisms of type 1 and type 2 diabetes, this underserved, predominantly minority population faces the same social barriers, which limit adherence to medical regimens and therefore lead patients to the ED for primary or emergent care. In addition, we chose to evaluate complications that occur in both types of diabetes, e.g., hyperglycemia, renal disease, and neurologic disease.
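The numerator rule (only the primary diagnosis counts) can be expressed as a small classifier over a visit's ordered diagnosis list. This sketch assumes codes are stored as dotted ICD strings such as "250.40"; the category labels and function name are hypothetical, and 4-digit categories (250.4, 250.6, 250.8, 250.9) are matched by prefix so that any fifth digit is accepted.

```python
from typing import Optional, Sequence

# Categories from the text; the string format of stored codes is an assumption.
EXACT_CODES = {
    "250.01": "uncontrolled_blood_sugar",
    "250.02": "uncontrolled_blood_sugar",
}
PREFIX_CODES = {
    "250.4": "kidney",
    "250.6": "neurologic",
    "250.8": "other_complication",
    "250.9": "other_complication",
}


def numerator_category(diagnoses: Sequence[str]) -> Optional[str]:
    """Return the analysis category for a visit, or None.

    Only the primary (first) diagnosis is inspected; a qualifying code in a
    secondary column does not make the patient a numerator case.
    """
    if not diagnoses:
        return None
    primary = diagnoses[0].strip()
    if primary in EXACT_CODES:
        return EXACT_CODES[primary]
    for prefix, category in PREFIX_CODES.items():
        if primary.startswith(prefix):
            return category
    return None
```

For example, a visit coded `["401.9", "250.40"]` returns `None`, because the kidney code is not the main reason for the visit.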
Although the main objective of this paper was to show the potential of ED data for revealing neighbourhood patterns in diabetes, a secondary aim of this research is to combine academic investigation with a "real world" practical application. A collaboration has been established between the Department of American Studies and Ethnicity at the University of Southern California and an outpatient Diabetes and Metabolism Clinic (DMC), staffed by an endocrinologist and two nurses and associated with the Los Angeles County-USC Medical Center. This clinic runs a Diabetes Management Program aimed at global risk reduction across metabolic parameters (blood sugar, blood pressure, and lipids), applying management guidelines from the American Diabetes Association and the American Heart Association in a culturally appropriate format. Diabetic patients are referred by their primary care physicians when their glycemic control is difficult to manage. This population predominantly lives below the federally defined poverty level and has no established health care coverage beyond the free access provided by the Los Angeles County Healthcare System. Because the clinic is close to the Los Angeles County-USC Medical Center ED, the DMC predominantly serves the same underserved community. A selection of patients (29 in total) attending this new clinic is overlaid onto the hotspot surfaces to assess the diabetes situation surrounding their homes; in other words, the patients' actual residences are placed on the maps of diabetes-related health problems. Combining "new" patients with a background disease surface in this way can also provide a mechanism for assessing the success of new clinic strategies, by relating changes in patient outcomes to their ecological situation.
In addition, for the purposes of this paper, these patients provide a set of sample locations, drawn from the same population, against which the background diabetes disease surface can be described.
Spatial Analysis
Several spatial models could have been applied to reveal neighbourhood-scale "hotspot" patterns [30–34]. For this paper, DMAP software [35] was chosen because of its established success in analyzing fine-scale health information, including birth outcome and cancer data [29, 36]. The basic premise of DMAP is that a grid is overlaid on the study area, a circular filter is placed over every grid node, and a rate is calculated in which the numerator comprises patient addresses with a given ICD code and the denominator comprises all addresses of patients diagnosed with diabetes who made visits to the ED. As long as the filter radius is greater than half the distance between nodes, neighbouring filters overlap, so each point contributes to multiple rate calculations; the result is a smoothed rate surface that is not truncated by political boundaries. In general, the larger the filter radius, the more points are included in each rate calculation and the smoother (and less volatile) the final surface tends to be. The output from DMAP is a grid with a rate attached to each node, which can be imported into a GIS and mapped either as an interpolated surface or as contours.
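The circular-filter idea can be illustrated with a brute-force sketch. This is not DMAP's code (its internals are not given here); it simply assumes patient addresses and grid nodes are already projected to planar coordinates in kilometres and computes, for each node, the numerator, denominator and rate within a fixed radius. The function name and data layout are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # projected x, y coordinates in km (assumed)


def filter_rates(
    nodes: Sequence[Point],
    patients: Sequence[Point],
    is_case: Sequence[bool],
    radius_km: float,
) -> List[Tuple[int, int, float]]:
    """Circular-filter rate at each grid node, in the spirit of DMAP.

    Denominator: every ED diabetes patient within radius_km of the node.
    Numerator: those patients whose primary diagnosis is the target ICD code.
    Returns (numerator, denominator, rate) per node; the rate is NaN when
    the filter captures no patients.
    """
    results = []
    for nx, ny in nodes:
        numerator = denominator = 0
        for (px, py), case in zip(patients, is_case):
            if math.hypot(px - nx, py - ny) <= radius_km:
                denominator += 1
                if case:
                    numerator += 1
        rate = numerator / denominator if denominator else float("nan")
        results.append((numerator, denominator, rate))
    return results
```

Because neighbouring filters overlap whenever the radius exceeds half the node spacing, the same patient is counted at several nodes, which is what smooths the surface. The double loop costs O(nodes × patients); a spatial index such as SciPy's `cKDTree.query_ball_point` would be a natural speed-up for a full study area.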
For this paper, a geographic subset of Los Angeles County (30 km by 15 km) was selected as the study area based on a density analysis of all 20257 address-matched records from the Los Angeles County-USC Medical Center. This geographic extent can be seen in Figure 1. To assess scale effects that might arise from this boundary selection, a secondary analysis was also performed for a larger geographic area. Both extents were covered by a grid with a resolution of 0.5 km, and rates were calculated for a filter extending 1 km from each node. Several filter bandwidths were tested; the one chosen (a radius of twice the node spacing) is consistent with ratios the authors have found appropriate in other applications of this technique. To reduce the instability associated with small numbers, only filters containing at least 30 denominator patients were retained, and all other nodes were removed from the map. The rates attached to each grid node were imported into ArcMap GIS, where interpolated surfaces were calculated for each ICD rate. These interpolated grids were also contoured so that they could easily be overlaid on the underlying geography to aid interpretation and spatial comparison.
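The grid parameters above (0.5 km spacing, a 1 km filter radius, and a minimum of 30 denominator patients per filter) can be sketched as follows. The rectangle origin, the inclusion of nodes on the boundary edges, and both function names are assumptions made for illustration; the text does not describe DMAP's grid conventions.

```python
from typing import List, Tuple

GRID_SPACING_KM = 0.5
FILTER_RADIUS_KM = 2 * GRID_SPACING_KM  # 1 km, twice the node spacing
MIN_DENOMINATOR = 30  # filters with fewer ED diabetes patients are dropped


def make_grid(width_km: float, height_km: float,
              spacing_km: float = GRID_SPACING_KM,
              origin: Tuple[float, float] = (0.0, 0.0)) -> List[Tuple[float, float]]:
    """Regular grid of nodes covering a width x height rectangle, edges included."""
    n_cols = int(round(width_km / spacing_km)) + 1
    n_rows = int(round(height_km / spacing_km)) + 1
    ox, oy = origin
    return [(ox + i * spacing_km, oy + j * spacing_km)
            for j in range(n_rows) for i in range(n_cols)]


def drop_sparse_nodes(nodes, counts, min_denominator: int = MIN_DENOMINATOR):
    """Keep (node, numerator, denominator) only where the denominator is large enough.

    `counts` is a sequence of (numerator, denominator) pairs aligned with `nodes`.
    """
    return [(node, num, den) for node, (num, den) in zip(nodes, counts)
            if den >= min_denominator]
```

Under these assumptions a 30 km by 15 km extent yields 61 × 31 = 1891 candidate nodes before sparse filters are removed.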
A test of statistical significance was also applied to the rate surface using a Monte Carlo simulation within DMAP. Each denominator address was given a chance of becoming a numerator case equal to the proportion of that condition in the actual population under investigation. A simulated surface was calculated using these probabilities; in effect, the overall ICD rate for the study area was maintained, but the locations of the numerators changed. Repeating this simulation 1000 times produces a reference distribution. The actual rate at any node can then be compared against this simulated distribution to determine how often the "real" rate exceeded the "simulated" rate. Thus, if the actual disease rate at a location was higher than in at least 950 of the 1000 simulations, that location is classified as a hotspot at the 95% confidence level.
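The Monte Carlo test can be sketched as below. This follows the description above rather than DMAP's source: "given a chance" is interpreted as an independent Bernoulli draw with the study-wide proportion p for every patient, so the overall rate is preserved on average, and a tie between observed and simulated counts is assumed not to count as an exceedance. Function names and coordinates in km are hypothetical.

```python
import math
import random


def neighbour_lists(nodes, patients, radius_km):
    """Indices of patients falling inside each node's circular filter."""
    return [
        [i for i, (px, py) in enumerate(patients)
         if math.hypot(px - nx, py - ny) <= radius_km]
        for nx, ny in nodes
    ]


def monte_carlo_hotspots(nodes, patients, is_case, radius_km,
                         n_sims=1000, confidence=0.95, seed=None):
    """Flag nodes whose observed rate beats most random reallocations of cases.

    In each run every patient becomes a simulated case with probability p,
    the study-wide proportion of the condition. A node's denominator is the
    same in every run, so comparing case counts is equivalent to comparing rates.
    Returns (share of runs exceeded, is_hotspot) for each node.
    """
    rng = random.Random(seed)
    neighbours = neighbour_lists(nodes, patients, radius_km)
    p = sum(is_case) / len(patients)
    observed = [sum(1 for i in nb if is_case[i]) for nb in neighbours]
    exceed = [0] * len(nodes)
    for _ in range(n_sims):
        simulated = [rng.random() < p for _ in patients]
        for k, nb in enumerate(neighbours):
            if nb and observed[k] > sum(1 for i in nb if simulated[i]):
                exceed[k] += 1
    required = round(confidence * n_sims)  # e.g. 950 of 1000 runs
    return [(count / n_sims, count >= required) for count in exceed]
```

Filter membership is computed once, since patient locations never move between runs; only the case labels are redrawn.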
The secondary aim of this paper was to illustrate how these analytical insights can also support near-real-time diabetes control. To this end, patient addresses from a diabetes clinic were overlaid onto the DMAP output maps of rates and significance. Each of the new clinic's patients was matched to the closest grid node, and all attributes of that node (the statistical significance value for each ICD category investigated, and the distance between the address and the node) were attached to the patient's address.
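The nearest-node matching step can be sketched as a simple minimum-distance search. The node dictionaries, their keys (for example a hypothetical `sig_250_4` significance field) and the tuple layout for patients are assumptions for illustration; in practice this join would typically be done with a GIS near or spatial-join tool.

```python
import math


def attach_nearest_node(patients, nodes):
    """Attach the attributes of the closest grid node to each clinic patient.

    patients: list of (patient_id, x_km, y_km) tuples.
    nodes: list of dicts with 'x_km', 'y_km' and per-ICD significance values
           (hypothetical keys such as 'sig_250_4').
    """
    matched = []
    for pid, px, py in patients:
        nearest = min(nodes, key=lambda nd: math.hypot(nd["x_km"] - px, nd["y_km"] - py))
        # Copy every node attribute except its coordinates.
        record = {k: v for k, v in nearest.items() if k not in ("x_km", "y_km")}
        record["patient_id"] = pid
        record["node_distance_km"] = math.hypot(nearest["x_km"] - px, nearest["y_km"] - py)
        matched.append(record)
    return matched
```

Keeping the separating distance alongside the node attributes makes it possible to flag patients who live far from any retained node, for example where sparse filters were removed.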