This work examined the validity of a potentially important and increasingly used ‘extensive secondary’ dataset in the UK. As has been noted, despite general epidemiological concern with regard to measurement accuracy and the determination of exposure ‘truth’, surprisingly little is known about the validity of commonly used secondary data sources in the field. This study assessed the accuracy of POI data (at least as compared to previously validated local council records) for the first time in the published literature. Although the results of this study are therefore specific to POI data, as compared with local council records in Cambridgeshire, UK, considering the validity of secondary data in these ways and across pertinent divisions remains important for all secondary datasets; this study is novel in this respect.
In terms of concordance between the datasets, the POI data contained 524 fewer gross records than were present in the council data, with a percentage agreement of 49.9%, translating into an overall PPV of 74.9% and sensitivity of 59.9% (‘moderate’). These results are largely in line with previous studies examining the accuracy of other secondary food environment data [12–15, 18–20], the caveat being that this study did not use a ground-truthed dataset as a gold standard, and instead used a reliable secondary reference dataset (demonstrated to have a PPV of 91.5% in Newcastle, UK) to increase the scale of the investigation.
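The relationship between the three overall statistics can be illustrated with a minimal sketch. It assumes the conventional definitions, which are not restated in this section: PPV is the share of POI records matched in the council data, sensitivity is the share of council records matched in the POI data, and percentage agreement is the number of matched records divided by all distinct records found in either dataset. The function names are hypothetical, and only the reported rates are used as inputs, not the underlying counts.

```python
def validation_stats(matched, poi_only, council_only):
    """Return (PPV, sensitivity, percentage agreement) as fractions.

    Assumed definitions (not taken from the source text):
      matched      - outlets present in both POI and council data
      poi_only     - outlets present in POI data only
      council_only - outlets present in council data only
    """
    ppv = matched / (matched + poi_only)
    sensitivity = matched / (matched + council_only)
    agreement = matched / (matched + poi_only + council_only)
    return ppv, sensitivity, agreement


def agreement_from_rates(ppv, sensitivity):
    """Percentage agreement implied by PPV and sensitivity alone.

    Follows from the definitions above:
      1/agreement = 1/PPV + 1/sensitivity - 1
    """
    return 1.0 / (1.0 / ppv + 1.0 / sensitivity - 1.0)


# Overall rates reported in the text: PPV 74.9%, sensitivity 59.9%.
implied = agreement_from_rates(0.749, 0.599)
print(f"{implied:.1%}")  # prints 49.9%
```

Under these assumed definitions, the reported PPV and sensitivity imply the reported percentage agreement of 49.9%, which suggests the three figures are internally consistent.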
Differentiation by type of food outlet revealed PPVs between 57.9% and 82.6%, with sensitivities between 37.8% (‘fair’) and 77.2% (‘good’). These assessments by food outlet type are roughly in line with those demonstrated in the literature [12, 19], but rather below those shown for some commercial US datasets. As these statistics mostly differed significantly by food outlet type, this research highlights the importance of considering the accuracy of secondary data for specific types of food outlet, as has been noted elsewhere. Although we find the lowest level of gross completeness, in terms of the number of records missing from the POI data, for cafés/coffee shops (39%), convenience store records are especially incomplete with regard to percentage agreement, PPV and sensitivity. These small grocery shops are commonly cited as being ‘obesogenic’ [27, 38, 39], being less likely than larger supermarkets to sell ‘healthful’ foods. Given this potential gap in the POI data, this might be an area to focus on if future research is considering supplementing POI data with either council records or field work. It is of note that POI appears to represent a particularly robust source of data on restaurant locations.
Importantly, PPVs across socio-economic and urban/rural divides were similar, both to each other and to the statistic for all outlets. Such similarities have been demonstrated elsewhere [14, 18]. For sensitivity and percentage agreement, there were exceptions, including significantly better estimates of both in urban areas and in some more deprived quintiles, although there was no evidence of a trend across quintiles. That said, sensitivities across urban/rural and SES divides mostly remained ‘moderate’ and as such aligned with the overall sensitivity description. Whilst the data should still be seen as ‘imperfect’, some have suggested that substantial differences in food outlet representation across SES and urban/rural divides such as those tested here might prevail [14, 22]. This hypothesis should be further tested in validation studies of other datasets, but we do not believe this was the case here.
The utility of POI data may be research specific; however, if they are selected as a source of food outlet location data, we suggest they can be used with confidence, particularly with respect to data completeness over socio-economic divides, in urban areas, and where research focuses on restaurant, supermarket or takeaway locations.
Strengths of this study include the fair comparison of contemporaneous datasets, the application of a six-category food outlet classification scheme whose outlet types should relate directly to future deductive research, and its large geographical scale, which enabled an assessment of over 2000 food outlets in each dataset. In particular, using established statistics (percentage agreement, PPVs and sensitivities) across urban/rural and socio-economic divides allowed an assessment of the likelihood of systematic geographical differences in completeness. To our knowledge, this is the first time that such an appraisal has been made in the published literature on a large scale.
There were several key limitations to this study. In order to enable the large study area, we did not conduct field work, choosing instead to use local council data as our ‘gold (reference) standard’. Local council data have been shown to be accurate in several other regions of the UK; however, they are unlikely to be complete, resulting in a potential lack of comparability with previous studies that can relate directly to the food environment reality. Despite this limitation, the strength of the results found here suggests that if council data are indeed less complete than we might hope, or are systematically incomplete (for example, across socio-economic divides), they are at least aligned in these respects with POI records. In order to maximise heterogeneity in socio-economic status throughout the study area, quintiles of SES were calculated relative to the study area only. Increased sensitivity in detecting SES differences between LSOAs was useful for these analyses; however, our findings may not be applicable to the most deprived locales, which are substantially under-represented throughout Cambridgeshire (IMD scores are positively skewed towards being lower (less deprived); mean IMD for Cambridgeshire=15.51 (SD=11.44), range of possible IMD scores for England as a whole 0.53-87.80). This potential limitation may reduce generalisability outside this study area; however, it does not compromise the accuracy of these results. To facilitate a fair comparison of the datasets, we attempted to obtain information that was as contemporaneous as possible. We asked OS and local councils for current data in January 2012; however, it is possible that the two datasets do not reflect the food environment at precisely the same time.
Whilst some exclusions in the datasets were made because food was not sold directly to the public (food producers, for example), exclusions of market traders or mobile food stands were made predominantly because the listed addresses were the traders’ home addresses and not the retail sites themselves. These types of food retailers are likely to be important sources of food [14, 22], potentially with a socio-economic gradient of use [41, 42], and should be considered where possible in future validation work.
In terms of the POI dataset itself, the data contained duplicates that needed to be found and removed (n=105). The classification system supplied was too general to be of real use in most health research (for details, see http://www.ordnancesurvey.co.uk/oswebsite/docs/product-schemas/points-of-interest-classifications-scheme.pdf), so a project-specific classification scheme such as the one used here would almost certainly be required. POI contains records beyond simply the foodscape, making it difficult to discern whether listed establishments sold food or not. In council datasets, by contrast, outlets are listed precisely because they sell food. The breadth of POI may lead to the omission of important sources of food within the environment, for example pharmacies such as Boots the Chemist, a national chain that often but not always sells food items. Investigative work would be required when using POI data to determine whether or not each of these individual stores sells food.