Exploring factors affecting individual GPS-based activity space and how researcher-defined food environments represent activity space, exposure and use of food outlets

Background Obesity remains one of the most challenging public health issues of our time. Despite the face validity of claims that the food environment influences obesity, studies on the causes of obesity have reported inconsistent evidence for this influence. This inconsistency has been attributed to the variability of measures used by researchers to represent food environments: Researcher-Defined Food Environments (RDFE) such as circular buffers, street-network buffers, and others. This study (i) determined an individual’s Activity Space (AS), (ii) explored the accuracy of the RDFE in representing the AS, (iii) investigated the accuracy of the RDFE in representing actual exposure, and (iv) explored whether exposure to food outlets reflects the use of food outlets.
Methods Data were collected between June and December 2018. A total of 65 participants collected Global Positioning System (GPS) data, kept receipts of all their food purchases, completed a questionnaire about their personal information and had their weight and height measured. A buffer was created around the GPS points and merged to form an AS (GPS-based AS).
Results Statistical and geospatial analyses found that the AS size of participants working away from home was positively related to the Euclidean distance from home to workplace; the orientation (shape) of AS was also influenced by the direction of the workplace from home, and individual characteristics were not predictive of the size of AS. Consistent with some previous studies, all types and sizes of RDFE variably misrepresented individual exposure in the food environment. Importantly, the accuracy of the RDFE was significantly improved by including both the home and workplace domains. The study also found no correlation between exposure and use of food outlets.
Conclusions Home and workplace are key activity nodes in modelling AS or food environments and the relationship between exposure and use is more complex than is currently suggested in both empirical and policy literature.

in the current obesity epidemic is the environment [39,45]. In the past five decades, the developed nations have experienced a change in their food environments characterised by increased access to energy-dense food [2,23,51]. Although the concept of environmental influence on obesity is appealing, the evidence to support it is still inconsistent [9,63].
The inconsistency in the link between the food environment and health outcomes could be attributed not only to individual-level variability in the dose-response relationship but also to the variation in measures used to represent food environments [63]. The evidence on the dose-response relationship between exposure to food and dietary intake is limited. The dearth of knowledge in this area is partly due to the intricate and intertwined social, temporal and spatial aspects of the interaction between people and place, which make the effective measurement of exposure challenging [12].
A body of evidence in the food environment field is underpinned by research using fixed point-based anchors such as home, school or workplace and area-based anchors such as census tracts, electoral wards, postcodes, or other administrative units [9,63]. Owing to the increased availability of secondary data containing individuals' geographic information, such as home and workplace locations, there has been an increase in studies examining environmental influences on health using fixed point-based measures of the physical environment, in which buffers of different types and sizes are created around fixed points (i.e. homes or workplaces) to represent individuals' food environments. In this study, spaces created by researchers around participants' home, workplace and en route were referred to as Researcher-Defined Food Environments (RDFE). Cummins et al. [11] refer to this approach as a conventional view of the food environment, whereby places are geographically defined spaces with boundaries identified at a specific scale and separated by physical distance (e.g. zip codes, census tracts and postal codes). Despite being commonly used in food environment research, this approach has been critiqued for minimising the size of the environment and exposure [11,65].
Cummins et al. [11] suggest an alternative view of the environment: a relational view. This view considers points of interest (places) as nodes in networks, which are unstructured, unbounded and freely connected with human practice forming connection patterns, rather than as discrete and autonomous bounded spatial units [11,38]. Hudson [25] reiterates this view when he describes these nodes and networks as complex circuitry with multiple linkages and feedback loops. The advent of technologies like the Global Positioning System (GPS) has provided an effective means of capturing individual Activity Space (AS), a relational view of the environment that represents opportunities for the regular and daily experience of people in multiple settings. AS refers to "the area within which people move or travel in the course of their daily activities" [40, p. 439]. AS is therefore likely to stretch beyond the residential neighbourhood, incorporating a holistic range of exposures.
Although AS has been examined by various health researchers [13,37,43,47,52,56,64] (van Weerdenburg et al. 2019), only a handful of studies have used it to explore the influence of the food environment on individuals' health. Several studies have found that the area of AS is bigger than that of RDFE. Zenk et al. [65], for example, found that AS were bigger than home neighbourhoods and that dietary behaviours had a statistically significant link to AS environmental characteristics. Likewise, Crawford et al. [10] found that the average area for participant-defined neighbourhoods (0.04 square miles) was smaller (2 miles) than the road-network neighbourhoods (3 square miles) and AS (26 square miles). They also noted that AS provided greater exposure than the other measures. Food environment refers to all opportunities for someone to obtain food, including physical, socio-cultural, economic and policy factors at micro-level (local settings such as schools, homes, and workplaces) and macro-level (broader environments or sectors like education, health systems and the food industry) [31,54].
Furthermore, it is suggested that an individual's sociodemographic and socioeconomic characteristics (age, gender, educational attainment, occupation and income) and environmental factors (e.g. residential place and land use) could influence the nature and characteristics of that individual's daily mobility [29]. A study by Widener et al. [61], for instance, showed that individuals' demographics, household food shopper status and city of residence had a significant association with different levels of exposure to various food outlets. Also, food shopping behaviours were statistically significantly associated with demographics, the activity space-based food environment, self-reported health and city of residence. Despite these findings, there is limited research on the influence of individual and environmental characteristics on an individual's AS overall.
When considering exposure, a study by Burgoine et al. [6] revealed that exposure to takeaway food outlets in the home, work, and commuting environments combined was linked to marginally higher consumption of takeaway food, greater body mass index, and greater odds of obesity. Similarly, a study of adolescents by Sadler et al. [49] showed that exposure to 'unhealthy food outlets' between home and school significantly increased the likelihood of purchasing junk food. The downside of these studies is that they did not explore the relationship between exposure and use of food outlets, nor whether the stores classified as 'unhealthy' were the major source of the increased purchase and consumption of junk food. It is worth noting that the so-called "healthy" food outlets like supermarkets and "unhealthy" food outlets like convenience stores can sell or serve both healthy and unhealthy food options or portions [27]. This study set out to i) determine an individual's AS and explore both individual and environmental factors influencing AS, ii) examine how RDFE represent AS, iii) investigate the accuracy of RDFE in representing actual exposure, and iv) examine whether exposure to food outlets is associated with use.

Methods
This study was undertaken between June and December 2018. It included participants residing in Leeds, UK and its surrounding areas who were aged 18 years and above, were proficient in using digital devices (e.g. smartphone, laptop or computer) and had access to the internet. A non-representative volunteer sample of 76 participants enrolled in this study through poster advertisements in Leeds, UK. Three participants withdrew before data collection and eight participants were excluded due to errors in their GPS data, resulting in a final sample of 65. The study received ethical approval from Leeds Beckett University Research Ethics Committee (ref. 37629).

Individual characteristics data
Participants completed an online questionnaire via Qualtrics to capture their socio-demographic and socioeconomic status (SES) characteristics, including age, gender, occupation, income, residential postcodes and workplace addresses. Note that home addresses were considered sensitive information and were not collected. Residential postcodes were used to generate Index of Multiple Deprivation (IMD) ranks using an online IMD postcode lookup (http://imd-by-postcode.opendatacommunities.org/) and grouped into IMD quintiles. Home postcodes were geocoded to postcode zone centroids to protect participants' identity, while workplace and food purchase locations were geocoded at the address level. The geocoding was done using ArcGIS. Participants' weights (kg) and heights (m) were measured, and participants' Body Mass Index (BMI) was used to classify them as 'underweight' (BMI < 18.5 kg/m²), 'healthy' (BMI = 18.5-24.9 kg/m²), 'overweight' (BMI = 25-29.9 kg/m²) or 'obese' (BMI ≥ 30 kg/m²).
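The BMI classification described above can be illustrated with a minimal sketch. The function names are hypothetical, and the treatment of values falling between the stated bands (for example 24.95) as belonging to the lower band is an assumption, since the text only gives the bands to one decimal place.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body Mass Index in kg/m^2 from measured weight (kg) and height (m)."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def bmi_category(bmi_value: float) -> str:
    """Map a BMI value to the four categories named in the text.

    Assumption: band edges are handled as half-open intervals
    (< 18.5, < 25, < 30), so no value falls between bands.
    """
    if bmi_value < 18.5:
        return "underweight"
    if bmi_value < 25:
        return "healthy"
    if bmi_value < 30:
        return "overweight"
    return "obese"


# Illustrative (made-up) participant: 70 kg, 1.75 m
print(bmi_category(bmi(70, 1.75)))
```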

Food purchase location and dietary data
For seven days, participants were asked to keep receipts of all their food purchases. The receipts provided addresses of food outlets visited, date and food items purchased. If participants forgot to collect receipts or food outlets did not provide receipts, participants were asked to record the name of the food outlet, date and food items purchased on a piece of paper whenever possible. The addresses of food venues were geocoded (converted into XY coordinates) and used in geospatial analysis while other information such as type and amount of food was used for validation of dietary intake.

Activity space
Participants were issued GPS tracking devices and asked to wear them every time they moved outside their residential places for a period of seven days, including weekdays and weekend days. The study used Garmin Foretrex 301 and 401 devices, which were set to record GPS points every 30 s. The collected GPS data were used to create a buffer of 1 km on both sides of the road network, which was considered as an AS (GPS-based AS). This buffer size was considered a reasonable estimate of how far a person would walk [28,65] and captures the area adjacent to individual daily paths. According to Li and Tong [35], 'Activity spaces are geographical measures of the locations, paths, and areas adjacent to where people go to carry out their daily lives'. Christian [8], Sadler and Gilliland [48] and Zenk et al. [65], for example, created 0.5-mile (0.8 km) buffers around all GPS points and dissolved them into a single feature or space to represent the adjacent areas around participants' daily paths. Likewise, Sherman et al. [52] and Kerr et al. [28] used 1 km buffers in their studies. A systematic review by Smith et al. [53] also included several studies that used daily path areas (buffers of all points or tracks).
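The idea of buffering GPS points and dissolving the buffers into a single AS can be sketched as follows. This is not the ArcGIS workflow used in the study: it is a simplified raster approximation that assumes the GPS points have already been projected to metres, and the function names, the 100 m cell size and the example coordinates are all hypothetical.

```python
import math


def activity_space_cells(points, radius_m=1000.0, cell_m=100.0):
    """Approximate the union of circular buffers around GPS points.

    points: iterable of (x, y) coordinates in metres (projected CRS assumed).
    Returns the set of grid cells (col, row) whose centre lies within
    radius_m of at least one point; overlapping buffers are merged
    automatically because cells are stored in a set.
    """
    cells = set()
    reach = int(math.ceil(radius_m / cell_m))
    r2 = radius_m ** 2
    for x, y in points:
        col0 = math.floor(x / cell_m)
        row0 = math.floor(y / cell_m)
        for col in range(col0 - reach, col0 + reach + 1):
            for row in range(row0 - reach, row0 + reach + 1):
                cx = (col + 0.5) * cell_m  # cell centre, x
                cy = (row + 0.5) * cell_m  # cell centre, y
                if (cx - x) ** 2 + (cy - y) ** 2 <= r2:
                    cells.add((col, row))
    return cells


def area_km2(cells, cell_m=100.0):
    """Area covered by a set of grid cells, in square kilometres."""
    return len(cells) * cell_m ** 2 / 1e6


# Made-up track: three fixes 500 m apart along a straight line
track = [(0.0, 0.0), (500.0, 0.0), (1000.0, 0.0)]
print(round(area_km2(activity_space_cells(track)), 2))
```

A vector GIS buffer-and-dissolve would give exact geometry; the raster version trades some precision (controlled by the cell size) for having no dependencies.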

Researcher-defined food environments (RDFE)
In this study, Researcher-Defined Food Environments (RDFE) refer to different types of buffers (e.g. circular and street-network buffers) frequently used by researchers as a measure of an individual's food environment. A variety of RDFE were created using three buffer sizes: 2 km, 4 km and 6 km. These buffer sizes have been used in previous studies and were used here to allow comparisons [24,28]. Table 1 shows graphical representations of each RDFE. Standard Deviational Ellipses (SDE) were also created to determine the dispersion of GPS points (participants' movements) and the orientation of participants' mobility and AS. According to Wang et al. [58], SDE are useful in identifying the dispersion or concentration and the orientation of spatial features. In this study, 2 SDE were created to capture 95% of all GPS points of participants' movements. The 2 SDE were used to assess any pattern in participants' movements that could influence the shape (orientation) of their AS. Using visual inspection, the orientations of the 2 SDE were assessed for agreement with the home-workplace direction and recorded as a binary variable (yes/no) (see Fig. 1).
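The study assessed agreement between ellipse orientation and home-workplace direction by visual inspection. As an illustration of how such a check could be made numerically, the sketch below estimates the orientation of a point cloud from its principal axis (the direction of greatest variance) and compares it with the home-workplace axis. This is an assumption-laden stand-in, not the ArcGIS SDE tool: the function names and the 20° tolerance are hypothetical, and coordinates are assumed to be projected (x east, y north).

```python
import math


def principal_axis_deg(points):
    """Orientation of the major axis of a point cloud.

    Returned in degrees counter-clockwise from the x (east) axis,
    folded into [0, 180) because an axis has no direction.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return math.degrees(theta) % 180.0


def axis_deg(home, work):
    """Orientation of the home-workplace line, folded into [0, 180)."""
    dx, dy = work[0] - home[0], work[1] - home[1]
    return math.degrees(math.atan2(dy, dx)) % 180.0


def orientations_agree(points, home, work, tolerance_deg=20.0):
    """Binary (yes/no) agreement; the tolerance is a hypothetical choice."""
    diff = abs(principal_axis_deg(points) - axis_deg(home, work))
    diff = min(diff, 180.0 - diff)  # axes wrap around at 180 degrees
    return diff <= tolerance_deg


# Made-up GPS fixes scattered along a north-east commute
fixes = [(0, 0), (1, 1), (2, 2), (3, 3)]
print(orientations_agree(fixes, home=(0, 0), work=(5, 5)))
```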

Points of interest and road-network data
This research also used Points of Interest (PoI) and Road-Network (RN) data for 2018. The PoI data set was used to identify and map all food outlets in the study area. The PoI data were obtained from the Edina Digimap website (http://digimap.edina.ac.uk). The PoI feature codes were decoded into feature names and cleaned to remove all non-food outlet features. The features retained in the PoI data included supermarkets, convenience stores, fast-food outlets, and others (i.e. cafés and coffee shops, speciality food stores, pubs and inns, grocery stores, hotels and restaurants). Identical outlets were identified and removed, resulting in a final sample of 20,306 outlets. The PoI data have been validated and shown to be accurate for classifying food outlets [62]. The RN data provided comprehensive data on all major and minor roads except private roads in the study area. The RN data were also downloaded from the Edina Digimap website (http://digimap.edina.ac.uk) and converted into a road network dataset before data analysis.

Data analysis
Statistical and geospatial analyses were undertaken in SPSS version 25 and ArcGIS 10.2, respectively. Tests of normality were conducted to guide the choice of appropriate tests. The assumption of normality for AS (a continuous variable) was not satisfied, as assessed by the Shapiro-Wilk test (p < 0.05) and by visual inspection of Normal Q-Q plots. Because of this violation, non-parametric tests were used to examine differences in AS between groups: the Mann-Whitney U test for binary variables and the Kruskal-Wallis H test for variables with more than two categories. Subsequent pairwise comparisons with Bonferroni adjustment for multiple comparisons were used to assess the difference in AS between groups. Multiple linear regression analysis was performed to assess the relationships between participants' SES, sociodemographic characteristics, Euclidean distance from home to workplace and AS size. Overlap between RDFE and AS was calculated in ArcGIS as a proportion of intersection, presented as mean percentage overlaps and analysed further in SPSS to determine the agreement between RDFE and AS. The Wilcoxon paired rank test was used to explore differences in the sizes of food environment captured by the RDFE and AS. The accuracy of the RDFE in representing exposure was assessed using Positive Predictive Value (PPV) and sensitivity. PPV is the proportion of positive results that are true positives, calculated as True Positive [TP] / (True Positive [TP] + False Positive [FP]) [17]. The PPV in this study denoted the proportion of food outlets within the RDFE that were truly present in an individual's AS. The TP value represented the outlets in the AS that were correctly captured by RDFE and the FP value denoted the outlets captured by RDFE which were not within the AS (Fig. 2). Sensitivity, on the other hand, measures the proportion of actual positive results that are correctly identified by a measure, calculated as True Positive [TP] / (True Positive [TP] + False Negative [FN]) (Fletcher et al. 2012). FN denoted the outlets within the AS which were missed by RDFE (Fig. 2). The values of both PPV and sensitivity range from 0 to 1 (i.e. 0-100%).
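As a worked illustration of these two accuracy measures, the sketch below treats the outlets inside an RDFE and the outlets inside the AS as two sets of outlet identifiers, with TP, FP and FN defined as in the text (AS as the reference). The function name and the outlet identifiers are hypothetical; returning NaN when a denominator is zero is an assumption, not something the study specifies.

```python
def ppv_and_sensitivity(rdfe_outlets, as_outlets):
    """PPV and sensitivity of an RDFE, using the AS as the reference.

    TP: outlets in both the RDFE and the AS.
    FP: outlets in the RDFE but not in the AS.
    FN: outlets in the AS that the RDFE missed.
    """
    rdfe, space = set(rdfe_outlets), set(as_outlets)
    tp = len(rdfe & space)
    fp = len(rdfe - space)
    fn = len(space - rdfe)
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    return ppv, sensitivity


# Made-up example: a small home buffer captures four outlets,
# two of which also lie in a six-outlet activity space.
rdfe = {"shop_a", "shop_b", "shop_c", "shop_d"}
activity_space = {"shop_c", "shop_d", "shop_e", "shop_f", "shop_g", "shop_h"}
print(ppv_and_sensitivity(rdfe, activity_space))
```

In this toy case half of the outlets in the buffer are truly in the AS (PPV = 0.5), while the buffer finds only a third of the outlets the person was actually exposed to (sensitivity ≈ 0.33), which is the trade-off the two measures are meant to separate.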
In this study, exposure referred to food outlets within a defined environment and use referred to the food outlets in which the participants purchased food.

Results
Table 2 displays the full characteristics of the sample. Participants were aged between 19 and 67 years (mean = 37 ± 13 years) (Table 2). The majority of participants were female (72%), which could be because study participation was voluntary and the study could have more appeal to women than men. This is not uncommon in health promotion research and programmes, in which men tend to have lower participation than women overall [36,42,46]. More than half of the participants (60%) were White, nearly three-quarters (71%) were educated to degree level or higher, and 78% were in full-time employment. Most of the participants (92%) worked away from home and 60% used a car to commute from home to their workplace. Likewise, the high proportion of highly educated and full-time employed participants could be due to the voluntary participation in the study and the appeal of the study to these groups. More than half of the participants (63%) had a healthy weight. The BMI of participants ranged from 18 to 37 kg/m² with a mean BMI of 25 kg/m².
Table 2 (partial; values are n (%)). Income (row label truncated): … (22); £10,000-< £25,000, 22 (34); £25,000-< £50,000, 24 (37); £50,000-< £100,000, 3 (5); £100,000 and more, 1 (2). Household annual income: under £10,000, 11 (17); £10,000-< £25,000, 10 (15); £25,000-< £50,000, 27 (42); £50,000-< £100,000, 15 (…

Aim 1: Determine an individual's AS and explore both individual and environmental factors influencing AS
The average AS size was 62 km² (min = 6, max = 284 km²), and the average Euclidean distance from home to workplace was 5 km (min = 0, max = 21 km) (Table 1). Most movements of participants working away from home were between home and workplace (Fig. 3).
Assessment of the orientation of AS using 2 SDE of participants' movements revealed that the AS of 60/65 (82%) of participants who worked away from home followed home-to-workplace direction and most of the concentration of GPS points for these participants was in the space between home and workplace (Fig. 3).
The Mann-Whitney U test showed that median AS size was statistically significantly smaller in individuals using active transport (walking and cycling) than in individuals using motorised transport (car, bus, and train) (p = 0.011). No statistically significant difference in AS size was detected between gender (p = 0.482), employment status (p = 0.285), shift (p = 0.146) and working hours (p = 0.165) groups (Table 3). The Kruskal-Wallis test showed a statistically significant difference in AS between the different age groups, χ²(4) = 18.77, p = 0.001. The post hoc analysis showed statistically significant differences in median AS between participants aged 18-20 years and 41-50 years (p < 0.001) as well as 21-30 years and 41-50 years (p = 0.002) (Table 3).
Regression analysis showed that for every 1 km increase in the Euclidean distance from home to work there was an increase of 34.04 km² in AS (CI: 13.19, 54.90; p = 0.002). No individual characteristics were predictive of individual AS size (Table 4).

Aim 2: Examine how RDFE represent AS
All RDFE explored in this study had limited accuracy in representing AS (Table 5). The mean percentage overlap between RDFE and AS varied according to the type and size of the buffer used. For instance, small buffers of 2 km had very low mean percentage overlap, such as CBH (16.6%), CBW (14.6%), CBHW (27.5%), SNBH (8.5%), SNBW (7.8%) and SLB (40.7%). The mean percentage overlap showed a direct relationship with buffer size, meaning that the mean percentage overlap increased as the buffer size increased. Notably, circular buffers around the home or workplace alone underestimated AS by more than 50%. Underestimation was less severe for buffers involving both home and workplace (e.g., CBHW and SLB) but was still significant.
(Table 6). Table 7 shows that in all RDFE there was a decrease in PPV as buffer size increased. In 2 km CBH for instance, the PPV for all food outlets were ≥ 0.75 which dropped to > 0.5 and ≤ 0.5 with 4 km and 6 km
A total of 250 food outlet visits were made during the study period, of which more than half (54%) were to supermarkets. Convenience stores (16%) and fast-food outlets (9%) were the outlets least visited. In contrast, participants had the highest exposure to 'other' food outlets in their AS (53%), followed by fast-food outlets (25%), and the least exposure was to supermarkets (2%) (Table 8).

Discussion
The findings of this study revealed that AS size was directly related to the distance from home to workplace. These findings agree with the study by Drewnowski et al. [15], which found that GPS-based AS size had a positive association with distance from home to work. Our study also showed that the orientation of AS for most of the participants working away from home followed the home-to-workplace direction, which signifies the importance of the two locations in an individual's movement. These findings align with those of studies on spatial analysis of the AS in which most participant activities were undertaken around activity anchor points [22,33,57]. Saxena and Mokhtarian [50] also found that most activities of telecommuters (individuals who work from home for an organisation) were carried out around home on telecommuting days, while most destinations were oriented toward the workplace on commuting days. These findings have an important implication during and after COVID-19, when working from home has become, and may remain, more common [3,55]. Researchers need to account for changes in individuals' AS patterns, considering where individuals spend most of their day, when modelling their AS. It was also found in this study that participants aged under 30 years had smaller AS than those aged 41-50 years. Similarly, the study revealed that the managerial and professional occupational group had larger AS than the manual and routine worker group. This could be because most of the younger participants were students who lived near their workplaces while the older participants were senior employees who lived further from their workplaces, meaning that participants younger than 30 years could have a shorter commute than the older participants.
The study also found an association between participants' mode of transport and AS size: participants using active transport (e.g. walking or cycling) had smaller AS than those using passive transport (e.g. car and bus). Similar findings were reported in the study by Zenk et al. [65]. Despite differences in AS size among age, income and occupation groups and modes of transport, the study did not find any association between the size of AS and other sociodemographic characteristics and SES. Similar findings were observed in a study by Drewnowski et al. [15], which found no statistically significant relationships between the participants' sociodemographic characteristics and the areal size of their AS.
Literature suggests that AS provides a more realistic representation of individual exposures to food outlets than researcher-defined measures such as buffers [8,21,65]. When RDFE was superimposed on AS (a reference measure of "actual" exposure) to determine the percentage overlap of the two, all RDFE misrepresented the AS, and the mean percentage overlap between the RDFE and AS varied considerably across RDFE types and sizes. These findings agree with a study of 526 children by Sadler and Gilliland [48], which used a GIS-based analysis of individuals' GPS-track AS and different proxies for AS, such as buffers and container approaches, to quantify the discrepancies resulting from the use of different proxy methods. That study showed that exposure proxies consistently underestimated exposure to junk food by up to 68%. The analysis also showed that the RDFE which included both home and workplace, such as SLB and CBHW, had a better representation (overlap) of AS than those which considered home (i.e. SNBH and CBH) or workplace (SNBW and CBW) separately. Thus, the use of RDFE which focus on home or workplace separately has potential for errors in estimating AS unless both home and workplace are considered.
In future research where it is not possible to capture individual AS using GPS data, home, workplace and commute could be used to model individuals' AS. The current study suggests a 4 km buffer may provide a fair representation of exposure; however, further research is needed to establish an optimum buffer size that captures exposure for different populations.
Furthermore, the study confirmed that commonly used measures of the food environment (i.e. RDFE) lack precision and accuracy in representing exposure. PPV for all RDFE were inversely related to the size of the RDFE. Although smaller buffers had a high probability of capturing food outlets that were truly present in the AS, they missed a considerable number of food outlets in the AS, therefore underestimating exposure. Conversely, although the larger RDFE captured more food outlets, they included substantial non-AS area and many false-positive food outlets, which lowered their PPV. The type of RDFE also mattered when it came to the PPV of the RDFE. The street-network buffers, for instance, had slightly higher PPV than circular buffers. Street-network buffers may more closely reflect the on-the-ground context; by excluding non-activity areas they reduce false-positive food outlets (Fig. 4) [5,26,41]. Circular buffers are considered to be less representative of the "actual" relevant spatial context of places, especially in locations with natural features like water bodies or other features like railways [41]. Yet, circular buffers remain the most commonly used buffers in food environment research [32,63]. The inclusion of both home and workplace in the RDFE (e.g. CBHW and SLB) significantly improved the sensitivity of RDFE, but not the PPV. This supports a large body of literature indicating powerful exposures beyond home neighbourhoods [11].
By quantifying how poorly the RDFE represent exposure in the food environment, these findings have significant implications. Policymakers need to interpret existing evidence with caution when making decisions about modifying food environments. Meanwhile, researchers in the food environment field ought to be aware of how much exposure is missed and traded off when using certain types or sizes of RDFE. Future studies should include both home and workplace locations to improve the representation of the food environment and exposure to food outlets.
Despite high exposure to 'other' and 'fast-food' outlets, participants in this study used these food venues relatively rarely; most food purchases were made at supermarkets, whereas the fewest were made at fast-food outlets. These findings align with Appelhans et al. [1], who found supermarkets to be the most visited food outlets while fast-food/takeaway outlets were the least visited food venues. The analysis also revealed that exposure to food outlets was not related to their use. This suggests that mere exposure to a certain type of food outlet may not necessarily lead to the use of those facilities [16]. The mechanism by which exposure influences use is likely to be more complex than has been suggested by most contemporary research. Glanz et al. [19], for instance, highlighted taste, cost, convenience, variety and energy density as some of the key determinants of food choices. These factors may be objective, subjective or both, powerfully influencing individuals' food choices and purchase locations.
Several studies in the food environment field suggest that exposure to certain types of food outlets increases the likelihood of adiposity [4,6,7,30]. These studies operate under the assumption that some food outlets are 'healthy' while others are 'unhealthy' [20]. Often, fast-food outlets, takeaways and convenience stores are classified as 'unhealthy', whereas grocery stores and supermarkets are considered 'healthy' [20]. This stratification of food outlets is overly simplistic and problematic: it fails to recognise the wide array of unhealthy food options offered within most supermarkets while disregarding the healthy food options available at most fast-food outlets [20]. A study by Lesser et al. [34], for instance, demonstrated that the average amount of calories purchased by participants at McDonald's (1,038 cal), considered 'unhealthy', was similar to the calories purchased at Subway (955 cal), considered 'healthy'. The so-called 'health halo' [14,34] can spuriously imply that certain specific food outlets are 'healthier'. Importantly, the 'healthiness' of food outlets is determined by the actual food offered by the food outlets [20].
A limitation of the current study is the small and non-representative sample (i.e. the majority of the participants were female). Although these findings are not generalisable, the study collected rich, high-quality individual-level data that permitted exploration of the food environment using a wide range of buffer types and sizes. Coupled with multiple food environment measures, this distinctive combination of variables offered valuable comparisons between these metrics and added insights into the discrepancies existing in the RDFE measures. Relying on a cross-sectional design also limits the causal inferences that may be drawn from the findings. Even allowing for this shortcoming, the findings justify serious consideration of the effects of using different measures of food environments on measured exposure.

Conclusion
This study has highlighted the importance of home and workplace locations in food environments and the limitations of the RDFE in representing an individual's food environment and exposure to food outlets. Different RDFE had varying accuracy in representing exposure in the food environment. Fundamentally, the majority of RDFE misrepresent exposure. Therefore, it is advisable for researchers using datasets without individual mobility information such as GPS tracking (e.g., secondary data) to consider including both home and workplace for individuals working away from home when modelling AS or food environments in their analysis. Over-dependence on conventional ways (proxy measures) of measuring exposure, which are incomplete, reflects an under-developed appreciation of how the environment influences weight status. Moreover, exposure to food outlets was not a good determinant of their use. The relationship between exposure and use is more complex than is currently suggested in both empirical and policy literature. With an increased impetus to modify the food environment, policymakers ought to be cautious when interpreting the current evidence in this field, which is largely based on the RDFE. We suggest a shift of focus in the food environment field from mere exposure to food outlets to more nuanced factors like the quality of food offerings by food outlets and the quality and quantity of food purchased and consumed by people.