This study evaluated the intra-rater reliability, inter-rater reliability and criterion validity of a Google Street View-based audit to virtually assess physical environmental characteristics along cycling routes to school among 11- to 12-year-old children. Because the audit instrument will also be used in future studies to assess environmental characteristics along entire cycling routes rather than individual segments, we opted to analyze reliability and validity at the level of the entire cycling route. Overall, 78% of all EGA-Cycling items showed high intra-rater reliability, inter-rater reliability was acceptable for 43% of all items, and acceptable criterion validity between the Google Street View ratings and the on-site ratings was reported for 54% of all items.
The reliability results of the present study are comparable with those of previous studies [32, 52]. Griew and colleagues rated neighborhood areas, including street design factors related to walking among adults, with a newly developed street audit using Google Street View. In line with our intra-rater results, they found high intra-rater reliability scores for all street characteristics. Overall, studies evaluating audit tools to assess the physical environment have stressed the difficulty of judging quality or aesthetics [21, 53]. Griew and colleagues found low agreement between raters for pavement quality, lighting and road permeability, also indicating that judgments of quality or aesthetics differed between raters due to subjectivity. Kelly and colleagues reached the same conclusion when evaluating the inter-rater reliability of the Active Neighborhood Checklist using Google Street View: they found low scores for parking facilities, tree shade, sidewalk width and curb cuts. Our results also showed divergent scores, ranging from no to almost perfect inter-rater reliability. The low inter-rater reliability scores (found for “openness of the view”, “presence of driveways”, “presence of garage doors”, “type of cycle lane”, “width of cycle lane”, “two-way cycle lane”, “maintenance of cycle lane”, “lighting of cycle lane” and “maintenance of front yards”) could be explained by the subjective interpretation of the observers. A clear definition of those items and their response options was difficult to provide, so ratings could differ between the two observers. Therefore, observers should receive training with specific instructions and examples of different environments on how to interpret and rate those subjective items. Regarding the type of cycle lane, observers received no clear instructions for scoring this item when different types of cycle lanes occurred in the same street segment.
They had to choose the one type of cycle lane that best fit the street segment, mostly depending on their own interpretation. Adapting the response options for this item (e.g. adding a “mixed type of cycle lane” option or allowing multiple response options) could increase inter-rater reliability.
Furthermore, little variance in the answers could explain the low inter-rater reliability scores for residential mix and swerving alternatives for cyclists, as the percentage agreement for both items was generally high (>70%).
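This pattern, in which a chance-corrected reliability coefficient such as Cohen's kappa is low even though the raters almost always agree, arises because expected chance agreement is very high when nearly all ratings fall into a single category. A minimal sketch (with hypothetical ratings, not data from this study) illustrates the effect:

```python
# Illustrative sketch only: with skewed ratings, percent agreement is
# high while Cohen's kappa is near zero, because chance agreement is
# also high. The ratings below are hypothetical, not study data.

from collections import Counter

def percent_agreement(a, b):
    # Proportion of segments on which the two raters gave the same score.
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    # kappa = (p_o - p_e) / (1 - p_e), where p_e is the agreement
    # expected by chance from each rater's marginal distribution.
    n = len(a)
    p_o = percent_agreement(a, b)
    ca, cb = Counter(a), Counter(b)
    p_e = sum((ca[c] / n) * (cb[c] / n) for c in set(a) | set(b))
    return (p_o - p_e) / (1 - p_e)

# Hypothetical binary ratings (1 = present) for one item across 20
# street segments: both raters almost always answer "present",
# disagreeing on only two segments.
rater1 = [1] * 18 + [0, 1]
rater2 = [1] * 18 + [1, 0]

print(round(percent_agreement(rater1, rater2), 2))  # 0.9
print(round(cohens_kappa(rater1, rater2), 2))       # -0.05
```

Despite 90% raw agreement, kappa is essentially zero here, which is why high percentage agreement alongside a low reliability coefficient points to low item variance rather than genuine disagreement between raters.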
When validating the Google Street View ratings against the audit filled out during on-site assessments, mixed results were found. Criterion validity scores varied widely, but values were acceptable for approximately 54% of all items. Of the low-scored items, four (31%) showed high percentage agreement (>70.0%), indicating low variance for the items regarding the presence of recreational destinations, streetlights in the street segment, two-way cycle lane and presence of trees. Since high percentage agreement was found for those items, acceptable criterion validity can nevertheless be assumed.
Our results showed somewhat lower criterion validity scores than similar studies that conducted virtual audits in a neighborhood area [26, 27, 30, 52]. However, only Badland and colleagues included features specifically related to cycling (e.g. “path type, slope, curb type and condition of cycle lane”, “one-road cycle lane”). In our study, the majority of the low-scored items belonged to the cycling facilities subscale: all items categorized under cycling facilities, except one (“surface of cycle lane”), had poor or fair validity. In contrast, Badland and colleagues reported a high criterion validity score for the items related to cycling. However, they included all individual items in one category (“cycling surface”) and calculated agreement for the category rather than for the individual items. Additionally, most of those similar studies found the lowest validity scores for qualitative and detailed features [26, 27, 29, 30], street condition features [27, 30], and changeable items such as the presence of graffiti and litter [26, 27, 29, 30]. In our study, some low-scored validity items (“openness of the view”, “presence of driveways”, “maintenance of cycle lane”, “lighting of cycle lane”) were assessed through a qualitative judgment, so subjective interpretation by the observers could explain those low scores. Additionally, the perspective of the camera when the Google Street View images were captured sometimes makes it difficult to observe more detailed features. This could also explain the low scores for the items regarding swerving alternatives for cyclists, width of the cycle lane and path condition of the cycle lane. For example, the path condition of the cycle lane was easier to rate on-site, by actually cycling the routes, than through Google Street View.
Another possible explanation for the low criterion validity scores of the items “measures that can slow down traffic” and “type of cycle lane” could be that the Belgian Google Street View images dated from 2009, while the on-site assessment was conducted in winter 2013. Similar studies not only highlighted the difficulty of auditing temporal items (e.g. graffiti, litter) [27–30, 32, 33], but also reported the temporal lag between the Google Street View images and the on-site assessments as a limitation of using Google Street View [27–29, 54]. There is no fixed frequency at which new Google Street View images are collected; however, the virtual images appear to be updated about once every 5 years. The date on which the images were taken is shown in Google Street View, and Google also provides information on when and where new images will be taken. This makes it possible to select areas where the Google Street View images are not outdated and to focus on those areas for certain research purposes. Curtis and colleagues investigated the spatio-temporal stability of the Google Street View dates and concluded that the dates of the images often changed without warning: images can be presented for one date and suddenly change to images from another date, mostly at intersections. Researchers using Google Street View as a data collection tool should be aware of these issues. Additionally, the new history function of Google Street View allows the user to travel back in time to see how a place has changed over the years; Google Street View has gathered historical imagery from past Street View collections dating back to 2007. This function allows changes in the physical environment to be identified, which might be of interest in some studies. According to the Flemish agency for roads and traffic, many infrastructural changes in the Flemish traffic landscape (e.g. construction of new cycle lanes) were conducted after 2009. Such recent changes could not be observed in Google Street View, while the on-site ratings recorded new infrastructural elements for those items (e.g. separated cycle lane, speed bump).
Actually cycling along the routes and observing through on-site assessment is therefore the preferred method to assess micro-environmental features (e.g. cycle lane condition) and new infrastructural features (e.g. separate cycle lanes not allowing car traffic). For the other items, however, conducting the audit through Google Street View remains beneficial because of the large gain in time (including travel and rating time): traveling to and from the different cycling routes requires considerably more effort and time than observing them in Google Street View. An additional advantage of Google Street View is that unclear items can easily be double-checked, whereas returning to a specific location during field observations requires much more time and effort.
For many items a constant response was recorded, mostly in the Google Street View ratings [see Additional file 2]. Difficulty in seeing some detailed features with Google Street View, for example the maintenance of the street segment, could explain this. This suggests that assessing the physical environment through Google Street View, especially for more detailed and qualitative features, may give less nuanced results. Two items, however, received a constant response from all observers (“maintenance of buildings” and “presence of graffiti and litter”); when observing the physical environment across larger regions, it is suggested to also rate these items in other regions. Although a study in the Netherlands by de Vries and colleagues found that litter was not associated with cycling to school among elementary school children, removing those items from an audit intended to represent all environments could be premature, as the presence of graffiti and litter may nevertheless influence cycling behavior in other regions.
The present study has some important limitations. First, the small number of raters who conducted EGA-Cycling may affect the reliability of the results. Second, conducting the study only among sixth graders limits generalization to all primary school children. Third, the study included only one school, situated in a suburban area. Fourth, each child's cycling route to school was obtained through the parents; however, the actual route that children take to school may differ from what parents consider to be the route, especially among older and more independent children. Future research could use GPS devices to track in detail the actual routes that children take to school or during leisure time.
The present study also has some important strengths. To our knowledge, this is one of the first studies to test both the intra- and inter-rater reliability and the criterion validity of a newly developed Google Street View-based audit focusing on cycling routes to school. Google Street View offers many advantages for assessing the physical environment: it is an objective method, cost- and time-effective, always available, and independent of weather conditions. The present study can provide direction for research assessing the physical environment along cycling routes. EGA-Cycling is a helpful instrument for assessing macro-environmental features along cycling routes to school. However, to assess environmental features at the micro-level in a cycling setting (detailed and temporary features specifically related to cycling), on-site assessments should be added to the observations through Google Street View. Furthermore, future research should continue to evaluate the use of Google Street View to assess the physical environment in other settings and populations.