Virtual audits of the urban streetscape: comparing the inter-rater reliability of GigaPan® to Google Street View

Background Although previous research has highlighted the association between the built environment and individual health, methodological challenges in assessing the built environment remain. In particular, many researchers have demonstrated the high inter-rater reliability of assessing large or objective built environment features and the low inter-rater reliability of assessing small or subjective built environment features using Google Street View. New methods for auditing the built environment must be evaluated to understand if there are alternative tools through which researchers can assess all types of built environment features with high agreement. This paper investigates measures of inter-rater reliability of GigaPan®, a tool that assists with capturing high-definition panoramic images, relative to Google Street View. Methods Street segments (n = 614) in Pittsburgh, Pennsylvania in the United States were randomly selected to audit using GigaPan® and Google Street View. Each audit assessed features related to land use, traffic and safety, and public amenities. Inter-rater reliability statistics, including percent agreement, Cohen’s kappa, and the prevalence-adjusted bias-adjusted kappa (PABAK) were calculated for 106 street segments that were coded by two, different, human auditors. Results Most large-scale, objective features (e.g. bus stop presence or stop sign presence) demonstrated at least substantial inter-rater reliability for both methods, but significant differences emerged across finely detailed features (e.g. trash) and features at segment endpoints (e.g. sidewalk continuity). After adjusting for the effects of bias and prevalence, the inter-rater reliability estimates were consistently higher for almost all built environment features across GigaPan® and Google Street View. Conclusion GigaPan® is a reliable, alternative audit tool to Google Street View for studying the built environment. 
GigaPan® may be particularly well-suited for built environment projects with study settings in areas where Google Street View imagery is nonexistent or updated infrequently. The potential for enhanced, detailed imagery using GigaPan® will be most beneficial in studies in which current, time sensitive data are needed or microscale built environment features would be challenging to see in Google Street View. Furthermore, to better understand the effects of prevalence and bias in future reliability studies, researchers should consider using PABAK to supplement or expand upon Cohen’s kappa findings.


Background
Research has shown a connection between the built environment (BE) where people live, work, and play and their physical, social, and mental health. A systematic review of the built environment and cardio-metabolic health found strong evidence of the association between the BE and a person's physical health [1]. Leyden [2] found living in walkable, mixed-use neighborhoods was associated with greater social capital including a greater likelihood of trusting other people, getting to know neighbors, and involvement in one's community. Urban neighborhood BE characteristics such as housing quality, exposure to greenspace, and other environmental conditions are also associated with psychological distress [3]. Although the body of evidence supporting the connection between the BE and health continues to grow, studying the BE continues to present unique methodological challenges.
In-person direct observation (DO) has been considered the gold standard when auditing features of the microscale BE [4]. The microscale environment is defined as built and social environment features representing neighborhood characteristics or details that are smaller in scale and are generally more likely to change over time with fewer costs [5]. This includes street-level environmental features like housing characteristics, sidewalk presence and conditions, street lighting, traffic control characteristics, intersection features, tree coverage, curb characteristics, graffiti, and trash. Similar to other researchers, we classify some microscale features as "finely detailed" [6]. This refers to features that are visually fine as a whole (e.g. presence of garbage, litter, or broken glass, presence of broken windows or bars on windows). Although DO is the gold standard in assessing the microscale environment, using DO can be costly and time intensive depending on the location and the size of the area being observed [7]. These limitations are especially problematic when the areas of interest are geographically dispersed across various political or administrative divisions (e.g. states, provinces, prefectures) or countries.
Google Street View (GSV) has been used as a reliable tool to observe the microscale BE and is a cheaper alternative to DO [6,8]. GSV provides an individual with a panoramic, 360° view from a selected street, with the ability to move along the street on a computer and adjust zoom settings. While GSV has been used reliably to study BE features, it has also presented its own set of limitations. This includes limitations that are dependent upon the method used to assess features in the GSV imagery (e.g. human auditors vs. deep-learning technologies) and limitations that are relevant regardless of the method used. In one study that used human auditors to code features in GSV imagery, reliability was not as high when considering finely detailed features, such as the presence of litter, or when recording qualitative observations, such as the quality of sidewalk or housing [6]. Using human auditors to assess streetscape characteristics can also be subjective and costly for large-scale studies [9].
Deep-learning technologies have advanced our abilities to widely and objectively assess street features by making use of the pixels in GSV imagery. Studies have used machine learning to parse pixels of GSV imagery into different categories (e.g., sky, trees, and buildings) to generate very precise estimates of features [9,10]. For example, Yin and Wang [9] generated the proportion of sky in GSV images and found it was negatively correlated with pedestrian activity and walkability. Researchers have also used GSV and a combination of other 2D and 3D data sources to make 3D GIS models and examine microscale urban design characteristics related to physical activity and pedestrian behavior [11]. Building 3D GIS models allows the user to interactively assess details of the streetscape from many angles, points, or locations [11]. Therefore, these models may improve researchers' abilities to objectively assess features of the urban streetscape. 3D GIS models can be built using open source software and there have been many advances in procedural modeling, computer vision, and photogrammetry that make this process easier [11]. However, the application of these principles requires knowledge of artificial intelligence (e.g., machine learning, computer vision) and the computational skills and capacity to implement them.
Furthermore, although GSV is available for various cities throughout the world, GSV imagery is not available for every street in many countries, including many developed countries [12]. In the United States (USA), imagery is updated irregularly, with urban areas tending to have more complete coverage and more frequent updates than rural areas [6]. These limitations could prove problematic for studies in which the imagery date is crucial to the study's aims, for studies across varying levels of urbanicity/rurality, or for studies in countries with sparse imagery availability. Another problem unique to GSV is the variation in imagery dates on a single street segment. As the auditor navigates the street in GSV, the latest imagery available may vary for different portions of the segment. It can be challenging to control for this variation, and the fluctuating dates therefore have the potential to introduce error into an audit that is meant to represent the BE at a specific point in time [13].

Keywords: GigaPan
Using GigaPan ® to audit the BE is one potential solution to some of the shortcomings of using GSV or DO for BE audits. GigaPan ® is a tool that assists in capturing high-definition panoramic images, and its use in auditing the BE is underexplored. More specifically, GigaPan ® is a robotic camera mount that can be used with most digital single-lens reflex cameras to capture panoramas composed of billions of pixels. After a camera is placed on the GigaPan ® apparatus, the apparatus is attached to a tripod to allow for ease of use and to improve the vantage point. The GigaPan ® mount maneuvers the camera to take hundreds to thousands of photographs of the designated area. Then, the images are uploaded and stitched together using the GigaPan ® Stitch Software that is downloaded onto a computer. This results in a detailed, enhanced panorama with increased zoom and improved resolution. Compared to GSV, GigaPan ® allows for greater control of the temporal aspect of capturing and using images in real time as the researcher is responsible for deciding when and where images are taken. GigaPan ® has been used for rangeland monitoring of natural resources in ecological studies, for analyzing the community structure of ants in Costa Rica, and by the National Aeronautics and Space Administration for planetary analog field experiments [14][15][16].
In recent research, our study team examined the reliability of GigaPan ® as a method for assessing park characteristics, as well as the validity of using GigaPan ® to measure park attributes in comparison to GSV and DO [17]. This research showed GigaPan ® was a reliable method for collecting data on park attributes and a comparably valid method to GSV and DO [17]. Research by our study team also documented the validity of GigaPan ® imagery in assessing street segment BE attributes [18]. Using DO as the gold standard, our findings for street segments showed GigaPan ® audits obtained comparably valid results relative to GSV [18]. However, the reliability of using GigaPan ® to document street segment BE characteristics has yet to be established. We hypothesized GigaPan ® may be better suited for measuring microscale BE features compared to GSV. Given the potential benefits of using GigaPan ® to capture the BE, this study seeks to ascertain and compare the inter-rater reliability (IRR) of BE constructs coded by multiple human auditors using GigaPan ® and GSV imagery.

Study sample
This study is an extension of the Pittsburgh Hill/Homewood Neighborhood Change and Health (PHRESH) study. The PHRESH study is led by RAND with the University of Michigan as a collaborator. It is an ongoing community-focused research study examining connections between features of the built and social environment and health in two, low-income neighborhoods (the Hill District and Homewood) in Pittsburgh, Pennsylvania, USA whose residents are predominately African American. Additional details regarding neighborhood selection can be found elsewhere [19]. This GigaPan ® and GSV ancillary study was led by the University of Michigan to document street-scale features using GigaPan ® that could be associated with walking, physical activity, and obesity. Briefly, RAND collected and electronically stitched the images to complete the GigaPan ® panoramas, while the University of Michigan used human auditors to audit the GigaPan ® panoramas, the GSV imagery, and analyze the data.
To assess the street environments in these two neighborhoods, a random sample of 614 unique street segments, approximately 25% of the street segments in each of the two neighborhoods, was audited. Of the 614 street segments, 20% (n = 124) were randomly selected, coded by two raters, and included in this study of IRR. Thus, the reliability sample represents both neighborhoods: Homewood and the Hill District. All selected street segments were audited using GSV imagery and GigaPan ® imagery. Both types of imagery were coded by human auditors who reviewed the imagery and determined the presence, quality, or quantity of a BE feature along the streetscape. Details regarding image capture and coding are described below.

Capturing street imagery using GigaPan ®
RAND field staff recruited and trained neighborhood residents of Pittsburgh, Pennsylvania to capture images of the selected street segments using the GigaPan ® apparatus. Before entering the field, staff completed training that required them to read the GigaPan ® manual, watch a demonstrational video on the use of the GigaPan ® apparatus, and take practice photos to become comfortable using the GigaPan ® equipment.
The GigaPan ® apparatus is a camera mount consisting of a panoramic tripod head that holds and stabilizes a digital camera of the user's choice. In this study, we mounted a Canon ® PowerShot S120 camera onto the GigaPan ® Epic apparatus and secured it on a tripod (see Fig. 1). With the camera secured on the tripod, the user aimed the device to the upper-left corner and lower-right corner to set the boundaries of the area they wanted to capture. The built-in software of the GigaPan ® apparatus then computed the number of images needed to complete the panorama. Next, the user pressed the shutter-release button and the apparatus captured the individual images, which were later assembled into a larger panoramic image using the GigaPan ® Stitch Software on a computer.
RAND field staff took a GigaPan ® panorama for both sides of the selected street segments. After the images were taken, RAND used the GigaPan ® Stitch Software to electronically stitch the individual images into a high-resolution panorama. A resulting panoramic image for a non-study location can be seen in Fig. 2. The panoramic GigaPan ® photos were then sent to the University of Michigan so that trained, human auditors could code various BE features identified within each panoramic photo.

Fig. 1 Canon ® PowerShot S120 camera mounted on the GigaPan ® Epic and secured on a tripod

Fig. 2 A GigaPan ® panorama for a non-study street segment in Ann Arbor, Michigan, USA. The panorama is the result of many smaller images that were stitched together electronically using the GigaPan ® Stitch Software. The individual images composing the panorama were captured by a Canon ® PowerShot S120 mounted on a GigaPan ® Epic that was secured on a tripod. The number of pixels in this image was reduced dramatically to meet journal file size requirements. Additional interactive photos shared by other users of GigaPan ® can be accessed on the GigaPan ® website via URL [20]

Coding the GSV and GigaPan ® imagery
University of Michigan auditors performed GSV audits using Google Earth Pro version 7.3. All study street segments were saved in the form of KMZ files for use in Google Earth. Each street segment had a unique segment ID that served as the identifier. Example GSV imagery corresponding to the non-study street segment shown in Fig. 2 can be accessed via URL [21]. To document street-scale features that could be associated with walking, physical activity, and obesity, the University of Michigan auditors coded the GSV imagery and GigaPan ® imagery separately using a modified, electronic version of the Bridging the Gap Community Measures Project (BTG-COMP) auditing tool [22].
This tool is formally known as the BTG-COMP Street Segment Observation Form and was designed to assess neighborhood street-level features related to physical activity. Based on previous pilot study research that used in-person DO as the auditing method, the tool had good reliability [23]. Additional details regarding the selection of the measures included in the tool can be found elsewhere [23]. In brief, a socioecological framework guided the overall development of the original BTG-COMP auditing tool, and empirical literature and an expert panel were used to identify features of the neighborhood environment associated with physical activity. The tool focused on features that could not be assessed using available national data. Moreover, our study team made modifications to the original tool to fit the local context of the two neighborhoods we audited. For example, we added a question about the slope of the segment because the Hill District and Homewood neighborhoods are hilly and the steepness of the street segments was expected to be related to neighborhood residents' physical activity behaviors. We also developed a modified coding manual to include the new items which was used as the basis for training auditors before they began auditing street segments using GigaPan ® and GSV imagery. The training process for coding both types of imagery consisted of reviewing the coding manual as a group, reading the manual individually, experimenting with GSV and GigaPan ® technology, and then completing multiple practice audits with both GigaPan ® imagery and GSV imagery. Practice audits used locations in various urban cities of the USA that were outside of the study area. The locations were selected to help the auditors become familiar with coding features and amenities expected to be seen in Pittsburgh, Pennsylvania. During training, weekly meetings were held to discuss coding discrepancies within the group. 
To be certified, each auditor needed at least 80% agreement between their coded answers and those of the lab manager.
After training was complete, coding of the 614 unique street segments for both GSV and GigaPan ® began. Two independent, trained auditors double-coded a random sample of the 614 street segments. One hundred twenty-four street segments served as the reliability sample for this study. The same street segment was coded by two different auditors for both GSV and GigaPan ® audits. All GSV imagery was the most current imagery available at the time audits were completed. GSV imagery dates ranged from July 2007 to September 2016, while GigaPan ® photos were taken from September 2015 to December 2015. The majority of GSV audits were completed using imagery from 2016.

Quality control
Prior to analysis, we evaluated the accuracy and quality of GigaPan ® and GSV imagery for each of the randomly selected reliability segments. Eighteen segments were dropped from the analysis sample. Among the 18 segments, 13 were dropped for having GigaPan ® imagery or collection issues, one segment was dropped for having a GSV imagery or collection issue, and four segments were dropped for having both GigaPan ® and GSV imagery and/or collection issues. Dropped segments had at least one issue. The GSV imagery issues included the following: poor image quality, the incorrect cross street was used, the segment analyzed was not a street, the imagery was incomplete, or the street view imagery did not exist for the segment. The GigaPan ® imagery issues included the following: the imagery file could not be opened, the image only covered a small fraction of the segment, or the image was taken in the wrong location. The resulting reliability analysis sample included 106 segments with complete imagery for GigaPan ® and GSV.
In order to estimate the prevalence, features with three response options were recoded into two categories, with "1" indicating presence of the feature on one or both sides of the street and "0" indicating absence of the feature. The prevalence of each feature was then estimated. Features with a prevalence below 0.10 in either the GigaPan ® or GSV sample were excluded from the analysis given the poor performance of Cohen's kappa and the prevalence-adjusted bias-adjusted kappa (PABAK) when prevalence of a feature is extremely low [24][25][26]. After eliminating those features with a prevalence below 0.10, 28 features were retained for the analysis. We collapsed the 28 features into three concepts: land use, traffic and safety, and public amenities. Table 1 outlines each concept, the features comprising each concept, the question used to assess the feature, and the original response options. In general, land use includes features of residential buildings, nonresidential buildings, tree coverage, and segment slope. Traffic and safety includes sidewalk presence,

Statistical analysis
Average prevalence (PR) was calculated for each feature. In this study, this statistic represents the proportion of street segments containing the feature of interest.
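The dichotomization and prevalence screen described above can be sketched as follows. This is a minimal illustration, not the study's Stata code: the numeric coding of the three response options (0 = absent, 1 = one side, 2 = both sides), the function names, and the example data are all assumptions made for the sketch.

```python
def dichotomize(response):
    """Collapse a three-option response into presence (1) or absence (0).

    Assumed coding (hypothetical): 0 = absent, 1 = one side, 2 = both sides.
    """
    return 1 if response > 0 else 0


def prevalence(responses):
    """Proportion of street segments on which the feature is present."""
    if not responses:
        raise ValueError("at least one audited segment is required")
    return sum(dichotomize(r) for r in responses) / len(responses)


def retained_features(gigapan, gsv, threshold=0.10):
    """Keep features whose prevalence is at least `threshold` in both samples.

    `gigapan` and `gsv` map a feature name to its list of coded responses.
    """
    kept = []
    for feature, responses in gigapan.items():
        if feature not in gsv:
            continue
        if prevalence(responses) >= threshold and prevalence(gsv[feature]) >= threshold:
            kept.append(feature)
    return kept


# Hypothetical audit data for two features.
gigapan_audits = {"sidewalk": [2, 1, 0, 2], "graffiti": [0] * 19 + [1]}
gsv_audits = {"sidewalk": [2, 2, 0, 1], "graffiti": [0] * 20}
print(retained_features(gigapan_audits, gsv_audits))
```

In this made-up example the graffiti feature has a prevalence of 0.05 in one sample and 0 in the other, so it would be excluded, while the sidewalk feature would be retained.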
Percent agreement (PA), Cohen's kappa, and PABAK were also calculated for each feature with valid GigaPan ® and GSV audit data. Although prevalence was estimated using the dichotomized versions of each of the features, the reliability statistics (PA, Cohen's kappa, and PABAK) were estimated using the original response options. The PA represents the total number of segments for which the two auditors selected the same response option (e.g., both auditors agreed the feature was present or both auditors agreed the feature was absent for the segment) divided by the total number of segments audited. This is then converted to a percentage to get the PA. The PA is a direct measure rather than an estimate, and therefore confidence intervals (CIs) are not needed [27]. The shortcoming of this statistic is that it does not account for the possibility that some agreement would be due to chance [27]. Cohen's kappa, by contrast, is a measure of agreement that corrects for agreement due to chance [28]. Although Cohen's kappa is the primary IRR statistic used in research, it can be affected by bias between auditors and the level of prevalence of the feature being observed [24]. Therefore, PABAK was also calculated. PABAK is a kappa statistic that accounts for prevalence and bias by holding prevalence constant at 0.50. Cohen's kappa and PABAK values are most similar when prevalence is close to 0.50. Additionally, PABAK is not recommended in instances of extremely low prevalence. However, by eliminating variables where the prevalence was below 0.10, we were able to overcome this limitation [26]. Cohen's kappa and PABAK are both estimated statistics with ranges from −1 to +1. A value of +1 represents perfect agreement, a value of 0 represents agreement due to chance, and a value of −1 represents perfect disagreement.
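To make the three statistics concrete, the following sketch computes them for two auditors' codes on a hypothetical feature. It assumes the standard formulas, with PABAK written in its general k-category form, (k × p_o − 1) / (k − 1), which reduces to 2 × p_o − 1 for a binary feature. The function names and data are illustrative and are not taken from the study's Stata analysis.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Proportion of segments coded identically (multiply by 100 for PA)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement: (p_o - p_e) / (1 - p_e)."""
    n = len(rater_a)
    p_o = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(rater_a) | set(rater_b)
    # Expected agreement if the raters coded independently with their own margins.
    p_e = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if p_e == 1.0:
        return float("nan")  # undefined when both raters use a single category
    return (p_o - p_e) / (1 - p_e)


def pabak(rater_a, rater_b, n_categories=2):
    """Prevalence-adjusted bias-adjusted kappa for k response options."""
    p_o = percent_agreement(rater_a, rater_b)
    return (n_categories * p_o - 1) / (n_categories - 1)


# Hypothetical binary feature with high prevalence (present on most segments).
auditor_1 = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
auditor_2 = [1, 1, 1, 1, 1, 1, 1, 0, 1, 0]
print(percent_agreement(auditor_1, auditor_2))  # proportion agreeing
print(cohens_kappa(auditor_1, auditor_2))
print(pabak(auditor_1, auditor_2))
```

With these made-up codes the auditors agree on 8 of 10 segments, yet Cohen's kappa is about 0.375 while PABAK is about 0.60, which illustrates how a skewed prevalence can pull kappa down even when raw agreement is high.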
The scale used for assessing the level of IRR based on Cohen's kappa and PABAK values is widely used in reliability literature and is as follows: poor (< 0.00), slight (0.00-0.20), fair (0.21-0.40), moderate (0.41-0.60), substantial (0.61-0.80), and almost perfect (0.81-1.0) [29]. As noted by Landis and Koch, the divisions/ranges are arbitrary but provide useful benchmarks for discussion, and the nomenclature also provides consistency for describing the relative strength of agreement [29]. For each BE feature, CIs were also calculated for Cohen's kappa and PABAK given both statistics are estimates of the reliability based on the sample and the CIs provide a range of likely values for the estimate. The CIs were evaluated to determine differences between the two audit methods. Non-overlapping confidence intervals represented a statistically significant difference. All analyses were conducted using Stata version 14.2.
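The benchmark labels and the non-overlap rule described above can be expressed as a short sketch. The function names are hypothetical, and values falling in the small gaps between the published ranges (e.g., between 0.20 and 0.21) are assigned to the lower category here, which is an assumption of this sketch rather than something the scale specifies.

```python
def landis_koch_label(value):
    """Map a kappa or PABAK value to the Landis and Koch benchmark label."""
    if value < 0.0:
        return "poor"
    if value <= 0.20:
        return "slight"
    if value <= 0.40:
        return "fair"
    if value <= 0.60:
        return "moderate"
    if value <= 0.80:
        return "substantial"
    return "almost perfect"


def intervals_overlap(ci_a, ci_b):
    """Return True if two (lower, upper) confidence intervals overlap."""
    lower_a, upper_a = ci_a
    lower_b, upper_b = ci_b
    return lower_a <= upper_b and lower_b <= upper_a


def significantly_different(ci_a, ci_b):
    """Non-overlapping intervals are treated as a significant difference."""
    return not intervals_overlap(ci_a, ci_b)


# Hypothetical estimates and intervals for one feature under each method.
print(landis_koch_label(0.375))
print(significantly_different((0.20, 0.40), (0.50, 0.70)))
```

Comparing intervals for overlap is a conservative rule: two estimates can differ significantly under a direct test even when their intervals overlap slightly, so this check mirrors only the decision rule stated in the text.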

Results
Across the three BE concepts (land use, traffic and safety, and public amenities), some features had missing data and some features were only relevant if another feature was present. Five features were only relevant if another feature was present and thus followed skip logic to ensure they were not audited when that prerequisite feature was absent. For example, the audit tool question ascertaining the continuity of the sidewalk was only answered if the auditor indicated a sidewalk was present. If a sidewalk was not marked as present, the continuous sidewalk question was not answered. After taking into account missing data and skip logic, sample sizes ranged from 79 to 106.
Overall, when using GSV, eleven of the 28 BE features had Cohen's kappa values in the substantial to almost perfect reliability range (the presence of: detached housing, institutional land use, a stop sign, a sidewalk, a marked crosswalk, a traffic light, a continuous sidewalk, a continuous sidewalk (both ends), a bus stop, and a (Table 3). Presence of a traffic light, presence of a curb, and the number of traffic lanes had Cohen's kappa reliability values in the substantial range. Presence of a stop sign, presence of a sidewalk, and presence of a marked crosswalk fell into the almost perfect reliability range. Using PABAK, presence of a curb and the number of traffic lanes remained in the substantial range, while the presence of a traffic light was measured with almost perfect agreement. Moreover, the presence of street or sidewalk lighting, the presence of a continuous sidewalk, and the presence of a street and sidewalk buffer could be measured with substantial reliability according to PABAK. Presence of a stop sign, presence of a sidewalk, and presence of a marked crosswalk had almost perfect agreement for both Cohen's kappa and PABAK.
The assessments of the seven public amenities (presence of a bus stop, presence of a public trash can, the perceived safety of the segment, the overall condition of the sidewalk, the presence of a garden, flower bed, or planter, the amount of trash on the street, and the attractiveness for walking) were similar for both reliability statistics (Table 4). The presence of a bus stop was the only public amenity measured with substantial reliability. This feature was measured with almost perfect reliability after adjusting for the effects of bias and prevalence. In addition, using PABAK, the presence of a public trash can was measured with substantial reliability. All other public amenities were measured with slight, fair, or moderate reliability using Cohen's kappa and PABAK.

GSV
Presence of detached housing and the presence of institutional land use were measured with substantial agreement using GSV (Table 2). Presence of detached housing remained in the substantial range, while the presence of institutional land use was measured with almost perfect reliability, using PABAK. Presence of attached housing, presence of bars on windows, street slope, and presence of apartments also had substantial agreement based on PABAK. Across all three BE concepts, the traffic and safety concept had the greatest number of features in the substantial to almost perfect agreement range (Table 3). Presence of a sidewalk, presence of a marked crosswalk, number of traffic lanes, presence of a continuous sidewalk, and presence of a continuous sidewalk (both ends) had Cohen's kappa values in the substantial range, while presence of a stop sign and presence of a traffic light fell into the almost perfect range. The presence of a sidewalk, the presence of a marked crosswalk, and the number of traffic lanes were measured with almost perfect agreement using PABAK. The presence of missing curb cuts at a crossing was measured with substantial agreement, while all other features measured with substantial to almost perfect agreement using PABAK mirrored those features falling within these respective ranges before adjustment (i.e., with Cohen's kappa).
Public amenities had the lowest reliability for both Cohen's kappa and PABAK (Table 4). Of the seven features assessed, presence of a public trash can was measured with substantial reliability and presence of a bus stop was measured with almost perfect reliability using Cohen's kappa. Both of these features were measured with almost perfect agreement using PABAK. All other public amenities had reliability values in the poor, slight, fair, or moderate reliability ranges.

Discussion
GigaPan ® and GSV
BE features in the traffic and safety concept were most reliably coded, based on Cohen's kappa and PABAK, for both GSV and GigaPan ®. Features in the traffic and safety concept were assessed for presence only, not quality. The objective nature of these questions made them easier for auditors to code and may be the reason reliability is highest for these features. Furthermore, across the auditing tools there were four significant differences in the level of agreement for BE features related to curbs and sidewalk continuity in the traffic and safety concept. Agreement was higher for identifying the presence of a curb using GigaPan ® compared to GSV. However, there was lower agreement in identifying the presence of a missing curb cut at a crossing, continuous sidewalk, and continuous sidewalk (on both ends) using GigaPan ® compared to GSV. This difference potentially stems from where the GigaPan ® user positioned the camera on the street. Following the protocol, the user placed the camera on a tripod at the middle of the street segment. Although the camera rotates as images are captured, the camera does not traverse down the street as the GSV car does to capture imagery. Therefore, with GigaPan ®, magnification of the imagery at the edge of the frame is reduced and slightly distorted because the object is farther away from the camera lens. This slight distortion would make it difficult for an auditor to reliably audit features at the end of the sidewalk/street (e.g. curb cuts at a crossing) and the continuity of the sidewalk as the auditor codes features towards the edges of the GigaPan ® imagery frame (i.e. the ends of the sidewalk/street). In the future, a different lens could be used to reduce distortion and to increase the potential for these features to be coded reliably using GigaPan ® imagery and/or multiple photos could be taken for longer street segments.
In the public amenities concept, GigaPan ® performed significantly better than GSV across three features: the perceived safety of the segment, the overall condition of the sidewalk, and the amount of trash on the street. These are finely detailed BE features and are also more subjective. It is likely GigaPan ® performed more reliably across this concept because of the enhanced detail capabilities the tool offers. However, despite the significant differences between GigaPan ® and GSV across this concept, researchers should exercise caution in using GigaPan ® to audit public amenities with reliability values below the moderate to almost perfect range. Researchers may consider utilizing a camera with different zoom capabilities and lens options on the GigaPan ® apparatus to audit these BE features more reliably.
With regard to the land use concept, GigaPan ® and GSV performed similarly. Agreement was low across many features (presence of broken/boarded windows, presence of attached housing, presence of trees that shade the sidewalk, the amount of trees on the street, presence of bars on the windows, the slope of the segment, presence of a vacant building/lot, and the presence of housing apartments); however, after accounting for the effects of bias and prevalence, many land use features demonstrated substantial or almost perfect IRR regardless of whether GigaPan ® or GSV was used (the presence of institutional land use, the presence of attached housing, the slope of the segment, and the presence of housing apartments). The land use concept is a mixture of large-scale and finely detailed BE features. It includes large-scale features like the presence of trees and different types of housing, but also finely detailed features (e.g. the presence of broken windows and bars on windows) [6].
GSV performed similarly to previous studies using GSV audits to study the BE. In previous studies, GSV's greatest limitations were assessing finely detailed features and features that involved making qualitative judgments [6,7]. These limitations were also evident in our study across features like sidewalk quality and the presence of litter.
This is the first study to examine the IRR of assessing street segment BE features using GigaPan ® . The findings are consistent with the GigaPan ® IRR results found in our study of park BE features [17]. Overall, both studies found GigaPan ® to be a reliable method to assess the BE. The results can be explained in part by the high-definition, panoramic GigaPan ® images that are static and that our auditors perceived as easier to code. The enhanced detail in the GigaPan ® images may also provide researchers with the ability to more reliably code finely detailed BE features.
Improvements in the technology of videos and photo-stitching mobile phone apps offer additional ways to study the BE, but even with these new forms of technology, GigaPan® still has advantages. Videos are dynamic and provide new frames every millisecond, yet the time needed to code videos can be lengthy. Photo-stitching apps for mobile phones provide users with the ability to take multiple photos and stitch them together without special equipment; however, these apps do not consistently produce high-quality panoramic images and may not meet the image quality standards required by researchers studying microscale features of the BE. As they improve, these technologies may offer viable methods for assessing the BE.
Overall, GigaPan® and GSV performed similarly across the three BE concepts including features related to land use, traffic and safety, and public amenities. However, the three significant differences in the public amenities concept (the perceived safety of the segment, the overall condition of the sidewalk, and the amount of trash on the street) suggest GigaPan® may be better suited to assess finely detailed BE features. Yet, even with the high-resolution imagery produced by GigaPan®, these features only demonstrated fair reliability. Although the presence of litter reached moderate agreement in the GigaPan® IRR study of park BE features, the findings between the two studies are overall consistent as finely detailed features (e.g. presence of overgrowth and condition of open green space) had slight-to-fair reliability in the park study [17]. Therefore, such features may require high-resolution imagery not currently available through GSV, but could be produced by using a camera with different zoom capabilities and lens options on the GigaPan® apparatus.
Additionally, although there was higher IRR using GSV imagery compared to GigaPan® imagery for three features in the traffic and safety concept (i.e., presence of missing curb cuts at a crossing, presence of a continuous sidewalk, and presence of a continuous sidewalk (both ends)), using a camera with different zoom capabilities and lens options may also allow auditors to more reliably code BE features at the edge of the photo frame when using GigaPan®. The significantly higher IRR of identifying the presence of curb in the traffic and safety concept also suggests GigaPan® imagery has benefits over GSV. Thus, the enhanced detail capabilities of GigaPan®, coupled with the overall similarities in the reliability between GSV and GigaPan® across the various BE concepts, position GigaPan® as a potential, alternative tool to audit the BE.

PA, Cohen's kappa, and PABAK
In this study we used three measures of reliability: PA, Cohen's kappa, and PABAK. The PA does not account for the probability of agreeing by chance and therefore Cohen's kappa is the primary statistic used to rate reliability. However, Cohen's kappa values may appear low when agreement is high because of very low or high prevalence or because of bias. In this study, PABAK was calculated to account for these factors. Across GigaPan® and GSV, PABAK values were consistently higher than Cohen's kappa values for all but three BE features. However, PABAK should not be interpreted as measuring the same agreement as Cohen's kappa, because PABAK ignores the variation of prevalence across the BE features examined and assumes the absence of bias. In addition, PABAK has not been as thoroughly studied as Cohen's kappa. Therefore, in congruence with previous researchers, we recommend using PABAK in addition to Cohen's kappa to give a more complete picture of the data [30].
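As an illustration of how the three statistics can diverge for a rare feature, the following is a minimal Python sketch. It is not the code used in this study: the function names and the two rating vectors are hypothetical, and the example assumes the standard textbook definitions, with PABAK computed as (k × PA − 1)/(k − 1) for k rating categories (2 × PA − 1 for a binary feature).

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Proportion of segments on which the two auditors gave the same code."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    p_observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from each auditor's marginal category proportions.
    p_expected = sum(
        counts_a[c] * counts_b[c] for c in set(counts_a) | set(counts_b)
    ) / (n * n)
    if p_expected == 1.0:
        return float("nan")  # undefined when both auditors use one category only
    return (p_observed - p_expected) / (1.0 - p_expected)


def pabak(rater_a, rater_b, n_categories=2):
    """Prevalence-adjusted bias-adjusted kappa for n_categories rating levels."""
    p_observed = percent_agreement(rater_a, rater_b)
    return (n_categories * p_observed - 1.0) / (n_categories - 1.0)


# Hypothetical ratings for a rare binary feature on 100 segments
# (1 = present, 0 = absent). These are NOT data from the study.
auditor_1 = [0] * 90 + [1] * 2 + [1] * 4 + [0] * 4
auditor_2 = [0] * 90 + [1] * 2 + [0] * 4 + [1] * 4

pa_value = percent_agreement(auditor_1, auditor_2)
kappa_value = cohens_kappa(auditor_1, auditor_2)
pabak_value = pabak(auditor_1, auditor_2)

print(f"PA = {pa_value:.2f}, kappa = {kappa_value:.2f}, PABAK = {pabak_value:.2f}")
```

In this hypothetical case the auditors agree on 92 of 100 segments (PA = 0.92), yet Cohen's kappa is only about 0.29 because the feature is rare, while PABAK is 0.84. This is the pattern described above: low kappa alongside high agreement when prevalence is very low.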

Limitations
Both GSV and GigaPan® had imagery issues in our study. For example, our analytic sample was reduced because of GSV issues related to poor image quality, incomplete imagery, or non-existent street view imagery for the segment. Given the auditor is not able to capture or recapture GSV imagery, the BE could not be measured in these cases. These limitations are consistent with findings from a systematic review of GSV studies of neighborhood environments in North America, Europe, New Zealand, Australia, Japan, and Brazil [12]. Uniquely, the GSV imagery was updated for one segment during the audit process in our study. Furthermore, given the GSV imagery dates ranged from 2007 to 2016 and the GigaPan® imagery was from 2015, there is also temporal mismatch across methods for some segments. However, regardless of temporal alignment, the IRR results for GigaPan® and GSV are independent of each other because IRR is calculated within each method (GigaPan® or GSV). In other words, the IRR results of GigaPan® do not depend on the GSV imagery and vice versa.
Although GigaPan® has the potential to address some of these GSV imagery issues, it also has limitations. Issues unique to GigaPan® include: the image only captured a small fraction of the street segment, GigaPan® Stitch Software could not open some image files, incorrect images were taken by the field staff, and the images taken of both sides of the street did not correspond to each other. More segments were dropped from our sample due to GigaPan® issues than GSV issues, and the majority of the issues associated with GigaPan® were user error issues. This difference suggests there is greater potential for problems with obtaining GigaPan® imagery compared to GSV imagery.
When capturing the GigaPan® images or GSV imagery on segments where street parking existed, cars often blocked the point of view of the camera. This makes it challenging to see some features of the street and is consistently an issue in GSV studies internationally [12]. On the other hand, the user of GigaPan® maintains the ability to position the camera to strategically avoid or limit the interference of parked cars or other obstacles. Therefore, future research teams utilizing GigaPan® technologies should incorporate this solution into their GigaPan® procedures.
Another factor to consider when using GigaPan® is the quality of the camera. Although higher quality cameras increase supply costs for the project, higher quality