Both feature generation events (i.e., mapathon and AFE) studied here used the same satellite imagery. The study area comprises two districts in Central Asia that were inaccessible to EI at the time of the study. To protect the security of populations living in our study region, the specific geographic areas will not be disclosed. The varied terrain of the urban and rural study areas included rocky and forested mountainous regions, low plateau areas with desert terrain, and some fertile plains used for farmland. The climate in the study area is arid to semiarid with low rainfall in most areas of the region. Permanent and temporary housing structures were visible in the satellite imagery used for the study and included small mud free-standing structures, larger mud-brick and stone compounds surrounded by walls, modern free-standing structures and housing complexes in urban areas, small cliff dwellings, and temporary tents or yurts.
Mapathon
We conducted the mapathon for this project under the guidance of the WHO and the Geospatial Research, Analysis, and Services Program (GRASP) at the Centers for Disease Control and Prevention (CDC). The mapathon coordinators created a dedicated ArcGIS Online (ESRI, Redlands, CA) hub page for enrolling and training both novice and experienced mappers. This mapathon resource repository included registration information, tutorials on digitizing and application use, a real-time monitoring dashboard to analyze participant progress, links to communication channels, and sections on frequently asked questions. Mapathon participants from the CDC and WHO were recruited via emails and hardcopy informational posters. Participants logged into an ArcGIS Online web application with basic data editing functionality to view current, high-resolution (0.3–0.5 m) satellite imagery downloaded from DigitalGlobe (DG) with the goal of identifying structures inside the two study districts.
The imagery for both districts covered a total area of 6146 square kilometers. The entire area of interest was divided into 1 km × 1 km grid cells for the contributors who then digitized features of interest within one cell at a time. For each cell, contributors placed one spatially linked point on the center of any man-made structure that was larger than 9 m² in the image (Fig. 1). If multiple structures existed within a compound (i.e., several structures surrounded by a common wall), the contributors digitized any eligible structure within the compound rather than counting the compound grouping of structures as just one point. Digitized features could be structures used for any purpose. Additionally, contributors were instructed to place points on structures that seemed to be under construction, regardless of shape, while avoiding those that appeared to be destroyed. Structures partially within the grid were treated as within the grid and were digitized. When a contributor marked a cell as complete, all man-made structures larger than 9 m² and visible in the imagery should have been digitized as point feature class data (a discrete location represented by longitude and latitude coordinates). GIS experts served as validators within the mapathon coordination team, using a separate ArcGIS Online web application to validate any cells marked as complete by the contributors. Validators did not evaluate the quality of digitized points submitted by each contributor but ensured that all features of interest in the underlying satellite image were correctly digitized by contributors and made edits as needed before finalizing each cell. Because the mapathon contributors and validators had little-to-no knowledge about the setting, they did not make any classifications regarding the current use of the buildings they digitized.
Automated feature extraction
The alternative method of acquiring spatial data for this project leveraged semi-automated feature extraction (Fig. 2) using the results from machine-learning deployments on millions of structures across various developing countries. The results gathered from previously conducted deployments supported Ecopia Tech’s (Toronto, ON, Canada) proprietary machine-learning models in generating building footprints for structures of interest detected in the imagery.
To support the models in extracting building footprints, relevant imagery was broken into a grid of 256 × 256-pixel chips. Within each chip, a classifier evaluated every pixel and assigned it a probability score for containing a feature of interest, using a variety of textural feature data from neighboring pixels in its calculations. The classifier is a proprietary algorithm developed by Ecopia that measures the shearing of pixels along with color gradients to determine the likelihood that a structure falls within a pixel. Shearing in straight and/or circular lines can be indicative of man-made materials. If the shearing, texture, and contrast scores exceeded Ecopia’s internal threshold of 1, the pixels were classified as likely containing or being part of a structure. The classifier algorithm then digitized each structure’s boundary, using the confidence scores previously generated for each pixel. Any chips that did not contain structures were removed from the algorithm’s output. A team of expert annotators, made up of former geospatial professionals and remote sensing enthusiasts, then reviewed the resulting data sets, manually corrected any errors, and provided any necessary updates to the classifier algorithm. Using a “CrowdRank” algorithm [32], we were able to identify users who performed better than other users completing the same task. Users who regularly fell below a pre-defined benchmark were removed from the project in an iterative fashion to promote the highest accuracy possible. Informed by the updated and improved data, the classifier algorithm then iteratively repeated the process to increase overall accuracy. During these iterations, the annotators continued to manually revise any incorrectly generated vector edges and updated the classifier algorithm accordingly. Furthermore, the annotators manually digitized obscured structures to reflect accurate footprints of structures of interest.
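The chipping and thresholding steps described above can be illustrated with a minimal sketch. Ecopia's classifier is proprietary, so the per-pixel scores are simply taken as given input here; the function names (`iter_chips`, `structure_mask`, `chips_with_structures`) and the nested-list raster are assumptions made for illustration only, and only the chip size (256 pixels) and the threshold (1) come from the text.

```python
from typing import Iterator, List, Tuple

CHIP_SIZE = 256   # chip edge length in pixels, as stated in the text
THRESHOLD = 1.0   # score threshold quoted in the text; the scoring itself is proprietary

Raster = List[List[float]]  # hypothetical per-pixel scores, row-major


def iter_chips(scores: Raster, size: int = CHIP_SIZE) -> Iterator[Tuple[int, int, Raster]]:
    """Yield (row_offset, col_offset, chip) tiles; chips on the edge may be smaller."""
    n_rows = len(scores)
    n_cols = len(scores[0]) if n_rows else 0
    for r in range(0, n_rows, size):
        for c in range(0, n_cols, size):
            yield r, c, [row[c:c + size] for row in scores[r:r + size]]


def structure_mask(chip: Raster, threshold: float = THRESHOLD) -> List[List[bool]]:
    """Flag pixels whose score exceeds the threshold as likely structure pixels."""
    return [[value > threshold for value in row] for row in chip]


def chips_with_structures(scores: Raster, size: int = CHIP_SIZE,
                          threshold: float = THRESHOLD):
    """Keep only chips that contain at least one flagged pixel."""
    kept = []
    for r, c, chip in iter_chips(scores, size):
        mask = structure_mask(chip, threshold)
        if any(any(row) for row in mask):
            kept.append((r, c, mask))
    return kept
```

The sketch stops at the pixel mask; tracing structure boundaries from the mask and the annotator review loop are not represented.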
Prior to this deployment, Ecopia Tech developed an AFE algorithm capable of classifying footprints and partnered with Maxar Technologies (Westminster, CO) to utilize their very high-resolution imagery mosaics and guidance on categorizing the building footprint outputs. To accurately categorize footprints as commercial, compound, or residential, expert imagery analysts from Ecopia manually identified examples of each from the imagery to use in training data for the machine-learning model. Guided by discussions with local consultants, Ecopia defined a compound as typically including several structures along with a yard surrounded by a wall. Non-walled, free-standing structures were then categorized as either commercial or residential depending on other contextual factors, such as the proximity and presence of latrines, farmlands, vehicles, and other indicators of human activity (Fig. 3). The AFE algorithm excluded structures that were round and smaller than 9 m² to ensure boulders were not mistaken for structures. The outputs of the model were polygons drawn around the perimeter of all free-standing structures larger than 9 m² (whether categorized as commercial or residential) and all residential compounds (Fig. 1). Unlike the mapathon, compounds, regardless of the number of structures contained inside, were treated as one polygon feature.
An estimate of the population inhabiting the structures captured by each method was calculated employing the assumption that a compound includes three households, where each household has an average of 2 adults and 5 children. Therefore, it was estimated that each compound housed an average of 21 individuals.
Accuracy assessment
The study areas were selected for two reasons: their geographic heterogeneity despite the low spectral diversity (e.g., deserts, arid mountains, and alluvial plains) and the inaccessibility of local ground-truth data due to continuous insecurity.
We employed the following accuracy assessment techniques to determine how well each feature generation method—mapathon or AFE—captured the actual location of features of interest.
Assessment 1
We conducted an agreement analysis of the two feature classes to assess matches across the two outputs. To ensure a uniform comparison across both sets of features, we only considered the non-compound AFE features and the mapathon points that were not part of a compound. A simple ‘select by location’ query was used within ArcGIS Pro, whereby both feature types, point and polygon, were analyzed together.
To allow for small shifts in geographic location when comparing mapathon points and AFE polygons, features within 5 m of a polygon’s perimeter were considered a match and labeled as “befriended” (Fig. 4). If a point from the mapathon did not fall within an AFE polygon or have an AFE polygon within 5 m of it, we labeled that point as “lonely”. Similarly, if a polygon from AFE did not have a corresponding mapathon point within 5 m of the polygon’s edge, we labeled that polygon as “lonely”. Five-meter buffers were applied to points and polygons separately, rather than simultaneously, such that a potential 5 m shift in location was analyzed first for one of the feature types and then for the other.
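The matching rule can be sketched in a few lines of pure Python. This is not the ArcGIS Pro ‘select by location’ workflow used in the study; it is an illustrative equivalent that assumes coordinates are in a projected system with units of meters, that polygons are simple rings of vertices, and that a single symmetric distance test stands in for the two separate buffering passes. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Polygon = Sequence[Point]  # ring of vertices; the last vertex is not repeated

TOLERANCE_M = 5.0  # matching tolerance from the text


def _segment_distance(p: Point, a: Point, b: Point) -> float:
    """Shortest distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def _contains(polygon: Polygon, p: Point) -> bool:
    """Ray-casting point-in-polygon test."""
    px, py = p
    inside = False
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        if (y1 > py) != (y2 > py):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


def distance_to_polygon(p: Point, polygon: Polygon) -> float:
    """Zero if p lies inside the polygon, else the distance to its perimeter."""
    if _contains(polygon, p):
        return 0.0
    n = len(polygon)
    return min(_segment_distance(p, polygon[i], polygon[(i + 1) % n]) for i in range(n))


def label_features(points: Sequence[Point], polygons: Sequence[Polygon],
                   tolerance: float = TOLERANCE_M) -> Tuple[List[str], List[str]]:
    """Label every point and polygon as 'befriended' or 'lonely'."""
    point_labels = ["lonely"] * len(points)
    polygon_labels = ["lonely"] * len(polygons)
    for i, p in enumerate(points):
        for j, poly in enumerate(polygons):
            if distance_to_polygon(p, poly) <= tolerance:
                point_labels[i] = "befriended"
                polygon_labels[j] = "befriended"
    return point_labels, polygon_labels
```

For a data set of this size, a spatial index would be needed in place of the nested loop; the loop is kept here only to make the rule explicit.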
Assessment 2
We conducted a subset analysis of the data from both feature generation methods. Two GIS experts who were not part of the mapathon validation team independently analyzed the same set of 100 random lonely points and 100 random lonely polygons against the same high-resolution imagery. The subset analysis was limited to lonely points and lonely polygons because lonely features were not considered to be matches in Assessment 1. The GIS experts did not have access to ground truth due to security reasons in the study region. As an alternative, high-resolution satellite imagery was used as the source of verification for their assessment. They classified points and polygons that correctly corresponded to a structure as true positives (TP) and classified the remaining features as false positives (FP), in both cases based on verification against satellite imagery. Finally, the true positive percentage was calculated by averaging the number of TP and FP yielded by the two GIS experts.
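One way to read the averaging step is sketched below: the TP counts from the two reviewers are averaged and divided by the number of features reviewed. This interpretation, the function name, and the input layout are assumptions, not details given in the text.

```python
from typing import Sequence


def true_positive_percentage(reviews: Sequence[Sequence[bool]]) -> float:
    """Average TP percentage across reviewers.

    reviews[k][i] is True if reviewer k judged feature i to be a true positive
    and False if the reviewer judged it to be a false positive.
    """
    n_features = len(reviews[0])
    if any(len(review) != n_features for review in reviews):
        raise ValueError("every reviewer must assess the same set of features")
    mean_tp = sum(sum(review) for review in reviews) / len(reviews)
    return 100.0 * mean_tp / n_features
```

With 100 features per sample, as in this assessment, the averaged TP count and the TP percentage are numerically identical.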
Assessment 3
The third accuracy assessment involved statistically comparing the features generated from both the mapathon and AFE to a microplan, considered the gold-standard data, shared by the local-level team from one of the study districts. The microplan was developed by the local health authorities by listing the known settlements in the areas targeted for vaccination and estimating the number of households that vaccination teams should expect to find in each settlement. The study district consisted of 44 operational sub-districts called clusters, and one vaccination team was assigned to work in each cluster. The microplan included cluster names, number of households per cluster, number of vaccination teams, the population aged 0–59 months (i.e., target age for vaccination), and total population.
To account for differences in feature extraction techniques and parameters, we analyzed residential or compound polygons and mapathon points. Because mapathon points captured structures of any use while the AFE and microplan indicated household structures of residential use only, the count of mapathon points per cluster was recalculated to better approximate the number of households in the cluster. To count only one mapathon point per compound, we first removed from the analysis any mapathon points that fell inside AFE polygons categorized as compounds. We then multiplied the percentage of AFE compounds containing at least one mapathon point (83%) by the number of compounds in each of the 44 clusters and added that number to the remaining mapathon points for each cluster, thus creating a dataset more comparable to the AFE polygons and microplan.
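The per-cluster adjustment amounts to a single line of arithmetic, sketched here under the assumption that the counts of mapathon points outside compounds and of AFE compounds are already available per cluster. The function name, the dictionary layout, and the cluster labels in the usage note are hypothetical; only the 83% figure comes from the text.

```python
from typing import Dict, Tuple

COMPOUND_HIT_RATE = 0.83  # share of AFE compounds with at least one mapathon point


def adjusted_mapathon_counts(clusters: Dict[str, Tuple[int, int]],
                             hit_rate: float = COMPOUND_HIT_RATE) -> Dict[str, float]:
    """Return the adjusted mapathon count for each cluster.

    clusters maps a cluster name to a pair:
    (mapathon points outside AFE compounds, number of AFE compounds).
    """
    return {
        name: points_outside + hit_rate * n_compounds
        for name, (points_outside, n_compounds) in clusters.items()
    }
```

For example, a cluster with 120 mapathon points outside compounds and 100 AFE compounds would receive an adjusted count of 120 + 0.83 × 100 = 203.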
The null hypothesis for this assessment assumed no significant differences between the average number of features per cluster in the microplan in comparison with the average number of features per cluster obtained from each feature generation method. To test this assumption, we conducted 2-sample t-tests: (a) comparing the average number of points per cluster from the mapathon to the microplan and (b) comparing the average number of polygons per cluster from AFE to the microplan. T-statistics indicated whether there were significant differences (p < 0.05) between each feature generation method and the microplan.
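The test statistic for such a comparison can be sketched with the standard library alone. The text does not say whether equal variances were assumed, so the unequal-variance (Welch) form is used here as an assumption; the function name is hypothetical and no study data are reproduced.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (t statistic, degrees of freedom) for a two-sample Welch t-test."""
    na, nb = len(a), len(b)
    # Squared standard errors of the two sample means.
    se_a, se_b = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (na - 1) + se_b ** 2 / (nb - 1))
    return t, df
```

Here `a` would hold the per-cluster feature counts from one method and `b` the per-cluster household counts from the microplan. The p-value follows from the t distribution with `df` degrees of freedom; a statistics library such as SciPy (`scipy.stats.ttest_ind` with `equal_var=False`) returns it directly.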
Assessment 4
Population in the study area was estimated by applying the following assumptions to the mapathon and AFE data: a compound consists of 3 households, and a household consists of 7 individuals. These assumptions were based on advice from in-country colleagues. Therefore, population estimates were calculated based on the number of free-standing residences (7 individuals each) and the number of compound residences (21 individuals each).
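The calculation can be written out as a short sketch. The household figures are those given in the text; the function name and the example counts in the usage note are hypothetical.

```python
HOUSEHOLDS_PER_COMPOUND = 3  # assumption stated in the text
PERSONS_PER_HOUSEHOLD = 7    # assumption stated in the text (2 adults, 5 children)


def estimate_population(free_standing_residences: int, compounds: int) -> int:
    """Estimate population from counts of residential features.

    Each free-standing residence is treated as one household (7 individuals)
    and each compound as three households (21 individuals).
    """
    households = free_standing_residences + HOUSEHOLDS_PER_COMPOUND * compounds
    return PERSONS_PER_HOUSEHOLD * households
```

For example, 10 free-standing residences and 2 compounds would give 10 × 7 + 2 × 21 = 112 individuals.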