Using ArcMap, Google Earth, and Global Positioning Systems to select and locate random households in rural Haiti

Background A remote sensing technique was developed which combines a Geographic Information System (GIS); Google Earth, and Microsoft Excel to identify home locations for a random sample of households in rural Haiti. The method was used to select homes for ethnographic and water quality research in a region of rural Haiti located within 9 km of a local hospital and source of health education in Deschapelles, Haiti. The technique does not require access to governmental records or ground based surveys to collect household location data and can be performed in a rapid, cost-effective manner. Methods The random selection of households and the location of these households during field surveys were accomplished using GIS, Google Earth, Microsoft Excel, and handheld Garmin GPSmap 76CSx GPS units. Homes were identified and mapped in Google Earth, exported to ArcMap 10.0, and a random list of homes was generated using Microsoft Excel which was then loaded onto handheld GPS units for field location. The development and use of a remote sensing method was essential to the selection and location of random households. Results A total of 537 homes initially were mapped and a randomized subset of 96 was identified as potential survey locations. Over 96% of the homes mapped using Google Earth imagery were correctly identified as occupied dwellings. Only 3.6% of the occupants of mapped homes visited declined to be interviewed. 16.4% of the homes visited were not occupied at the time of the visit due to work away from the home or market days. A total of 55 households were located using this method during the 10 days of fieldwork in May and June of 2012. Conclusions The method used to generate and field locate random homes for surveys and water sampling was an effective means of selecting random households in a rural environment lacking geolocation infrastructure. The success rate for locating households using a handheld GPS was excellent and only rarely was local knowledge required to identify and locate households. This method provides an important technique that can be applied to other developing countries where a randomized study design is needed but infrastructure is lacking to implement more traditional participant selection methods.


Background
Use of a geographic information system (GIS), a system for input, storage, manipulation, and output of geographic information provides a powerful tool for public health assessment and monitoring in remote locations and developing countries [1]. GIS-based approaches have been used to study infectious diseases like malaria in Africa [2] and dengue virus in different parts of the world [3,4]. Effective use of GIS in public health assessment typically requires basic infrastructure data layers for georeferencing data (addresses, zip codes, streets, city blocks, location of health facilities, etc.). In rural Haiti, geospatial infrastructure often is lacking, making it difficult to implement GIS-based household water quality sampling, ethnographic surveys, and randomized study designs. Although satellite imagery and aerial photos have been available for many years, their use was limited due to cost and availability, particularly for public health officials in developing countries [5,6]. The advent of internet based mapping technologies such as Google Earth ™ and Google Maps ™ provide free satellite imagery, aerial photos, and topographic data for most of Earth's land surface, resulting in increased availability of mapping technology for use by public health workers and researchers [7][8][9]. However, Google Earth ™ still lacks the robust map manipulation and analysis functions of GIS software [9], and often requires data export/import to a GIS program such as ESRI's ArcMAP 10.0 (ESRI, Redlands, CA) for further analysis and map preparation. Recently, public health assessment programs have effectively combined GIS and Google Earth ™ to collect data for dengue fever [10], schistosomiasis [11], and mortality [12]. This article describes a method that utilizes GIS; Google Earth ™ (Google Inc., Mountain View, CA), and Microsoft Microsoft Excel ™ 2010 (Microsoft Corp., Redmond, WA) to map and select random households for ethnographic surveys and water sampling in rural Haiti. The technique is applicable to other developing countries, does not require access to address records or ground-based surveys to collect household location data, can be prepared prior to field work with common software packages, and requires only a handheld GPS to accomplish accurate field location of selected households.
Random sampling for ethnographic and public health surveys provides a representative sample of target populations, and ultimately representative data from these populations can be collected [13]. In cases where the sample population is inherently stratified or grouped, stratified random sampling can be employed [14]. For example, populations can be stratified by age, gender, or geographic location. Random samples, subjects, or households, are then selected from each strata or group. Stratification for this study was geographically-based. Clusters or groups of homes were located within circular polygon regions (strata) variable distances (1-9 km) from Hôpital Albert Schweitzer (HAS) [15], a local source of health care and community development support. Homes were randomly selected from each cluster for ethnographic surveys and water sampling.

Study area
The study site was located near HAS in Deschapelles, Haiti, roughly 65 kilometers north northwest of Port au Prince, Haiti (see Figure 1). Haiti is a mountainous country of rough terrain, sparse road infrastructure, and few major rivers. The Artibonite River, the largest river in Haiti, flows below the study area and is served by a paved highway which connects Port au Prince with the Central Plateau. The Artibonite River valley was the location of the initial cholera outbreak in 2010 [16]. There are no paved roads in the mountains above Deschapelles and streets and roads are often not named or labeled. Many roads become impassable by vehicles after precipitation events due to river crossings, steep terrain, and lack of traction on slick limestone surfaces. Homes in rural Haiti are typically constructed of rock and wood, and are arranged in loose aggregates or villages near water sources, roads, or along ridge tops. Many of the homes are only accessible by foot path. Community and village names often have multiple spellings and pronunciations depending on the context and source. Village households sometimes have addresses written on door posts; however addresses are not related to a regular grid or street and may not be recorded or documented in a form that is accessible for participant selection and implementation of a random survey.

Results and discussion
The list of homes and mapping were used to create laminated field sheets to be used in concert with handheld GPS units to locate homes for interviews and water sampling. A total of 55 homes were visited during the 10 days of fieldwork. Over 96% of the homes mapped using Google Earth imagery were correctly identified as occupied dwellings (see Table 1). Only 3.6% of the occupants of mapped homes visited declined to be interviewed. 16.4% of the homes selected were not occupied at the time of the visit. Occupancy was especially problematic on market days when many residents travel to nearby cities to sell and buy goods. The handheld GPS units proved very effective at field location of homes with virtually 100% successful location of pre-selected structures.
Factors that contributed to unsuccessful home identification, home location, and occupancy at the time of our visit included: 1) lack of homeowner availability on market days and holidays; 2) house construction methods in Haiti; and 3) road and trail access. Market days in much of Haiti take place two days a week in a given village and often occur on different days in other villages within walking distance. This results in an exodus of residents from their homes in certain rural villages on market days. Since some of these villages are remote and difficult to access, it is prudent to coordinate visitation of these locations to avoid market days and holidays. It is common practice in Haiti for homes to be "under construction" for years to decades. This may result in a roof being put in place for many years prior to occupancy of a dwelling. Aerial mapping of homes was unable to distinguish between roofs of homes under construction and occupied dwellings, resulting in at least two cases in the mapping of homes which were not occupied. In some cases, dwellings that appeared to be close together were in reality difficult to access due to a lack of roads or trails which were not evident from aerial mapping.    Due to the variable nature of the size and configuration of houses, schools, and churches in Haiti, it was not feasible to distinguish them from individual homes using aerial imagery. Consequently it was important to identify a larger number of homes from which the random sample could be selected. Our drivers, interpreter, and local residents were able to provide valuable local knowledge in these cases to locate expedient access routes and facilitate sampling in a timely fashion.

Conclusions
The method used to generate and field locate random homes for surveys and water sampling was an effective means of selecting random households in a rural environment lacking geolocation infrastructure. The success rate for locating households using a handheld GPS was excellent and only occasionally was local knowledge required to identify and locate households. The use of this method avoided the potential bias which could be introduced if an interpreter or driver were used to locate homes for survey or sampling. This method was fast, relatively straightforward and required only limited access to GIS and computer technology to setup the field survey. Once the lists and GPS coordinates were generated and loaded to field units, no internet or cell phone data access was needed to accomplish the survey. This method provides an important technique that can be applied to other developing countries where a randomized study design is needed but the necessary geolocation infrastructure is lacking to implement a more traditional randomized study design.

Methods
Home clusters were chosen based on distance from HAS. One of the goals of the ethnographic survey was to evaluate how proximity to this important source of health care, community development services, and water quality interventions influenced water quality perceptions and practices. Cluster selection was restricted to two geographic areas, similar in size to counties in the United States, called Belanger and Bastien. Cluster locations were all within 9 kilometers of HAS and were chosen based on 1) household density; 2) road access; and 3) field data collection feasibility. Clusters had to be located so that water quality bacterial analysis could be completed within the required holding time of 6 hours to maintain E. coli viability [17]. Occasionally sites were revisited on the way to or from another location. This was made possible by having the sites located in the same direction from the hospital. We received Grand Valley State University Institutional Review Board (IRB) approval for the water sampling and ethnographic surveys (IRB# 10-256-H) associated with this study. The random selection of households and the location of these households during field surveys were accomplished using GIS, Google Earth ™ , Microsoft Excel ™ , and handheld Garmin GPSmap 76CSx GPS units. Since there were little or no written documents available from the Haitian government, with respect to census data and residential addresses, the development of a remote selection method was essential to the success of the study. The method could be successfully applied in other developing countries where geographic infrastructure is lacking.

ArcMAP buffers and household mapping
A Multi-band Landsat Enhanced Thematic Mapper (ETM+) satellite image, captured August 21, 2000 and obtained from the EarthExplorer web site, was used as a basemap to create geographically stratified circular regions [18]. The radius of each circular region was increased by 1 kilometer from HAS using the ArcMAP 10 BUFFER command to create a set of nested circular regions ( Figure 2). The circular polygon regions were then exported as a KML file into Google Earth ™ . This allowed for the use of recent high resolution Google Earth ™ imagery to identify clusters of households and map individual dwellings within each cluster for later random selection and sampling. Google Earth aerial photos were taken in late 2010, had excellent resolution (~1-3 m), and was sufficiently detailed to map individual homes with very good accuracy.
In order to facilitate timely field sampling and surveys, clusters of homes within each of nine circular buffer regions were selected and individual homes were mapped using pushpins in Google Earth (total of 537 houses) (see Figure 3). Areas of high home density were chosen to facilitate timely interviews and access within the limited time available for surveys and subsequent water analysis. Approximately 50 homes were mapped in each cluster from which a subset was chosen randomly for the ethnographic surveys.
The point layer of homes created in Google Earth was imported into ArcMap and attributes were added for latitude and longitude (WGS 84 coordinate system). These geographic coordinates were later uploaded into a handheld GPS and used to locate homes in the field for conducting field surveys.

Random household selection and field location
The entire attribute table from ArcMap was exported to Microsoft Excel and a column was added for a unique identification number (for example H-1-1) which would be used for coding ethnographic surveys and water samples (see Table 2). A column also was added so that each household could be assigned a random number between 0 and 1 using the RAND() function in Microsoft Excel. The sorting function then was used to perform a multi-level sort of all the households first by cluster then by random number. The end result was a list of mapped households in random order for each cluster. For the purposes of the study, the first twelve households randomly selected from each cluster were designated for surveys and water analysis. The entire process of home identification, mapping, field map production, and random list generation took approximately 8-10 hours. WGS84 latitude and longitude coordinates were uploaded to a Garmin GPSmap 76CSx. Each of the first 12 random homes was located using the GPS until 5-9 surveys were completed in a given cluster. If no one was home at a selected household, or the homeowners were unavailable, the next home on the random list was selected.