Gridded population survey sampling: A review of the field and strategic research agenda

Introduction: In low- and middle-income countries (LMICs), household survey data are a main source of information for planning, evaluation, and decision-making. Standard surveys are based on censuses; however, for many LMICs it has been more than ten years since their last census, and they face high urban growth rates. Over the last decade, survey designers have begun to use modelled gridded population estimates as sample frames. We summarize the state of the emerging field of gridded population survey sampling, focussing on LMICs. Methods: We performed a systematic review and identified 43 national and sub-national gridded population-based household surveys implemented across 29 LMICs. Results: Gridded population surveys used automated and manual approaches to derive clusters from WorldPop and LandScan gridded population estimates. After sampling, some survey teams interviewed all households in each cluster or segment, and others sampled households from larger clusters. Tools to select gridded population survey clusters include the GridSample R package, the Geo-sampling tool, and GridSample.org. In the field, gridded population surveys generally relied on geographically accurate maps based on satellite imagery or OpenStreetMap, and on tablet or GPS technology for navigation. Conclusions: For gridded population survey sampling to be adopted more widely, several strategic questions need answering regarding the cell-level accuracy and uncertainty of gridded population estimates, the methods used to group or split cells into sample frame units, the design effects of new sample designs, and the feasibility of tools and methods to implement surveys across diverse settings.


BACKGROUND
Household surveys provide insight into the distribution of health, demographics, economics, and behaviours of populations, and are a primary resource for decision-making across low- and middle-income countries (LMICs). Household survey data are used to estimate more than a quarter of the Sustainable Development Goal (SDG) indicators, to generate small area estimates (SAEs) of indicators that support decision-making in decentralized health systems [1], and to inform the distribution of development funding to, and within, LMICs. Nevertheless, although the use of household surveys has increased over the last 40 years, data accuracy has likely declined because survey methods have not changed while population characteristics and behaviours have changed drastically.
Survey sampling methods have been mature for decades. The Demographic and Health Surveys (DHS) [2], Multiple Indicator Cluster Surveys (MICS) [3], and Living Standards Measurement Surveys (LSMS) [4] have collectively supported hundreds of multi-topic surveys in over 130 countries since 1980 using essentially the same methods. They follow a stratified two-stage cluster design in which first- or second-level administrative units (e.g. provinces) serve as strata. In stage one, census enumeration areas (EAs) are selected with probability proportionate to population size (PPS), and then a field-based mapping-listing activity is conducted in each selected cluster to fully list all households. In stage two, households are sampled from the full listing by an impartial central team, and interviewers return to selected households to administer questionnaires. Rapid needs assessments and public opinion surveys follow a similar design, but tend to use a faster, less-rigorous household selection protocol during stage two: rather than performing a full household listing, interviewers perform a random walk from a central point in the cluster and directly sample households in the field [5,6]. This approach is considered less rigorous than a full listing because interviewers may consciously or subconsciously avoid undesirable households, the protocol can result in a "main street" bias, and the information needed to adjust for household sampling probabilities and non-response is generally not collected [7].
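To make the stage-one PPS selection and its link to weighting concrete, a minimal Python sketch follows. It uses systematic PPS, one common way of drawing a PPS sample (the surveys named above may use variants, e.g. taking very large EAs with certainty), and computes the two-stage household design weight. The function names and the example EA sizes are hypothetical.

```python
import random


def systematic_pps(sizes, n, rng):
    """Select n units with probability proportional to size (systematic PPS).

    sizes: measure of size per unit, e.g. census household counts per EA.
    Units larger than the sampling interval can be hit more than once;
    real designs usually take such units with certainty instead.
    """
    total = sum(sizes)
    interval = total / n
    start = rng.random() * interval  # random start in [0, interval)
    targets = [start + k * interval for k in range(n)]
    selected, cumulative, k = [], 0.0, 0
    for idx, size in enumerate(sizes):
        cumulative += size
        # every target falling inside this unit's cumulative range selects it
        while k < n and targets[k] < cumulative:
            selected.append(idx)
            k += 1
    return selected


def household_weight(size, total, n_clusters, listed, sampled):
    """Two-stage design weight: 1 / (P(cluster) * P(household | cluster))."""
    p_cluster = n_clusters * size / total
    p_household = sampled / listed
    return 1.0 / (p_cluster * p_household)


rng = random.Random(7)
ea_households = [120, 80, 300, 50, 150]
print(systematic_pps(ea_households, 2, rng))
print(household_weight(300, 700, 2, 300, 25))  # approximately 14
```

When the number of households found at listing equals the frame's measure of size and a fixed number m of households is taken per cluster, every household receives the same weight, total / (n_clusters * m). An outdated frame breaks that equality, which is one route by which frame age feeds into survey weights.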
The last 40 years have seen dramatic increases in mobility of LMIC populations, urbanisation, and socioeconomic disparities within cities [8]. The urban poorest include climate and political refugees, seasonal migrants, and rural migrants, as well as multi-generation slum dwellers, street-sleepers, and marginalized minorities [8]. Concurrently, availability of technologies (e.g., mobile phones) and new data (e.g., high-resolution satellite imagery) has rapidly increased, though few new technologies and datasets have been incorporated into standard survey practice. This mismatch has resulted in challenges to sample frame and field protocol accuracy [9,10]. Furthermore, the SDGs have increased emphasis on disaggregated indicators [11], raising concerns about whether current survey designs are ideal for accurate SAEs, which we highlight below. To address these emerging issues, survey practitioners have begun to use modelled gridded population datasets as an alternative to census sample frames.
Gridded population datasets are estimates of the total population in small grid cells derived with a geo-statistical model using census or small area population counts and a number of other spatial datasets [12]. The cells in gridded population estimates range in size from 30x30m to 1x1km, and many of these datasets are free and publicly available. In gridded population sampling, grid cells are often aggregated into clusters of a desired population size, and used in place of census EAs. To contextualise gridded population sampling, we provide further background on key reasons that teams have turned to gridded population sampling, and provide an overview of gridded population datasets.
The rest of this paper provides a systematic review of the datasets, tools, and methods used in existing gridded population surveys in LMICs, and a research agenda that would equip survey designers to decide when gridded population sampling can be viable and preferable to census-based sampling. We aim to encourage new research and practices that improve the accuracy of survey data and, ultimately, to improve targeting of resources toward mobile and vulnerable populations.

Reasons for use of gridded population sampling
The main reason that survey practitioners have turned to gridded population sampling is lack of a current, accurate census sample frame. One in four LMICs has not had a census in the last 10 years [13]. High rates of urban growth and mobility in LMICs mean that megacities in Asia, and soon Africa, grow by 1,000 people per day [14]. Since 2000, the average household survey sample frame in LMICs has been seven years old, with some surveys using 15-year-old (Pakistan) and 30-year-old (DR Congo) sample frames [15]. Vulnerable populations are most likely to be excluded from surveys with an outdated sample frame because population growth is greater among lower-income households, and they are more likely to be undercounted in censuses [10].
The second reason for choosing gridded population sampling is that standard survey methods, largely developed for rural settings 40 years ago, struggle to sample mobile and vulnerable households accurately [16]. A time gap between the household mapping-listing activity and interviews in DHS, MICS, LSMS, and similar surveys means that mobile and vulnerable households are more likely to be counted as non-responders or to be under-listed. Furthermore, the mapper-listers who are responsible for generating the final household sample frame in the DHS, MICS, LSMS, and similar surveys have short interactions (e.g. 5-15 minutes) with residents. With limited rapport, residents may be unwilling to describe informal households in the dwelling (living space, e.g. apartment), and/or the mapping-listing team assumes that one household occupies each dwelling, which is simply not the case in modern LMIC cities [15,16]. In LMICs that do not have geocoded census EA boundaries, mapping-listing activities rely on hand-sketched paper maps and subjective descriptions of EA boundaries by local leaders, leading to further potential biases.
A third reason for choosing gridded population sampling is to produce improved small area estimates. In recent years, funders and decision-makers have pushed for important outcomes to be measured at smaller administrative scales (e.g., district) for policy planning and evaluation [1,11]. Increased availability of satellite imagery has enabled survey outcomes to be modelled at fine scales using geostatistical SAE techniques [17]. However, SAEs based on the stratified two-stage PPS design tend to have large uncertainty in sparsely-sampled rural areas and in heterogeneous urban settings [17]. Gridded population estimates can provide more up-to-date and detailed population counts than outdated census frames, permit new survey designs such as area-microcensus sampling to eliminate the time lag between mapping-listing and interviews, and facilitate spatial oversampling to improve survey-based SAEs.

Gridded population data
A number of gridded population datasets are available across LMICs (Table 1). "Top-down" datasets disaggregate census counts to grid cells, while "bottom-up" estimates are based on micro-census population counts [18]. Currently, nine sources of "top-down" estimates are available for multiple LMICs, and two sources of "bottom-up" estimates are in production for multiple LMICs [12].
Top-down gridded population estimates. Nearly all gridded population datasets available at the time of this writing were derived from "top-down" models, which disaggregate census or other full-coverage population counts into small grid cells. These models produce "pycnophylactic" estimates, such that the cell-level counts re-aggregate to the counts of the input administrative data [19]. Input population counts are generally adjusted to UN population projections before modelling [20]; however, such an adjustment rescales totals rather than updating where people live, so countries with the greatest need for improved sample frames still have the least accurate top-down gridded population datasets. Additional factors influence the accuracy of top-down modelled population estimates, namely the aggregation scale of the input census data, the modelling approach, and the area of the output grid cell.
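The pycnophylactic property can be illustrated with a minimal weighted (dasymetric) disaggregation in Python. This is a sketch under stated assumptions, not any named dataset's model: the cell weights stand in for whatever a land-cover rule or statistical model would supply, and units whose cells all have zero weight are spread evenly, as in the simplest top-down approach. Whatever the weights, the cells re-aggregate exactly to the input counts, which is also why an outdated or inaccurate input count passes straight through to the cells.

```python
from collections import defaultdict


def disaggregate(unit_counts, cell_unit, cell_weight):
    """Top-down allocation of unit counts to cells in proportion to cell weights.

    unit_counts: {unit_id: population count}
    cell_unit:   unit_id of each cell
    cell_weight: non-negative weight per cell (hypothetical covariate score);
                 a weight of 0 masks the cell out.
    Units whose cells all have zero weight are spread evenly.
    """
    weight_sum = defaultdict(float)
    n_cells = defaultdict(int)
    for unit, w in zip(cell_unit, cell_weight):
        weight_sum[unit] += w
        n_cells[unit] += 1
    estimates = []
    for unit, w in zip(cell_unit, cell_weight):
        if weight_sum[unit] > 0:
            estimates.append(unit_counts[unit] * w / weight_sum[unit])
        else:
            estimates.append(unit_counts[unit] / n_cells[unit])
    return estimates


def reaggregate(cell_unit, estimates):
    """Sum cell estimates back to their units (the pycnophylactic check)."""
    totals = defaultdict(float)
    for unit, est in zip(cell_unit, estimates):
        totals[unit] += est
    return dict(totals)


counts = {"A": 1000, "B": 400}
units = ["A", "A", "A", "B", "B"]
weights = [0.0, 1.0, 3.0, 0.0, 0.0]
cells = disaggregate(counts, units, weights)
print(cells)  # [0.0, 250.0, 750.0, 200.0, 200.0]
print(reaggregate(units, cells))  # {'A': 1000.0, 'B': 400.0}
```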
Scale of input data. The most important factor for top-down gridded population accuracy is the aggregation scale of the model input population data (e.g., census) [21]. This is intuitive; the more detailed and accurate the input dataset, the more precise and certain the output estimates will be in small grid squares.
Modelling approach. The simplest top-down models assume that the population is spread evenly across grid cells within administrative units (e.g. GPWv4 [22,23]) or distribute it according to weights based on land cover types (GHS-POP [24,25]; HRSL [26]; ESRI WPE [27]; WorldPop-Land Cover [28,29]). These modelling techniques are more mechanical than statistical, and thus do not produce estimates of model error. These models produce reasonably accurate cell-level estimates if a highly accurate dataset of built-up areas is used to mask unpopulated areas and the input population data are both disaggregated and recent [21]; these conditions are rarely met in LMICs.
WorldPop-Random Forest and WorldPop-Global are free, publicly available 100x100m datasets of the residential (night-time) population based on a regression tree machine-learning method, and are accompanied by prediction errors [30]. Error estimates are derived at the geographic scale of the input population data by reserving a subset of the input data for comparison against the model output. Neither WorldPop-Random Forest nor WorldPop-Global masks out built-up areas, so both produce small, non-zero population predictions in deserts, savannahs, and forests (e.g., 0.0001 persons per cell). The differences between these two datasets are that WorldPop-Random Forest is available for every fifth year in 69 LMICs and is based on all available spatial covariates, whereas WorldPop-Global has annual estimates for all countries, incorporates changes to urban extents over time, was more recently updated for most countries, and is modelled from a reduced set of covariates that are available globally.
Demobase is a free, publicly available 100x100m dataset of the residential (night-time) population in three countries based on semi-automated classification of high- and medium-resolution satellite imagery, with prediction errors at the scale of the input population data [33].
LandScan-Global is an annual 1x1km dataset of the "ambient" population, a 24-hour average of the daytime commuter population and the night-time residential population [32]. Neither the source data nor the model code are released publicly, and most users pay a fee to access the data. This dataset is derived with a smart interpolation approach, and model error estimates are not provided [32].
A common issue across all top-down gridded population datasets is that they allocate population to areas that show human activity according to satellite imagery and GIS datasets. This means that population is sometimes allocated to airports, universities, factories, and government buildings, affecting cell-level accuracy in urban areas. Misallocation of population in gridded population models may be reduced by including covariates associated with variation in urban density (e.g. building footprints), and/or covariates that represent points of interest and infrastructure where people tend not to live.
Area of output grid cells. The geographic size of the output cells influences the accuracy of estimated population at the cell level. Generally, estimates in smaller cells have greater uncertainty, and accuracy improves with cell size. For household survey sampling, however, cell-level accuracy must be balanced against the feasibility of the cell size for fieldwork; in dense urban contexts, a 100x100m grid cell might contain thousands of people. Gridded population datasets with small cells are easy to aggregate into larger units; however, users need complex methods to disaggregate cells that are too populous for survey fieldwork into smaller units [34]. WorldPop-Random Forest and WorldPop-Global offer substantial flexibility in terms of small cell size, high model accuracy [30], and full coverage in LMICs, which has resulted in their use in numerous surveys [16][35][36][37][38]. The older LandScan-Global dataset was used in a number of early gridded population surveys [39][40][41][42].
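The asymmetry between aggregating and disaggregating cells can be seen in a short Python sketch: aggregation is a block sum that needs no extra information, whereas splitting a cell requires additional data (such as building footprints) to decide where its people are. The function name and toy grid below are illustrative only.

```python
def aggregate(grid, factor):
    """Sum a 2-D grid of cell counts into non-overlapping factor x factor blocks.

    Blocks on the right/bottom edge that are only partly covered are summed
    over the cells that exist.
    """
    rows, cols = len(grid), len(grid[0])
    out_rows = -(-rows // factor)  # ceiling division
    out_cols = -(-cols // factor)
    out = [[0.0] * out_cols for _ in range(out_rows)]
    for r in range(rows):
        for c in range(cols):
            out[r // factor][c // factor] += grid[r][c]
    return out


# e.g. 100x100m cells aggregated 3x3 into 300x300m blocks
fine = [[1, 2, 0], [4, 5, 6], [0, 8, 9]]
print(aggregate(fine, 3))  # [[35.0]]
print(aggregate(fine, 2))  # [[12.0, 6.0], [8.0, 9.0]]
```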
Bottom-up gridded population datasets. To generate gridded population estimates in countries without a recent or accurate census, "bottom-up" models are currently under development to estimate population counts based on recent micro-census samples rather than full censuses [18]. These models draw on geo-statistical relationships between population density in a micro-census unit and settlement type, as well as other spatial covariates to predict population counts in unsampled areas of the country. These census-independent gridded population estimates are soon expected for multiple LMICs from the GRID3 and LandScan-HD projects, and will have the benefit of being constrained to settled areas [43,44]. Other projects have resulted in a bottom-up gridded population estimate for a single country (e.g. Sierra Leone [45], Afghanistan [46]), and an early version of GRID3's Nigeria dataset is currently available [47].
Gridded population sample frame attributes. Gridded population datasets are not provided with urban/rural classes, administrative unit names, or estimates of sub-populations because they are designed to be aggregated into any desired spatial unit. Publicly available datasets can be used to classify a gridded population dataset within a geographic information system (GIS) (e.g., ArcGIS, QGIS) or statistical program (e.g., R, Python). Urban/rural datasets include the Global Urban Footprint (GUF) [48] dataset of 85x85m grid cells classified as built-up or not built-up, and the Global Human Settlement GHS-SMOD [24] dataset of 1x1km grid cells classified as high-density urban, low-density urban, rural, and unsettled based on the GHS-POP population density and GHS-BUILT-UP datasets. Administrative boundaries are available as shapefiles through a number of initiatives including GADM [49], UN-SALB [50], and MapLibrary [51].
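Attaching these attributes amounts to an overlay followed by a tabulation. The Python sketch below assigns each cell centroid to an administrative polygon with a ray-casting point-in-polygon test and labels it urban or rural from a built-up flag, then sums population by stratum. It assumes the polygons are already loaded as vertex lists (e.g. from GADM) and that a built-up layer has been resampled to the same grid; treating "built-up" as "urban" is a simplification, and all names are hypothetical. In practice a GIS or a spatial library would do the overlay.

```python
from collections import defaultdict


def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices without holes."""
    inside = False
    for i in range(len(polygon)):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % len(polygon)]
        if (y1 > y) != (y2 > y):  # edge spans the horizontal ray's y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def stratum_totals(cells, admin_polygons, built_up):
    """Tabulate population by (admin unit, urban/rural) stratum.

    cells: list of (x_centroid, y_centroid, population)
    admin_polygons: {name: polygon}
    built_up: bool per cell, aligned with cells
    """
    totals = defaultdict(float)
    for (x, y, pop), urban in zip(cells, built_up):
        admin = next((name for name, poly in admin_polygons.items()
                      if point_in_polygon(x, y, poly)), None)
        totals[(admin, "urban" if urban else "rural")] += pop
    return dict(totals)


polygons = {"A": [(0, 0), (2, 0), (2, 2), (0, 2)],
            "B": [(2, 0), (4, 0), (4, 2), (2, 2)]}
cells = [(0.5, 0.5, 10), (1.5, 1.5, 20), (2.5, 0.5, 5), (3.5, 1.5, 7)]
print(stratum_totals(cells, polygons, [True, False, True, True]))
# {('A', 'urban'): 10.0, ('A', 'rural'): 20.0, ('B', 'urban'): 12.0}
```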

METHODS
We conducted a systematic review in Scopus using the terms: ("gridded" OR "landscan" OR "worldpop" OR "gpw" OR "ghs-pop" OR "hrsl" OR "wpe" OR "demobase") AND ("population" OR "household") AND "survey". Article abstracts were independently screened by co-authors DRT and DAR and retained if they referred to sampling of human populations. We additionally solicited reports, websites, and articles from colleagues. DRT performed a full-text review of all screened articles and reports, and retained those that described a method, tool, or survey based on gridded population data. Retained publications were reviewed for sample frame, sample design, tools, and protocols used. A strategic gridded population survey research agenda was iteratively developed among co-authors with feedback from survey experts in a two-day workshop and via email.
One survey compared area-microcensus and two-stage gridded population sampling in Kathmandu, Nepal, and found that when interviewers (area-microcensus) rather than mapper-listers (two-stage) performed the household listing, non-family and single-adult households were more likely to be identified, because interviewers spent substantially more time building rapport with residents in area-microcensus clusters during the interview process [15]. This study also found lower design effects for socio-economic indicators in the area-microcensus design, suggesting better identification of heterogeneous "hidden" households, though household response rates were also lower in the area-microcensus sample [15].
Four tools and numerous ad-hoc geographic information system (GIS) approaches were described to select gridded population survey clusters (Table 3), and resulted in various forms of a gridded population sample frame, visualized in Figure 2. The first gridded population sampling tool was the open-source GridSample R package, released by Thomson and colleagues in 2016 [53] and used in six sub-national surveys [16,35,38]. The GridSample R algorithm treats the gridded population dataset as the sample frame and selects grid cells with PPS, allowing for stratification, oversampling in urban/rural domains, and spatial oversampling [53]. The algorithm runs on a personal computer and is limited by the computer's memory. All datasets must be pre-processed and specified by the user, which allows use of any gridded population dataset but also requires GIS and/or R programming skills. The algorithm enables optional "growth" of clusters to a minimum population size or maximum area by randomly adding neighbouring cells after selection of "seed" cells with PPS. While this process results in clusters with roughly consistent population sizes for improved fieldwork, the population counts in the "grown" clusters do not reflect the population counts used for sample selection, and may skew sample weights [53]. The output is a shapefile of cluster boundaries, with attributes of estimated population counts.
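The seed-and-grow idea described above can be sketched in Python as follows. This is written from the description, not from the GridSample R source: it omits stratification, oversampling, and the maximum-area rule, and the function name, the 4-neighbour growth rule, the with-replacement seed draw, and the toy grid are all assumptions. It records both the seed population (which drove selection) and the grown cluster population, making visible the mismatch that can skew weights.

```python
import random


def grow_clusters(pop, n_seeds, min_pop, max_cells, rng):
    """Select seed cells with PPS, then grow each into a cluster.

    pop: {(row, col): estimated population} for every cell in the frame.
    Growth adds a random unassigned 4-neighbour until the cluster reaches
    min_pop or max_cells. Seeds are drawn with replacement; a seed already
    absorbed by an earlier cluster is skipped.
    """
    cells = sorted(pop)
    seeds = rng.choices(cells, weights=[pop[c] for c in cells], k=n_seeds)
    assigned, clusters = set(), []
    for seed in seeds:
        if seed in assigned:
            continue
        members, total = [seed], pop[seed]
        assigned.add(seed)
        while total < min_pop and len(members) < max_cells:
            neighbours = {(r + dr, c + dc)
                          for r, c in members
                          for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))}
            frontier = sorted(neighbours & (set(pop) - assigned))
            if not frontier:
                break
            nxt = rng.choice(frontier)
            members.append(nxt)
            assigned.add(nxt)
            total += pop[nxt]
        # Selection probability depended on pop[seed] only, but the cluster
        # now holds `total` people: the mismatch behind skewed weights.
        clusters.append({"seed": seed, "cells": members,
                         "seed_pop": pop[seed], "cluster_pop": total})
    return clusters


grid = {(r, c): (3 * r + 7 * c) % 11 + 1 for r in range(10) for c in range(10)}
for cl in grow_clusters(grid, 4, min_pop=40, max_cells=9, rng=random.Random(42)):
    print(cl["seed"], len(cl["cells"]), cl["seed_pop"], cl["cluster_pop"])
```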
Second, the Geo-sampling survey tool was created by RTI and used in 14 national and sub-national surveys [41] (personal communication, J. Cajka, RTI, 9 Apr 2020). The Geo-sampling tool is designed for use with larger grid cells (e.g. 1x1km), and supports a multi-stage stratified sampling approach. Clients are provided with a shapefile of the final cluster boundaries and population counts. In 13 surveys conducted in 2014-15, administrative units were sampled with PPS, and then 1x1km LandScan-Global cells were sampled with PPS. To improve fieldwork, 1x1km cells with fewer than 250 persons were excluded, potentially biasing the sample toward higher-density populations. The sampled 1x1km cells were partitioned into 150m, 100m or 50m grid cells depending on population density. Next, a deep-learning residential scene classification model was used to identify and exclude small cells without settlement, and to disaggregate the 1x1km population to the remaining small cells. Finally, three of the small cells were selected at random for an area-microcensus sample [34]. In a 2019 RTI survey, WorldPop-Random Forest estimates were aggregated to 400x400m cells and used in place of 1x1km cells, and a machine-learning building feature extraction algorithm was used to sample structures in the final stage of sampling (personal communication, J. Cajka, RTI, 9 Apr 2020).
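The frame restriction and final stages of the 2014-15 Geo-sampling design can be sketched in Python. This is an illustrative reading of the description, not RTI's code: the settlement flags stand in for the classifier's output (a hypothetical input), and the even split of the 1x1km population across settled sub-cells is an assumption, since the allocation rule is not specified above. The PPS selection between these two steps would work as in any PPS draw.

```python
import random


def eligible_cells(cells, min_pop=250):
    """Drop 1x1km cells below min_pop before PPS selection."""
    return {cid: p for cid, p in cells.items() if p >= min_pop}


def final_stage(population_1km, settled, rng, n_select=3):
    """Disaggregate a selected 1x1km cell to settled sub-cells and pick n_select.

    settled: one bool per sub-cell (hypothetical classifier output).
    The population is split evenly across settled sub-cells (an assumption).
    """
    keep = [i for i, flag in enumerate(settled) if flag]
    if not keep:
        return {}
    share = population_1km / len(keep)
    chosen = rng.sample(keep, min(n_select, len(keep)))
    return {i: share for i in sorted(chosen)}


frame = eligible_cells({"c1": 180, "c2": 900, "c3": 2400})
print(sorted(frame))  # ['c2', 'c3']
print(final_stage(frame["c2"], [True, False, True, True, False, True], random.Random(0)))
```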
Third, many gridded population surveys have developed ad-hoc approaches to sampling using GIS software, such as ArcGIS. Galway and colleagues sampled 1x1km cells with PPS, then randomly selected one household in one building and performed a random walk [40]. Thomson and colleagues converted 1x1km population counts to random points, selected points at random, manually delineated clusters within cells around the selected points, and performed an area-microcensus sample [39]. Muñoz and Langeraar proposed an approach for 1x1km cells, though it is unclear whether a survey was conducted using it [54]. In this approach, 1x1km cells are aggregated to 3x3km grid cells and sampled with PPS. Then 1x1km grid cells are combined within the selected 3x3km cells to achieve a minimum population and sampled with PPS. Next, a 1x1km (or larger) area is selected and manually delineated into segments of approximately 100 households each. One segment is randomly selected, households are listed via a mapping-listing activity, and finally a sample of households is selected [54]. Sollom and colleagues joined 1x1km gridded population estimates to rural village point locations, sampled points with PPS, and then used spin-the-pen to sample households in the field [42]. Qader and colleagues used gridded population estimates to update census EA counts in urban areas where EA boundaries were available, and used a quadtree method to create grid cells of different sizes, each with approximately the same population, in rural areas [52]. The combined frame was sampled with PPS before manually segmenting and randomly selecting one household per segment [52].
A range of simple-to-advanced tools have been used to implement gridded population surveys. Lower-tech field tools include paper maps displaying cluster boundaries over satellite imagery in Google Earth, and paper listing forms and questionnaires [38][39][40].
Higher-tech field tools include tablet-based applications for navigation [15,41], paper field maps designed in GIS [15,16,38,40,52], and tablet-based household listing and/or questionnaires [6,15,16,38,41]. Satellite imagery was essential to all gridded population surveys, both to manually segment clusters along roads, rivers, and other features [35,39,54] and as a field map base layer [38][39][40][41][52]. In some surveys, satellite imagery was used to digitize building footprints and roads in OpenStreetMap, which were then displayed as a field map base layer [16,35]. Many teams included points of interest from OpenStreetMap or GPS coordinates of recognizable intersections/structures on field maps to aid navigation [16,35,39,52].
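The quadtree method that Qader and colleagues applied in rural areas can be sketched as recursive splitting of a block into four quadrants until each leaf holds no more than a threshold population. This Python version assumes a square 2^k x 2^k grid and a hypothetical max_pop threshold; it bounds leaf population from above (a single cell may still exceed it), which is one way of obtaining units of roughly comparable population and not necessarily the authors' exact rule.

```python
def quadtree(grid, max_pop, r0=0, c0=0, size=None):
    """Split a square 2^k x 2^k block until each leaf holds <= max_pop people.

    Returns a list of (row, col, size, population) leaves; a single cell is
    never split further, even if it exceeds max_pop.
    """
    if size is None:
        size = len(grid)
    total = sum(grid[r][c] for r in range(r0, r0 + size)
                for c in range(c0, c0 + size))
    if total <= max_pop or size == 1:
        return [(r0, c0, size, total)]
    half = size // 2
    leaves = []
    for dr in (0, half):
        for dc in (0, half):
            leaves.extend(quadtree(grid, max_pop, r0 + dr, c0 + dc, half))
    return leaves


# dense settlement in the top-left quadrant, sparse elsewhere
g = [[50, 50, 1, 1], [50, 50, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]]
for leaf in quadtree(g, 60):
    print(leaf)
```

Dense areas end up as many small units and sparse areas as a few large ones, mirroring the "different sized grid cells" described above.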

DISCUSSION
The successful implementation of gridded population sample surveys across a variety of settings bodes well for this emerging field. However, a survey statistician considering whether to recommend an outdated census-based frame or a gridded population frame is faced with questions about sample frame accuracy, methods to form and select sample frame units, and optimal survey designs. Next, we outline a research agenda to equip survey designers to identify situations where gridded population sampling can be a feasible and trustworthy option. The agenda shows key stages of a gridded population survey and available options (Figure 3).

Choose gridded population
Top-down gridded population datasets that restrict estimates to settled areas are likely to underestimate rural, and overestimate urban, populations because small settlements are often undetected in the settlement layer. Conversely, datasets that estimate population in all landmasses likely overestimate rural, and underestimate urban, population because fractions of the population are allocated to unsettled cells. Factors that affect survey accuracy include the gridded population model accuracy, aggregation of the gridded population model input dataset, whether residential or ambient population is modelled, accuracy and type of covariates, and area of the cell in which population is estimated [12].
A major gap is that cell-level accuracy is not known for any top-down gridded population dataset.
To assess accuracy, a recent census disaggregated to household locations would be needed, though this is rarely, if ever, available. The next best option is comparison of modelled gridded population estimates with micro-census counts from a sample of areas. Household listings from a recent geolocated household survey aggregated to cells might serve this purpose, but to our knowledge, data sharing agreements for such work have not been investigated or defined. Simulated household-level datasets are a third option [55].
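Once modelled cell estimates can be matched to observed micro-census or listing counts for the same cells, the comparison reduces to a few standard error summaries. The Python sketch below assumes the two lists are already aligned cell by cell; the choice of metrics (mean bias, mean absolute error, root mean square error, and RMSE as a percentage of the mean observed count) is illustrative rather than prescribed.

```python
import math


def accuracy_metrics(modelled, observed):
    """Summarise cell-level error of modelled counts against observed counts.

    modelled, observed: equal-length lists for the same cells (already matched).
    """
    n = len(observed)
    errors = [m - o for m, o in zip(modelled, observed)]
    mean_obs = sum(observed) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return {
        "bias": sum(errors) / n,  # > 0 means over-estimation on average
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": rmse,
        "pct_rmse": 100 * rmse / mean_obs if mean_obs else float("nan"),
    }


print(accuracy_metrics([12, 8, 30], [10, 10, 25]))
```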
Presently, some top-down datasets include model errors at the scale of the input population dataset based on internal validation. New bottom-up datasets are expected to include cell-level uncertainty estimates. When those datasets become available, survey designers will want to consider how uncertainty estimates might be used to improve sample designs or sample size calculations. In addition, DHS, MICS, LSMS, and other surveys are distributed via national statistical offices, and thus their sample frames hail from official sources. Processes are needed for national statistical agencies to engage with gridded population dataset production so that official endorsements might be made [44].

Choose sample design
Area-microcensus sample designs in small clusters (e.g. 10-20 households) may prove to be faster and cheaper than two-stage designs in larger clusters (e.g. 100-300 households), and may more accurately sample vulnerable urban populations; however, there can be a counter-balancing detriment of higher survey design effects due to variable numbers of respondents per cluster, greater within-cluster homogeneity, and lower response rates. For survey designers to assess these trade-offs and to select a sample size that will meet stakeholders' goals for budget, timeline, and statistical precision, they need reliable projections of likely design effects in area-microcensus samples. The current limited evidence is mixed. A simulation study of a rural population in Namibia found that nearly twice as many area-microcensus clusters would be needed to achieve the same precision as a two-stage survey, holding constant the number of respondents per cluster [56]. In contrast, a study in urban Nepal found higher design effects for demographic indicators and lower design effects for socio-economic indicators in an area-microcensus design than in a two-stage design [15].
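These trade-offs can be made concrete with the standard design-effect approximation for cluster samples, deff = 1 + (m - 1) * rho, where m is the mean number of respondents per cluster and rho the intra-cluster correlation, together with a commonly used extension for unequal cluster sizes, deff = 1 + ((cv^2 + 1) * m - 1) * rho. The Python sketch below uses purely illustrative values (not those of the Namibia or Nepal studies) to show how cluster size, homogeneity, and variability in cluster size translate into the number of clusters needed.

```python
import math


def design_effect(mean_cluster_size, icc, cv_cluster_size=0.0):
    """Approximate design effect of a cluster sample.

    With cv_cluster_size = 0 this is Kish's 1 + (m - 1) * icc; the (cv**2 + 1)
    factor is a common approximation for unequal cluster sizes.
    """
    return 1 + ((cv_cluster_size ** 2 + 1) * mean_cluster_size - 1) * icc


def clusters_needed(effective_n, mean_cluster_size, icc, cv_cluster_size=0.0):
    """Clusters needed to match the precision of effective_n SRS respondents."""
    deff = design_effect(mean_cluster_size, icc, cv_cluster_size)
    # round() guards against floating-point noise just above a whole number
    return math.ceil(round(effective_n * deff / mean_cluster_size, 9))


# Illustrative values only
for m in (10, 20):
    print(m, design_effect(m, 0.05), clusters_needed(1000, m, 0.05))
print(design_effect(10, 0.05, cv_cluster_size=0.5))  # unequal sizes inflate deff
```

Smaller clusters lower the design effect but require more clusters, while greater within-cluster homogeneity (a larger icc) or more variable cluster sizes push the design effect back up.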
Also, as urban settlement classification becomes increasingly possible [57], survey designers need to understand how within-urban stratification affects the various sample designs used in gridded population, and other, surveys. With no way to stratify urban populations, all surveys are at risk of under-sampling or omitting slums and other vulnerable populations [58,59]. In addition, research is needed to balance survey designs that can support both precise design-based estimation of outcomes and precise SAEs of indicators at fine geographic scales [60].

Create sample frame
Existing gridded population sample frame approaches do not result in cluster boundaries that are recognizable on the ground. Improved methods are needed to use natural features such as rivers and roads to delineate cluster boundaries from gridded population data. Survey designers need to be confident that clusters will yield the right number of eligible respondents and have a geographic area that can be canvassed by a field team in the time budgeted for fieldwork.
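The yield check described above is simple arithmetic once planning assumptions are fixed. The Python sketch below converts a cluster's estimated population into expected households and eligible respondents and tests both the yield and the cluster area against targets; the mean household size, eligibility rate, target range, and maximum area are all hypothetical inputs, and uncertainty in the cell estimates carries straight through to the expected yield.

```python
def screen_cluster(population, area_km2, mean_household_size,
                   eligible_per_household, target_eligible=(20, 40),
                   max_area_km2=0.5):
    """Check whether a candidate cluster should yield a workable sample.

    All thresholds are hypothetical planning inputs.
    Returns (workable, expected_eligible).
    """
    expected_households = population / mean_household_size
    expected_eligible = expected_households * eligible_per_household
    low, high = target_eligible
    workable = low <= expected_eligible <= high and area_km2 <= max_area_km2
    return workable, expected_eligible


print(screen_cluster(100, 0.3, 5, 1.5))  # (True, 30.0)
print(screen_cluster(400, 0.3, 5, 1.5))  # (False, 120.0)
```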

Draw sample
Several gridded population sampling tools and approaches are available, and their feasibility is influenced by cost, transparency of the methods, clarity of documentation, and usability by survey design professionals in government agencies and organizations who may not have advanced programming and GIS skills.

Conduct fieldwork
The emerging field of gridded population survey sampling should recommend tools and protocols for both lower- and higher-tech settings. For example, a common protocol should be described to deal with arbitrary gridded population boundaries that intersect buildings (e.g. include buildings on north and east boundaries, exclude buildings on south and west boundaries). Uniquely, gridded population surveys rely on access to up-to-date high-resolution satellite imagery (0.5m) for fieldwork. This is less of a challenge in urban areas worldwide thanks to Google Earth, Bing, and other free websites. However, imagery resolution in rural areas of LMICs is quite variable, with images sometimes being several years old, and buildings and roads can be hidden by forest canopy or cloud cover. As a result, it would be difficult to implement gridded population surveys in areas of heavy forest or cloud cover.
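A boundary protocol like the example above can be stated precisely enough to apply consistently. Assuming axis-aligned cells in projected coordinates and using each building's bounding box as a proxy for its footprint (both assumptions), keeping buildings that cross the north or east edge and dropping those that cross the south or west edge is equivalent to assigning each building to the cell that contains its south-west corner, with half-open intervals so shared edges are never double-counted. A minimal Python sketch, with hypothetical names:

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding box in projected metres (x east, y north)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def keeps_building(cell: Box, building: Box) -> bool:
    """Keep buildings crossing the north/east edges, drop south/west ones.

    Equivalent to: the building's south-west corner lies in the cell,
    using [xmin, xmax) and [ymin, ymax).
    """
    return (cell.xmin <= building.xmin < cell.xmax
            and cell.ymin <= building.ymin < cell.ymax)


cells = [Box(x, y, x + 100, y + 100) for x in (0, 100) for y in (0, 100)]
buildings = [Box(90, 10, 110, 20),   # crosses the edge at x = 100
             Box(10, 95, 20, 105),   # crosses the edge at y = 100
             Box(95, 95, 105, 105)]  # crosses the shared corner
for b in buildings:
    print(b, [i for i, c in enumerate(cells) if keeps_building(c, b)])
```

Because each building is kept by exactly one cell of a tiled grid, neighbouring field teams following the same rule neither skip nor duplicate households on shared boundaries.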

CONCLUSION
Organizations with skills in GIS and digital tools can successfully implement surveys with gridded population sample frames, which have the potential to yield samples that are more representative of mobile and vulnerable respondents than outdated census-based frames. However, census-based frames are likely to be considered a safe choice by many survey designers because censuses have long been the standard and their limitations are commonly accepted. To recommend a gridded population frame would involve risks and rewards that are currently difficult to quantify. New tools are needed to evaluate gridded population datasets and frames in specific country contexts, and to facilitate low-burden survey implementation. There are opportunities to develop tools for nearly every stage of survey planning and implementation, which ultimately will improve the accuracy of survey data.

Authors' contributions
DRT and DAR performed the literature review, and drafted the figures and text. MC and AT provided data interpretation and edits. All co-authors reviewed and approved the final manuscript.