A high resolution spatial population database of Somalia for disease risk mapping
© Linard et al; licensee BioMed Central Ltd. 2010
Received: 1 June 2010
Accepted: 14 September 2010
Published: 14 September 2010
Millions of Somali have been deprived of basic health services due to the unstable political situation of their country. Attempts are being made to reconstruct the health sector, in particular to estimate the extent of infectious disease burden. However, any approach that requires the use of modelled disease rates requires reasonable information on population distribution. In a low-income country such as Somalia, population data are lacking, are of poor quality, or become outdated rapidly. Modelling methods are therefore needed for the production of contemporary and spatially detailed population data.
Here land cover information derived from satellite imagery and existing settlement point datasets were used for the spatial reallocation of populations within census units. We used simple and semi-automated methods that can be implemented with free image processing software to produce an easily updatable gridded population dataset at 100 × 100 meters spatial resolution. The 2010 population dataset was matched to administrative population totals projected by the UN. Comparison tests between the new dataset and existing population datasets revealed important differences in population size distributions, and in population at risk of malaria estimates. These differences are particularly important in more densely populated areas and strongly depend on the settlement data used in the modelling approach.
The results show that it is possible to produce detailed, contemporary and easily updatable settlement and population distribution datasets of Somalia using existing data. The 2010 population dataset produced is freely available as a product of the AfriPop Project and can be downloaded from: http://www.afripop.org.
Since the fall of the central government in 1991, Somalia has suffered 18 years of conflict and civil war that have resulted in a series of humanitarian disasters. Thousands of Somali have been externally and internally displaced and millions have been deprived of basic health and social services. Attempts are being made to reconstruct the health sector but, with the exception of the relatively peaceful northern region, the current political situation in Somalia poses a major obstacle to the establishment of a comprehensive health care system. The situation has inevitably resulted in an almost complete lack of any systematic, comprehensive nationwide systems of vital or health information.
Defining the extent of infectious diseases as a public health burden and their distribution in time and space is critical to scoping the financial requirements, setting a control agenda and monitoring. However, any approach that requires the use of modelled disease rates requires reasonable information on the resident population for the years one is intending to estimate risk. Where risks of disease are over-dispersed in space, as is the case for most vector-borne diseases, population distributions and counts must be resolved to higher levels of spatial detail than large regional estimates. Detailed and spatially disaggregated population data are essential resources in the assessment of the number of impacted people for planning health service delivery and for decision-making processes related to developmental or health issues [3–6].
Whilst high-income countries generally have extensive mapping resources and demographic data at their disposal, in low-income countries relevant data are either lacking or are of poor quality. In Somalia, the last national census was undertaken in 1987. The major changes in population size and distribution that are occurring in Somalia undermine the fidelity of the population parameters from the 1987 census for deriving contemporary estimates. In addition, census population counts are reported in large spatial units (regions or districts), whereas the known spatial pattern of disease risk is generally highly focal and spatially heterogeneous. The coarse spatial resolution of Somalia census data therefore limits its potential use for resource allocation and disease burden estimation.
Modelling techniques for the spatial reallocation of populations within census units have been developed based on ancillary datasets, such as transport networks, urban extents and slopes [3, 5, 8, 9], but work has also shown that, unless these datasets are complete and of finer detail than the census data, such modelling techniques can be detrimental to mapping accuracies. Analyses have also shown that land cover (LC) information derived from satellite imagery can be valuable in redistributing aggregated census counts to improve the accuracy of national-scale gridded population data in East Africa [10–13]. The intrinsic link between human population distribution and LC, particularly settlements, means that such data likely offer the best opportunity for improved population distribution modelling. Here we extend the methods developed by Tatem et al., and recently refined, to model population distributions and densities at a fine spatial resolution in order to provide an evidence base for refining infectious disease mapping and for planning health service delivery in Somalia.
The Africover programme, operational in the 1995-2002 period, provided a multipurpose and consistent LC GIS database of 10 countries in East Africa. The Africover data use the Land Cover Classification System (LCCS) for creating standardized and harmonized LC classes. This allows easy aggregation of LC classes based on a hierarchy of LC class detail, and allows comparison of LC classes across countries and regions. The full resolution Africover dataset for Somalia was acquired and the 99 individual classes were aggregated to a more generic 22 classes to provide a consistent legend across the entire region. Secondly, a dataset depicting urban and settlement polygon outlines from detailed imagery was acquired from the GeoTerraImage Consultancy. This dataset was principally derived from 2005 Landsat imagery, through a combination of conventional on-screen interpretation and hierarchical clustering techniques, often involving the use of area-specific geographic masks. Industrial area delimitations for the main cities were also provided. In addition, roads and rivers data were obtained from the Vector Map Level Zero (VMAP0) dataset to aid testing and accuracy assessment.
In 2008, settlement point location data from different NGOs and UN agencies working in Somalia, including the United Nations Development Programme (UNDP), the German Agency for Technical Cooperation (GTZ), the Kenya Medical Research Institute (KEMRI), the Food Security Analysis Unit (FSAU), and the UN Office for the Coordination of Humanitarian Affairs (OCHA), were assembled into a single file. The dataset consists of 11,413 settlement locations divided in 8 categories: national capital (1); regional capital (17); district capital (57); town (13); part of town (186); settlement (9852); IDP camp (535); and temporary nomadic settlement (752). These 8 categories were aggregated into three classes: IDP camps, rural settlements and towns (this last group including national, regional and district capitals, towns and parts of towns). The nomadic settlements were removed from the database because of their temporary locations, but nomads were included in the population estimates. Some of the settlement locations include population size estimations, from different sources: KEMRI, GTZ and UNDP. When combining these differing sources of estimates, half of the settlement locations (49.4%) had a population size attribute (38% of towns, 52% of rural settlements and 0.9% of IDP camps).
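The aggregation of the 8 settlement categories into three classes can be illustrated with a short Python sketch. The counts are those quoted above; the dictionary keys and class labels are illustrative names chosen here, not identifiers from the original database.

```python
# Settlement counts per category, as quoted in the text.
CATEGORY_COUNTS = {
    "national capital": 1,
    "regional capital": 17,
    "district capital": 57,
    "town": 13,
    "part of town": 186,
    "settlement": 9852,
    "IDP camp": 535,
    "temporary nomadic settlement": 752,
}

# Illustrative mapping to the three aggregated classes.
# None marks the category that was removed from the database.
CLASS_OF = {
    "national capital": "town",
    "regional capital": "town",
    "district capital": "town",
    "town": "town",
    "part of town": "town",
    "settlement": "rural settlement",
    "IDP camp": "IDP camp",
    "temporary nomadic settlement": None,
}


def aggregate(counts, class_of):
    """Sum category counts into their aggregated class, skipping dropped ones."""
    totals = {}
    for category, n in counts.items():
        target = class_of[category]
        if target is None:
            continue
        totals[target] = totals.get(target, 0) + n
    return totals


class_totals = aggregate(CATEGORY_COUNTS, CLASS_OF)
```

Under this mapping the three retained classes hold 274 towns, 9,852 rural settlements and 535 IDP camps, with the 752 nomadic settlement points excluded.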
Population count data were available at administrative level 2 (district) in Somalia. There are 74 districts, giving an average spatial resolution (ASR) of 93 km. The ASR measures the effective resolution of administrative units in kilometers, and is calculated as the square root of the ratio of the land area to the number of administrative units. The UNDP provided population estimates by district for the year 2005. These are the most recent population data available for Somalia, as the last nationwide census dates back to 1987.
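As a quick check of the ASR definition, the following Python sketch computes the square root of area per administrative unit. The area figure is an assumption (a commonly cited total area for Somalia); the paper does not state the value it used.

```python
import math


def average_spatial_resolution_km(area_km2, n_units):
    """ASR: square root of the mean area per administrative unit, in km."""
    return math.sqrt(area_km2 / n_units)


# ASSUMPTION: commonly cited total area of Somalia; not given in the paper.
SOMALIA_AREA_KM2 = 637_657
N_DISTRICTS = 74  # number of districts quoted in the text

asr = average_spatial_resolution_km(SOMALIA_AREA_KM2, N_DISTRICTS)
```

With these inputs the result rounds to the 93 km quoted in the text.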
The UN High Commissioner for Refugees (UNHCR) and OCHA regularly provide updated estimates of refugee camp sizes and population movements in Somalia. The Afgooye corridor is a particularly dynamic area near Mogadishu, where more than 100 IDP camps are located, with a total of 366,000 refugees as of January 2010. UNHCR and OCHA also provide recent estimations of surface area covered by IDP camps and population counts in 5 sub-areas of the Afgooye corridor [17, 18]. These data were obtained for use within the modelling process.
Detailed census data at administrative unit level 6 (enumeration area) from the 1999 population and housing census in Kenya, with corresponding administrative unit boundaries for 59 of the 69 Kenyan districts, were also obtained to provide guidance on typical population densities by land cover type in the East Africa region.
Most population modelling methods essentially involve some form of re-distribution of aggregate census counts using ancillary datasets at finer spatial detail that are known to influence human population distribution. Here, the district level population count data were redistributed at a finer spatial scale using all the available information contained in the datasets described above. Specifically, the Africover LC dataset was first adapted to accommodate the more precise and detailed mapping and locational information on settlements provided by the Landsat-derived settlement polygons, the settlement points and refugee camps, all described above. Next, LC specific weights were derived based on information on population sizes from the settlement points and from the detailed Kenyan census data where the same LC classes exist (as described in ). These calculated densities were then utilised as weightings to redistribute population by settlement and LC type that were unaccounted for by existing settlement population size data. More detail on the full process is outlined below.
The Africover urban class, which typically overestimates settlement extent size [11, 12], was first removed and the surrounding classes expanded equally to fill the remaining space. The settlement location data and the Landsat-derived settlement polygons were then used to refine the 'urban area', 'rural settlement', 'refugee camp' and 'industrial area' classes. Given the clustered nature of populations across Somalia, ensuring that all known settlements were identified and mapped using information from all available datasets represented an important step.
1. Urban areas
The urban class refinement was mainly based on settlement locations classified as 'towns'. Town extents were mapped in three different ways according to available data: (i) the Landsat-derived settlement polygons were mapped when town location points could be mapped unambiguously onto a polygon, (ii) information on population size was used to provide an estimate of town extent, where just a single georeferenced location existed, and (iii) an average town extent of 1.04 km² - which corresponds to the average size of Landsat-derived settlement polygons intersecting with towns - was used for towns where just a single point existed and the population size was unknown. An urban extent map was derived from these town extents.
2. Rural settlements
A method similar to the one above was used to produce a rural settlement layer, based on rural settlement points: (i) the Landsat-derived settlement polygons were used when settlements could unambiguously be mapped onto a polygon, (ii) information on population size was used to provide an estimate of settlement extent where just a single georeferenced location existed, and (iii) a settlement extent of 10,000 m² (i.e. one pixel) was used for settlements where just a single point existed and the population size was unknown. We did not use the average size of rural settlements because only 2.7% of rural settlement points intersected Landsat-derived settlement polygons (in contrast to 69% for towns), which suggests that only the biggest settlements were detected in the Landsat-derived settlement polygon database and that an average size based on these would likely be overestimated.
3. IDP camps
The 'refugee camp' class mapping was mainly based on settlement locations classified as 'IDP camp': (i) information on population size was used to provide an estimate of IDP camp spatial extent, and (ii) an average IDP camp extent of 0.04 km² - which was calculated based on the UN data for the Afgooye corridor - was used for IDP camps where the population size was unknown. IDP camp extents were assembled to form a 'refugee camp' map.
4. Industrial areas
The Landsat-derived industrial area delimitations were used to define an industrial area map.
The urban, rural settlements, refugee camps and industrial area maps were all overlaid onto the Africover dataset and the land covers beneath were replaced to produce a refined LC dataset.
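The fallback order used for mapping extents (polygon, then population size, then a default extent) can be sketched in Python as follows. The default extents are those quoted above. How population size was converted to an extent is not specified in the text, so this sketch assumes a simple division by the class-average densities the paper reports (18,302, 2,990 and 77,199 people/km²); the function and constant names are hypothetical.

```python
# Default extents (km^2) from the text, used when neither a polygon
# nor a population size is available.
DEFAULT_EXTENT_KM2 = {"town": 1.04, "rural": 0.01, "idp": 0.04}

# ASSUMPTION: population is converted to area by dividing by the
# class-average densities reported in the paper (people/km^2).
ASSUMED_DENSITY = {"town": 18302.0, "rural": 2990.0, "idp": 77199.0}


def extent_km2(kind, polygon_km2=None, population=None):
    """Return a mapped extent following the polygon > population > default order."""
    # Landsat-derived polygons are only described for towns and rural settlements.
    if kind != "idp" and polygon_km2 is not None:
        return polygon_km2
    if population is not None:
        return population / ASSUMED_DENSITY[kind]
    return DEFAULT_EXTENT_KM2[kind]
```

For example, `extent_km2("rural")` falls back to a single 100 × 100 m pixel (0.01 km²), while a town matched to a polygon keeps the polygon area regardless of any population attribute.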
Relative per LC class population densities were defined for each class of the refined LC dataset. The average population densities in urban areas and rural settlements were calculated based on the Landsat-derived settlement polygons combined with settlement population counts. Average population densities of 18,302 people/km² and 2,990 people/km² were calculated for urban areas and rural settlements, respectively. Typical population density in refugee camps was calculated based on available data for the Afgooye corridor. The Afgooye corridor is divided into 5 sub-areas, for which the UN OCHA estimated the surface area covered by IDP camps and the UNHCR estimated population sizes. From these data, we calculated an average population density of 77,199 people/km² in IDP camps. Zeros were attributed to classes with no human habitation such as water bodies, industrial areas and sand beaches.
The average population densities of the remaining LC classes were derived from the Kenyan census data, where significantly more accurate and detailed data on population distribution were available. The Kenyan Enumeration Area (EA) census data, which contain 46,034 EAs and have an ASR of just 3.21 km, provided a valuable dataset for calculating more accurate relative per LC class population densities than could be obtained from existing Somalia data. Moreover, all the Africover LC types found in Somalia are also present in Kenya. The average population density of one specific LC class was calculated based on EAs that record this LC class for the majority of their pixels, as outlined in  and . As shown in , the extrapolation of LC specific population densities to neighbouring regions had a limited impact on population distribution model accuracies in Kenya. However, even if the relative values between population densities derived from Kenya are informative, the absolute population density values can vary notably from one country to another. Population densities derived from Kenya are expected to be overestimated because small settlements were not distinguished from major Africover classes in Kenya. Moreover, populations are much more clustered across the whole of Somalia due to the arid environment. We therefore varied the population densities derived from Kenya by scaling them by a sequence of weightings between 0 and 1 (with an increment of 0.01), while keeping the weights derived from Somalia data fixed. We tested the accuracy of the population data produced from each population density table by comparing predicted population with the observed population in towns and settlements from the location dataset with known populations. This provided a test of the partition of populations between settlements/towns and other LC classes. The root mean square error (RMSE) was extracted for each population dataset. The LC specific population density table that produced the lowest RMSE was selected for the final population distribution model.
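The scaling sweep described above can be sketched in Python under strong simplifying assumptions: a single district, one aggregate area per LC class, and observed populations available per class rather than per settlement. All toy numbers except the two Somali densities are hypothetical, and the function names are illustrative rather than taken from the authors' workflow.

```python
import math


def redistribute(total_pop, areas_km2, densities):
    """Split a district total across LC classes in proportion to density x area."""
    weights = {lc: densities[lc] * areas_km2[lc] for lc in areas_km2}
    weight_sum = sum(weights.values())
    return {lc: total_pop * w / weight_sum for lc, w in weights.items()}


def rmse(predicted, observed):
    """Root mean square error between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


def best_scale(total_pop, areas_km2, fixed, kenya, observed):
    """Sweep s = 0.00, 0.01, ..., 1.00 and return (s, rmse) with the lowest RMSE."""
    best = None
    for step in range(101):
        s = step / 100
        densities = dict(fixed)  # Somalia-derived weights stay fixed
        densities.update({lc: s * d for lc, d in kenya.items()})  # Kenya-derived are scaled
        pred = redistribute(total_pop, areas_km2, densities)
        keys = sorted(observed)
        err = rmse([pred[k] for k in keys], [observed[k] for k in keys])
        if best is None or err < best[1]:
            best = (s, err)
    return best


# Toy district; every value except the two fixed densities is HYPOTHETICAL.
areas = {"urban": 2.0, "rural": 5.0, "shrubland": 1000.0}   # km^2
fixed = {"urban": 18302.0, "rural": 2990.0}                 # from the text
kenya = {"shrubland": 50.0}                                 # hypothetical Kenya-derived density
observed = {"urban": 36000.0, "rural": 15000.0}             # hypothetical known populations

scale, error = best_scale(60000.0, areas, fixed, kenya, observed)
```

The selected scale is the one whose density table best reproduces the known town and settlement populations; in a full implementation the comparison would be made per settlement across all districts.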
The per-LC class densities defined above were used as weightings to reallocate populations within Somali districts. Per-pixel population densities were adjusted to match the total population estimated by the UNDP (2005) in the administrative units that they belonged to. An estimate of population in 2010 was produced based on UN rural and urban growth rates for the 2005-2010 period, using the following equation: $P_{2010} = P_{2005}\,e^{rt}$, where $P_{2010}$ is the required 2010 population within a pixel, $P_{2005}$ is the population within the same pixel at year 2005, $t$ is the number of years between 2005 and 2010, and $r$ is the average annual growth rate for rural pixels (2.21%) and urban pixels (4.17%). These growth rates were taken from the UN World Urbanization Prospects Database, 2007 version.
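The 2005 to 2010 projection can be written as a minimal Python sketch. The growth rates are those quoted above, expressed as fractions per year; the function name and the `is_urban` flag are illustrative.

```python
import math

RURAL_RATE = 0.0221  # average annual growth rate for rural pixels (from the text)
URBAN_RATE = 0.0417  # average annual growth rate for urban pixels (from the text)


def project_population(p2005, is_urban, years=5):
    """Apply P_2010 = P_2005 * exp(r * t) to a single pixel value."""
    rate = URBAN_RATE if is_urban else RURAL_RATE
    return p2005 * math.exp(rate * years)
```

Over five years this multiplies rural pixel values by about 1.12 and urban pixel values by about 1.23.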
Table: Average population densities for each LC class in Somalia. Columns: average population density (people/km²); data used for population density calculation. Surviving source entries: UN data on Afgooye corridor, Somalia; GTI, settlement points (two rows).
Table: Population distribution and projected population estimates in Somalia. Columns: % population in towns, IDP camps and rural settlements; total population 2005; total population 2010. Surviving row label: Central & South.
Population movements are particularly intense in Somalia, with currently approximately 1.5 million internally displaced people, making the spatial quantification of population distributions a difficult task. Modelling population distribution is however of key importance for estimating the population at risk of infectious diseases, and, ultimately, disease burden. Results presented here show that it is possible to produce contemporary and detailed settlement and population datasets of Somalia using existing detailed datasets and methods that can easily be updated as new data become available.
We have used a combination of methods to develop a population distribution model that is matched to administrative population totals provided by the UN. Particular emphasis was placed on the integration of settlement extents in the LC dataset. Given that populations in Somalia are highly clustered, and that these rich datasets exist, it was important to ensure that the settlements - where the vast majority of Somali people live - were mapped as accurately, and with as much spatial detail, as possible. We used simple and semi-automated methods that can be implemented with free image processing software to produce easily updatable data at 100 × 100 meters spatial resolution. Given the scale and speed with which population movements are occurring in the region, such features are a necessity. The new dataset was compared to existing population datasets. Even if a conclusive and completely fair comparison is not possible due to resolution and construction differences, the results of these analyses allowed the identification of major differences between the datasets. Our tests showed that total population numbers differ and that important differences in population values can be observed in more densely populated places, i.e. in towns and villages. The population density within the main cities strongly depends on the urban extents used in the population mapping procedure. This supports the idea that using accurate and spatially detailed settlement extents is of key importance in population distribution modelling. Thanks to its finer spatial resolution, the AfriPop dataset was able to incorporate data on hundreds of small villages that were not represented in the LandScan and GRUMP datasets.
Major differences observed between the estimated PAR of Pf malaria demonstrate how important the choice of population dataset is in disease risk and burden estimations. Our results showed that differences in population distribution can induce large differences in PAR estimates. The AfriPop and LandScan datasets showed a similar partition of people between endemicity classes, but with substantial differences in absolute numbers. The GRUMP dataset, which has been used to estimate PAR of Pf malaria in the past, predicted a much larger number of people living in highly endemic areas than the more detailed AfriPop dataset.
As discussed previously, determining the accuracy of spatial population datasets is often a difficult operation, given the usage of all available datasets in the interpolation process. Thus, deciding definitively upon which population dataset provides the most accurate estimates of PAR here is impossible. However, it is well known that malaria transmission in Somalia is focal and heterogeneous, partially due to the clustered nature of the population distribution. As the precision and detail of malaria transmission mapping improves, spatial population datasets that capture these patterns are therefore required if PAR is to be more accurately quantified. The areal-weighting approach applied to relatively coarse administrative unit level census counts, as was undertaken for production of the GRUMP map (figure 3), fails to capture such patterns. The interpolation approach adopted for the production of LandScan aims to map the clustered nature of population in Somalia, but the completeness of the input data is not clear without the provision of extensive source and metadata information, and previous assessments have suggested that the use of incomplete input data is detrimental to accuracy. By making use of complete, contemporary and well-validated datasets to capture the over-dispersed population distribution patterns within Somalia, we therefore have reason to believe that the estimates of PAR derived through the AfriPop dataset constructed here are likely to be the most accurate.
The population distribution modelling approach developed in this paper and others [12, 13] will be applied to other low-income sub-Saharan countries. The methodology used will however differ from country to country according to the data available; one of the principal aims of the AfriPop project is to make use of detailed, well-validated datasets where they exist to improve mapping precision and accuracy. In most sub-Saharan countries, detailed spatial population data are lacking, but are often of primary importance for disease burden estimation and health service planning. Future work on malaria PAR and burden estimation will rely on these more detailed spatial population datasets, and the potential exists to improve such estimates for other diseases across the continent.
Detailed and contemporary spatial population data are valuable for assessing the risks and burden of infectious diseases, for planning humanitarian assistance, resource allocation, or public health strategies. The construction of a detailed population database for Somalia has been described here using routinely collected data and semi-automated methods that can easily incorporate new data as it becomes available. The 100 × 100 meters gridded population dataset is freely available as a product of the AfriPop Project. The AfriPop project aims to provide detailed and open access spatial population datasets for all African countries.
The authors are grateful to Lisa Peterson from UNOCHA, Somalia, for help in providing important population and settlement databases. CL is supported by a grant from the Fondation Philippe Wiener - Maurice Anspach. AMN is supported by the Wellcome Trust as a Research Training Fellow (#081829). RWS is supported by the Wellcome Trust as Principal Research Fellow (#079081) that also supports VAA. AJT is supported by a grant from the Bill and Melinda Gates Foundation (#49446) and acknowledges funding support from the RAPIDD program of the Science & Technology Directorate, Department of Homeland Security, and the Fogarty International Center, National Institutes of Health. This work received support from the Global Fund to Fight AIDS, Tuberculosis and Malaria to UNICEF-Somalia (SSA/SOMA/2010/00000316-0). This work forms part of the output of the AfriPop Project, principally funded by the Fondation Philippe Wiener - Maurice Anspach, and the Malaria Atlas Project, principally funded by the Wellcome Trust, U.K.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.