Mathematical models for predicting human mobility in the context of infectious disease spread: introducing the impedance model

Background Mathematical models of human mobility have demonstrated a great potential for infectious disease epidemiology in contexts of data scarcity. While the commonly used gravity model involves parameter tuning and is thus difficult to implement without reference data, the more recent radiation model based on population densities is parameter-free, but biased. In this study we introduce the new impedance model, by analogy with electricity. Previous research has compared models on the basis of a few specific available spatial patterns. In this study, we use a systematic simulation-based approach to assess the performances. Methods Five hundred spatial patterns were generated using various area sizes and location coordinates. Model performances were evaluated based on these patterns. For simulated data, comparison measures were average root mean square error (aRMSE) and bias criteria. Modeling of the 2010 Haiti cholera epidemic with a basic susceptible–infected–recovered (SIR) framework allowed an empirical evaluation through assessing the goodness-of-fit of the observed epidemic curve. Results The new, parameter-free impedance model outperformed previous models on simulated data according to average aRMSE and bias criteria. The impedance model achieved better performances with heterogeneous population densities and small destination populations. As a proof of concept, the basic compartmental SIR framework was used to confirm the results obtained with the impedance model in predicting the spread of cholera in Haiti in 2010. Conclusions The proposed new impedance model provides accurate estimations of human mobility, especially when the population distribution is highly heterogeneous. This model can therefore help to achieve more accurate predictions of disease spread in the context of an epidemic. Electronic supplementary material The online version of this article (10.1186/s12942-017-0115-7) contains supplementary material, which is available to authorized users.


Background
Epidemic spread depends on the likelihood of infection, as well as on individual human interactions. Concerning the latter, mobility networks play a huge role in the temporal and spatial dynamics of disease transmission within a population [1].
If mobility networks cannot be provided, reaction diffusion models can roughly report on the epidemic spread [2]. In the last decade, there has been a growing interest among infectious disease epidemiologists in estimating human mobility and rebuild mobility networks [3]. While mathematical models of human mobility have been in use since the last century, they are of less value

Open Access
International Journal of Health Geographics *Correspondence: kankoe.sallah@univ-amu.fr 1 INSERM, IRD, SESSTIM, Sciences Economiques & Sociales de la Santé & Traitement de l'Information Médicale, Aix Marseille Univ, Marseille, France Full list of author information is available at the end of the article when real mobility data or reliable proxies-such as the call detail records (CDRs) of mobile network operatorsare available [4]. The usefulness of these models appears in data-scarce contexts, such as during infectious disease epidemics in low-income countries, when forecasting the best possible allocation of resources becomes necessary. Indeed, these models can predict mobility patterns based solely on population size, population density, and travel distance.
The gravity and radiation models are the most commonly used mobility models today. The gravity model posits that mobility between two locations increases with population size and decreases with distance [5], whereas the radiation model assumes mobility to depend on population density [6]. Insofar as it relies on parameter tuning, the gravity model provides a broad theoretical framework that is pragmatically useless in the absence of specific space-dependent fitting [7]. As for the radiation model, it merely serves to predict the relative probability of mobility from a given location to different destinations-though it can help deduce absolute number of trips, provided that the average number of trips from each source location is known or approximated. In short, in data-scarce contexts such as in low-income countries, mathematical models of human mobility can only be used with minimal assumptions about the overall probability of mobility in the population.
The aim of this study was to introduce a new, parameter-free model, the impedance model, for predicting human mobility in the context of data scarcity.
In this article, we define the mobility models, translate them into formulations that allow for rigorous comparison, simulate a set of spatial patterns linked to various scenarios, and then compare the performance of the new impedance model, intuitively adapted from the laws of electricity, to previous well known models, for each simulated pattern. Lastly, we evaluate the performance of each of these models in predicting a real infectious disease spread, namely the 2010 cholera epidemic in Haiti.

The impedance model
In order to estimate probabilities of mobility in datascarce contexts, we proposed an intuitive and parameterfree model adapted from Ohm's law of electricity (1827). We developed this model by simple analogy with the electric current model, where electric potential is represented by P = I × R, R is the electric resistance, and I is the electric current (charge per unit of time). In a human mobility model, electric resistance is assimilated to distance (d), electric current to number of trips per day (F ij ), and electric potential to mobility potential per day on a given trajectory. The latter depends on the overall probability of mobility α applied to the size of the source and destination populations. It can be formulated as α P i + P j .
Thus, according to the impedance model, the number of trips flow from i to j can be formulated as F ij = α The probability for a person in location i to travel from i to j is given by This formula is parameter-free. The previously known mobility models have also been developed by analogy with the laws of physical science. Among these, some have been pragmatically successful, and have thus become especially popular.

The gravity model
The gravity model estimates the number of trips F ij , between two geographical locations i and j, knowing their population sizes P i and P j and the distance between them, d ij , as To allow rigorous comparisons, the formulation was adapted. See Additional file 1: A1

The radiation model
The radiation model was developed by analogy with the processes of emission and absorption of electromagnetic particles studied by physical scientists. The probability of commuting from i to j is given by More details on radiation model are given in Additional file 1: A2

Data generation
In order to assess the performance of our proposed model in plausible epidemiological situations, we generated data to build spatial patterns that could stand for real epidemic spaces. Location coordinates were generated using the double-uniform distribution. Each spatial pattern formed a square whose sides spanned between 1° and 40° in a polar coordinate system, and included a limited number of locations representing different demographic units. The area of each generated pattern varied approximately from 12,100 km 2 (roughly the size (3) π ij = P i P j P i + S ij P i + P j + S ij of Gambia or Lebanon) to 19,360,000 km 2 (roughly the size of Russia or North America). The number of generated locations in each spatial pattern varied randomly from 5 to 30. Each pattern reflected population densities that can be plausibly observed by demographers (100-10,000 inhabitants/km 2 ). Two topological rules were applied to create plausible patterns: (1) each pattern had to include at least 10 km between locations of more than 100,000 inhabitants; and (2) the size of the population generated for each pattern had to be proportional to the pattern area.
Our aim was to account for geographical scale and to avoid the inconsistencies that come with considering huge populations in small areas (e.g., locations with more than 10,000,000 inhabitants in Gambia) or with examining very small populations in large areas (e.g., accounting for villages with small populations when studying disease spread throughout Russia).
The double-uniform distribution was used to generate population sizes. Population size heterogeneity was controlled by varying the gap between the two uniform distributions. In sensitivity analyses, the hypothesis used to generate population sizes was modified with a Poisson distribution, a double Poisson distribution, a normal distribution, and a truncated normal distribution. The regular grid pattern was also assessed to estimate result variability due to travel distances.

Simulation design
Reference data were needed to evaluate the performance of the three models on simulated patterns. In the absence of real mobility data or reliable proxies, we had to make assumptions to generate reference mobility data. Several scenarios were proposed, all of which had as a prerequisite a power deterrence function.
The first scenario-the source population and distance deterrence (SPDD)-assumed that the probability of mobility from source i to destination j decreased with the distance between i and j and was proportional to the size of the source population.
The second scenario-the small to large population with distance deterrence (SLDD)-assumed that the probability of mobility on a given trajectory was proportional to the destination/source population size ratio and inversely proportional to distance.
The third scenario-large to small population with distance deterrence (LSDD)-assumed that the probability of mobility was proportional to the destination/source population size ratio, while retaining the power deterrence function.
The total number of trips from location i (T i ) depended on the overall probability of mobility (α) and on the size of the source population (P i ) (Eq. 8). The expected number of trips on a given trajectory ij (F ij ) followed a Poisson distribution (P), whose average depended on the probability of mobility from location i to destination j (π ij ) and on the total number of trips from location i (Eq. 8).
where P represented a Poisson distribution.
The epidemiological literature on disease spread does not provide recommendations on the threshold proportion of infected individuals that needs to be detected to avoid disease propagation. Obviously, this number depends on specific spatiotemporal patterns. Nevertheless, for methodological reasons, we assumed that a 1% error (Δ) in the estimated overall probability of mobility has significant consequences for emergency planning. This means that surveillance systems should be able to detect a variation in the overall daily probability of mobility corresponding to 100 persons moving from a small town of 10,000 inhabitants or to 100,000 persons moving from a huge agglomeration of about 10,000,000 inhabitants such as Paris. Assuming the maximum overall probability of mobility to be 10%, the maximum variance in the probability of mobility is σ 2 = 0.1 * 0.1 = 0.01. According to simulation guidelines [11], the number of simulations needed to detect significant difference in the overall probability of mobility estimated by various models is given by  16:42 For each scenario, we performed 500 independent simulations. The same spatial patterns were used to compare the three mobility models so that bias due to sample variability was eliminated while assessing the differences between the models.

Measures of performance
Statistical measures of performance were: overall probability of mobility α; bias in the probability of mobility (δ); and average root mean square error (aRMSE) in number of trips.
The overall probability of mobility α estimated by the model for a given pattern was defined as the percentage of travelers in the pattern's population. This parameter can reveal over/underestimations of the overall probability of mobility, and is equal to where F ij is the number of trips on each ij trajectory estimated by the model for a given pattern, and P i is the size of the population in the source location n of that pattern.
The bias δ associated with each model was defined as δ =ᾱ − α 0 , where α is the overall probability of mobility estimated from a single simulation, ᾱ is the average rate over a set of N simulations, and α 0 is the value of the overall probability of mobility used for data generation.
The aRMSE in number of trips estimates was computed for each mathematical model as the average of root mean square errors over a set of N simulations.
where F model and F ref are the numbers of trips on a given trajectory-as estimated with model and reference data, respectively-n is the total number of trajectories in a given pattern, and N is the number of simulations for which the measure was computed.

Modeling mobility patterns in the 2010 cholera spread in Haiti with a basic susceptible-infected-recovered (SIR) transmission framework
As a proof of concept, we used the data routinely collected during the first weeks of the cholera epidemic in Haiti by the Ministry of Public Health and Population. These data have already been used for the epidemiological analysis of the first year of the epidemic [12]. This epidemic, of exceptional magnitude, struck Haiti in October 2010, following the massive contamination of the Artibonite River after cholera was introduced in the country by a military contingent [13].
Briefly, morbidity data were prospectively collected at the commune level according to the World Health Organization standard definition [14].
In the current study we only analyzed data from October 27, 2010 to December 27, 2010, corresponding to the expansion phase of the epidemic, discarding the first 2 weeks a period when cholera transmission was related to the massive contamination of the river rather than a human-driven diffusion [13]. These early stages of the epidemic were modeled using a basic SIR framework.
We used two spatial definitions (n = 140 and n = 78 locations): The first corresponded to municipalities, and the second to agglomerations connected by a human mobility network [4]. The set of differential equations below represents the dynamics of transmission in each location i. The overall population was assumed to be at demographic equilibrium, with the birth rate balancing the mortality rate; it was also assumed to be not immune in the early stages of the epidemic.
The equations above represent the weekly variations in the number of individuals in the susceptible (S i ), infected (I i ) and recovered (R i ) compartments at location i. β represents the rate of exposure to the disease-i.e., the rate of individuals becoming infected due to exposure to the effective incidence of cholera. The effective incidence of cholera in a location i, (1 − α)I i (t) + α j� =i π ij I j (t) was modeled as a weighted average of incidence rates in local and remote locations. Weights depended on the overall probability of mobility (α), but also on the relative probabilities of mobility from destination locations j to source location i which were provided by the matrices derived from CDRs or from the mathematical (impedance, gravity, radiation) models formulated in Eqs. 1, 3, and 4. γ is the recovery rate of the infected compartment. We evaluated mobility matrices under two hypotheses. The first made no assumption regarding specific overall probabilities of mobility from each location, allowing the rate α to be adjusted. The second assumed the overall probability of mobility from each location to be approximated by CDRs.
This CDRs mobility matrix resulted from the processing of cell phone network metadata of 2.9 million (12)  [4,10,15]. The fraction of Subscriber Identity Module Cards (SIM cards) on the total population in 2013 was 61%. The maximum distance between two pairs of telephone towers was 38.6 km. The trajectories of subscribers between the phone towers made it possible to reconstitute the mobility network between geographical locations. For each pair of sites (i, j), the proportion of SIM cards in location i at time t, which were found at location j at time t + 1, was calculated.
To determine which model best describes the early stages of the cholera outbreak in Haiti, we compared the performances of the candidate models according to Akaike's information criterion [16,17] where θ is the number of estimated parameters in the model, and η = n u × n w is the number of data points (with n u and n w being the number of spatial units and the number of weeks since the start of the calibration period, respectively). The sum of the squares of the residuals between model estimates and epidemiological records is denoted by RSS.
where C(i, j) and Ĉ i, j correspond to the weekly number of reported cases and to the number of cases estimated by the model in a spatial unit u and a week w, respectively.

Average root mean square error (aRMSE)
Estimates of absolute flows from all simulations and all scenarios are shown in Fig. 1. The average root mean square error was lower with the impedance model (log(aRMSE) = 7.19 CI (7.10-7.35)) as compared with the gravity and radiation models (log(aRMSE) = 7.44 CI (7.34-7.54) and 8.40 CI (7.91-8.67), respectively).
Similar tendencies were observed (both in aRMSE and bias) when using the grid distribution of locations, Poisson distribution or a power law and whatever the specific mobility scenarios: SPDD, LSDD or SLDD (Fig. 2). Assuming identical population sizes, the gravity and impedance models performed equally. Average error increased with area size. A hundred-fold increase in the standard deviation of population size (the coefficient of variation changing from 0.2 et 3) yielded a 5% aRMSE increase in the impedance model and 7 and 10% aRMSE decreases in the radiation and gravity models, respectively (Fig. 3). Thus, in heterogeneous settings, the impedance model produced the most accurate absolute flow estimates, as compared with older mathematical models.
Probabilities of mobility were estimated according to destination population, on the one hand, and according to travel distance, on the other. The predictions made with the impedance model were the closest to simulated data, regardless of population heterogeneity. When predicting short-distance mobility and mobility to destinations with small populations, the impedance model outperformed the gravity and radiation models, especially in the case of heterogeneous populations (Fig. 4).

Bias
Bias in the 5% reference probability of mobility used to generate data is represented in Fig. 5 for various area sizes and for two levels of population heterogeneity.
The impedance and gravity models were unbiased regardless of the spatial pattern. The radiation model underestimated the overall probability of mobility. Bias was persistent regardless of the scenario and increased in the case of heterogeneous patterns: δ = − 0.0099 for heterogeneous populations versus − 0.0037 for homogeneous ones. Bias persisted in the grid distribution of locations. Figure 6 summarizes the aRMSE estimates on the total number of trips in each scenario over five hundred

Accuracy of the transmission model assuming various mobility patterns for the 2010 Haiti cholera epidemic
Statistical dispersion measures comparing population heterogeneity for both spatial definitions are presented below ( Table 1). The coarse pattern (which includes 78 aggregated units) displays greater population heterogeneity.
Transmission parameters estimated with the Metropolis-Hastings Markov Chain Monte Carlo algorithm using a CDRs-based fine-scale mobility matrix (n = 140) are presented in Table 2.
CDRs data were used to calibrate the transmission parameters. These parameters were then entered in new models that lacked mobility patterns. Table 3 shows the AIC results obtained with different mobility matrixes, assuming that the rate of mobility from each location remains unknown. Both the matrix structure and the overall probability of mobility Fig. 2 For each scenario separately, number of trips estimated by the three models versus simulated reference data. Number of simulations was limited to 160. Each dot represents the logarithm of the total number of trips on a given trajectory. Power law was simulated by taking the square of a selected range values were tested. Results are shown for both spatial definitions. The CDRs-based model was taken as a reference when computing AIC variations. The impedance model performed well with heterogeneous spatial distributions, in which locations were defined as significant population agglomerations. Aggregated CDRs failed to fit the epidemiological data properly, even when using the most realistic parameters obtained with the fine-scale definition (AIC = 25,870 with the coarse definition vs. − 5200 with the fine-scale definition). Using the coarse definition, the AIC decreased by 22% for the impedance model, by 20% for the gravity model, and by 8.4% for the radiation model. Table 4 shows analogous results obtained by assuming that the overall probability of mobility (α) is provided by CDRs.
Additional file 1: B shows the sensitivity analysis associated with the results presented in Table 3. The latter correspond to the AIC variations that followed from variations in mobility rates.

Discussion
In this article, we proposed a new, parameter-free and intuitive model for predicting human mobility in the context of data scarcity. We evaluated the performances of Probability of travel according to destination population, travel distance and population distributions. CV stands for the coefficient of variation of population sizes. Panels a and b stand for homogeneous population size (CV < 1) and panels c and d for heterogeneous population size (CV > 1). Short-distance mobility and mobility to destinations with small destinations are best predicted in heterogeneous patterns with the impedance model the impedance model through intensive simulation, and compared it to the (non parameter-free) gravity model and the (parameter-free) radiation model. Our results suggest that when the number of trips from each location is known (as assumed in the radiation model and as extracted from CDRs), the impedance model provides the best equation for predicting the distribution of travelers towards destinations, in aggregated patterns.
The inclusion of reactive-diffusion equations in dynamic models of infectious disease transmission is a method of determining the spread of an epidemic in the absence of mobility data. This method assumes that the movements of all individuals are stochastic, happening by continuous progression throughout the geographical space and therefore, each individual can potentially visit all the geographical locations. Depending on the context, this assumption can be inaccurate [2], even if additional parameters may adapt the implications [18]. Recent works have shown that human mobility is better represented by a specific spatial network [19][20][21]. Each Individual usually does not visit all the locations. This results in saturation in the rate of epidemic spread, whereas the classical diffusion hypothesis does not admit a limit to the diffusion speed when the mobility rate increases [2]. However, in the context of the Black Death outbreak model, Gaudart et al. [18] used a local viscosity parameter proportional to the altitude and human density, thus  The SPDD scenario appears to be the most plausible one with regard to simulated data modeling the maximal diffusion velocity. This approach excluded low population density/mobility from the first wave of the epidemic.
Our study also uncovered the existence of intrinsic biases associated with CDRs, depending on the spatial definition used. In fact, we found that the most precious information that can be extracted from CDRs are the rates of mobility from the source location, not the probabilities of mobility to different destinations.
In addition, our study showed that the radiation model typically underestimates mobility, and can therefore yield inaccurate epidemiologic predictions, as has already been suggested [9,[22][23][24]. Our stimulation approach was more systematic than those used in previous studies, (including that of Masucci et al. [23]), as these have mainly relied on empirical data from specific countries.
We found that probability of mobility according to distance and to size of destination population-as measured in several studies [6,9]-is a pooled measure that may mask errors in core flow estimates. Estimates of the probability of mobility beyond a given radius can be flawed because flows are pooled before probability is computed. In the context of infectious disease spread, raw flows are more relevant than average probabilities. Moreover, RMSE is a more relevant measure to assess the performance of mobility models in epidemiology because it is directly based on absolute number of trips. The most reliable model is expected to yield the smallest RMSE.
The scenarios defined for data generation did not reveal anything specific. However, their formulation did help to account for a wide range of mobility hypotheses. The second scenario (SLDD) seemed to correspond to mobility patterns during peak periods of activity in both industrialized and less industrialized countries, as well as to patterns of rural migration in less industrialized countries. The third scenario (LSDD) resembled mobility patterns in industrialized countries during holiday periods, as well as patterns of populations fleeing conflict zones. While these results were consistent across all three scenarios, it is obvious that mobility is driven by far more complex motivations at the individual level-which may explain the unpredictability of even the best model for any given area [25]. Moreover, we know that CDRs are unreliable   Table 3 Minimal AIC values, assuming that the overall probability of mobility (α) is to be fitted AIC values obtained with each mobility model, for the 2010 Haiti cholera epidemic, assuming that the overall probability of mobility (α) is not given by CDRs, but fitted. The indicated models are: impedance model (IM), gravity model (GM), and radiation model (RM). Results are presented for the two spatial definitions. ∆ AIC corresponds to the variation from the optimal AIC value derived from the CDRs  Table 4 Minimal AIC values, assuming that the overall probability of mobility (α) is known from CDRs Overall probability of mobility (α) was derived from the CDRs. The indicated models are: impedance model (IM), gravity model (GM), and radiation model (RM). Results are presented for the two spatial definitions. ∆ AIC corresponds to the variation from the optimal AIC value derived from the CDRs