The results in this paper argue that disease transmission is a function of more than just biology, a fact that is well known but often ignored. Relaxing the assumption of homogeneous mixing in this South African outbreak suggests that, outside the densely populated urban areas, the pandemic would likely not have been sustained in the rural, sparsely populated provinces alone. This finding reinforces the obvious: if individuals have very limited contact with each other, outbreaks will probably be small, confined to small groups, and unlikely to propagate into a larger and more noticeable outbreak. Our estimates of the reproductive number for the more populous provinces are consistent with results reported elsewhere, but the estimates we obtain for the more rural provinces are notably lower [5, 13, 14].
There are other important issues to consider in our analysis. We have assumed throughout that the reporting of cases is uniform across the country, and this was the basis for our sizable imputation of the number of symptom onset dates. Even if reporting is less than 100% but still spatially uniform, the results we observe will hold. However, if reporting is not uniform and some provinces report cases much more completely than others, we can expect the results to change. For instance, if reporting was lower in the more rural provinces, then the estimated reproductive numbers in these provinces would likely increase once some adjustment for this underreporting was made. Without a more detailed study, this is difficult to quantify and explain. Clearly, a certain amount of confounding is present, and data reporting issues may be part of the explanation for the results we obtain.
Another factor that can, at least partly, explain the results is the choice of transmission matrices. We show results for various transmission matrices in order to quantify the degree to which transmission occurs between provinces. Four of these matrices are somewhat arbitrary and not based on actual data. One matrix is based on actual travel patterns in South Africa. However, the results are reasonably consistent across the four matrices that assume some degree of transmission between provinces, even when the amount is very small, as in the travel-based matrix. This suggests that the results are influenced more by the fact that such a matrix is used than by the form that the matrix takes. In all of these cases, despite the substantial differences between the matrices, the result is the same: transmission is maintained in the more urban areas, and rural areas fail to sustain transmission.
We note dramatic differences between the results when transmission between provinces is incorporated into the estimation (matrices b-e) and the results that assume no transmission between provinces (matrix a). This reflects the impact of using such matrices and the importance of performing sensitivity analyses to determine how the choice of matrix affects the results. One possible reason such matrices have not been used in the past, even though they have a qualitative impact on the results, is that they are difficult to come by and, in practice, are likely to be estimated in a somewhat ad hoc manner. In some cases, there may be little or no data to inform a transmission matrix. In that situation, a wide variety of matrices can be used to determine the plausible range of values that the estimates can assume. Ultimately, deriving a method for estimating these matrices, ideally using Bayesian tools, would mitigate this challenge. The framework we provide here lends itself to such an approach, although we have not carried out such an analysis.
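As a rough illustration of how a range of candidate matrices might be generated for such a sensitivity analysis, the sketch below builds a one-parameter family of matrices for nine provinces. The function name `mixing_matrix`, the parameter `between_share`, and the even split of between-province transmission across the other provinces are all assumptions made for illustration; they are not the matrices a-e used in this paper.

```python
import numpy as np


def mixing_matrix(n_provinces, between_share):
    """Hypothetical family of transmission matrices.

    Row r gives the assumed share of infections caused by a case in
    province r that land in each province. A fraction `between_share`
    is split evenly across the other provinces; the rest stays local.
    """
    if n_provinces < 2:
        raise ValueError("need at least two provinces")
    if not 0.0 <= between_share <= 1.0:
        raise ValueError("between_share must lie in [0, 1]")
    off_diagonal = between_share / (n_provinces - 1)
    matrix = np.full((n_provinces, n_provinces), off_diagonal)
    np.fill_diagonal(matrix, 1.0 - between_share)
    return matrix


# Candidate matrices spanning no, very little, and substantial
# between-province transmission (the shares are illustrative values only).
candidates = {share: mixing_matrix(9, share) for share in (0.0, 0.01, 0.1, 0.5)}
```

Setting `between_share` to zero recovers the case of no transmission between provinces, so re-running the estimation over `candidates` would give one crude view of how far the estimates move as the assumed amount of between-province transmission grows.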
We further note the coarseness of the spatial resolution of our data. Our implicit assumption is that individuals within a province are homogeneous. While assuming homogeneity within a province is less restrictive than assuming homogeneity over the entire country, it is still a substantial assumption that ignores potentially important variation within a province. As with any analysis, we are limited by the available data, and we acknowledge that data on a finer spatial scale would be desirable.
Our method also makes a strong assumption of independence between space and time. That is, we assume that the temporal and spatial distances between two individuals influence the probability of a particular infector-infectee pair independently of each other. Clearly, violations of this assumption are possible and could affect our results. Without further information on the potential correlations between space and time, however, any adjustment would be arbitrary and potentially misleading.
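One way to picture this independence assumption is as a pairing weight that factors into a temporal term and a spatial term. The sketch below is a minimal illustration under stated assumptions, not the estimation code used in this paper: the function `pair_probabilities`, the discrete serial interval `serial_pmf`, the orientation of `mixing` (rows index the infector's province), and the toy data are all hypothetical.

```python
import numpy as np


def pair_probabilities(onset_days, provinces, serial_pmf, mixing):
    """Return P where P[i, j] is the relative probability that case j infected case i.

    Assumes the weight for a pair factors into a temporal term (serial
    interval mass at the onset lag) times a spatial term (mixing entry for
    the infector's and infectee's provinces).
    """
    onset_days = np.asarray(onset_days, dtype=int)
    provinces = np.asarray(provinces, dtype=int)
    serial_pmf = np.asarray(serial_pmf, dtype=float)
    mixing = np.asarray(mixing, dtype=float)

    # lag[i, j] = onset of case i minus onset of candidate infector j
    lag = onset_days[:, None] - onset_days[None, :]
    # Only lags of at least one day that fall inside the pmf's support count.
    in_support = (lag >= 1) & (lag < len(serial_pmf))
    safe_lag = np.clip(lag, 0, len(serial_pmf) - 1)
    temporal = np.where(in_support, serial_pmf[safe_lag], 0.0)

    # spatial[i, j] = mixing[province of infector j, province of infectee i]
    spatial = mixing[provinces[None, :], provinces[:, None]]

    weight = temporal * spatial
    totals = weight.sum(axis=1, keepdims=True)
    # Cases with no plausible infector keep an all-zero row.
    return np.divide(weight, totals, out=np.zeros_like(weight), where=totals > 0)


# Toy data: four cases in two provinces (illustrative values only).
onset_days = [0, 2, 2, 4]
provinces = [0, 0, 1, 1]
serial_pmf = [0.0, 0.2, 0.6, 0.2]  # mass at lags of 0, 1, 2, 3 days

# Column sums give the expected number of cases each case infected.
r_isolated = pair_probabilities(onset_days, provinces, serial_pmf, np.eye(2)).sum(axis=0)
r_mixed = pair_probabilities(onset_days, provinces, serial_pmf, np.ones((2, 2))).sum(axis=0)
```

In this toy example, allowing transmission between the two provinces shifts credit for later cases toward the earliest case, which is the kind of qualitative change that the choice of matrix can produce. A dependence between space and time would mean the weight no longer factors this way.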
While the results we obtain might partially be explained by data quality issues and by differences in care-seeking behavior between rural and urban populations, there are other potential explanations. A recent cross-sectional serum study reports differential exposure to influenza strains across five communities in China, with the most urban community reporting the highest exposure to influenza strains. This provocative result warrants further study, as it is likely attributable to a number of factors, and it is consistent with the results we have obtained here.
The marked difference in estimates of transmission in rural versus urban areas in our study is also consistent with recent work on social contacts [17, 18]. A study in Japan found a significant relationship between the number of social contacts and urbanicity among the elderly. The same study found that those in more urban areas have a greater chance of having more supportive interactions. Influenza transmission requires proximity between individuals, and social contacts could be one surrogate measure of this proximity.
Additionally, climate and relative humidity have been observed to influence influenza transmission [19–22]. The climate across South Africa is variable, with some of the more rural provinces characterized by a drier climate; the country’s climate is mostly semi-arid, but subtropical along the east coast. South Africa is therefore not an ideal country in which to test this theory of transmission. Nevertheless, KwaZulu-Natal is the only relatively humid province, and the humidity hypothesis does not appear to be borne out by these data.
Travel patterns have been correlated with the movement of influenza on a large scale [6, 7]. Indeed, Viboud et al. show that an outbreak that starts in a rural area will spread slowly until it reaches an urban center, at which point it will spread much more quickly. We attempt to incorporate travel patterns in South Africa in our analysis. Travel between the more rural provinces and other areas is much more limited, and in general individuals tend to travel to the larger provinces rather than from the more urban provinces to the rural ones. Thus, the lack of movement between these provinces and areas where transmission is occurring could lead to a later onset of sustained local transmission, and to lower levels of transmission, because fewer individuals enter the province and interact with the local population.
Another explanation is that what we see here could be similar to what was observed in the Netherlands in the early phases of the pandemic, where the reproductive number was estimated to be below one, indicating that sustained transmission was not occurring and that cases were being generated through imported infections. In South Africa, this would imply that individuals traveled to larger urban centers and became infected there. Their cases would then have been reported upon their return home, so that, at least for the initial cases, the case was not attributed to the location where the transmission event actually took place. We did not account for this possibility.
There has been significant work pointing to the great spatial heterogeneity that exists in influenza; however, little work has been done to directly estimate the influence of local transmission in propagating this trend. Intensive network models are capable of investigating these dynamics, but they are challenging to implement without extensive resources. Our study introduces a novel and simple approach that makes use of the epidemic curve, information on the serial interval distribution, and some prior knowledge of transmission dynamics. We have shown results for estimation over the entire outbreak period, but the modification proposed by Cauchemez et al. allowing for real-time estimation of Rt could be implemented straightforwardly within our approach as well.
Our results suggest substantial spatial heterogeneity in transmission dynamics; however, further study is warranted because of the limitations of the data at hand and uncertainty about reporting dynamics. At a minimum, these results argue for modifying the data that are collected by surveillance and other data collection systems, in order to better understand reporting patterns and the dynamics of interaction between individuals that would lead to substantial heterogeneity in transmission. An improved understanding of heterogeneity will aid in targeting limited interventions in the most effective way possible.