Global spatiotemporal and genetic footprint of the H5N1 avian influenza virus
© Li et al.; licensee BioMed Central Ltd. 2014
Received: 23 December 2013
Accepted: 6 March 2014
Published: 21 May 2014
Skip to main content
© Li et al.; licensee BioMed Central Ltd. 2014
Received: 23 December 2013
Accepted: 6 March 2014
Published: 21 May 2014
Since 2005, the Qinghai-like lineage of the highly pathogenic avian influenza A virus H5N1 has rapidly spread westward to Europe, the Middle East and Africa, reaching a dominant level at a global scale in 2006.
Based on a combination of genetic sequence data and H5N1 outbreak information from 2005 to 2011, we use an interdisciplinary approach to improve our understanding of the transmission pattern of this particular clade 2.2, and present cartography of global spatiotemporal transmission footprints with genetic characteristics.
Four major viral transmission routes were derived with three sources— Russia, Mongolia, and the Middle East (Kuwait and Saudi Arabia)—in the three consecutive years 2005, 2006 and 2007. With spatiotemporal transmission along each route, genetic distances to isolate A/goose/Guangdong/1996 are becoming significantly larger, leading to a more challenging situation in certain regions like Korea, India, France, Germany, Nigeria and Sudan. Europe and India have had at least two incursions along multiple routes, causing a mixed virus situation. In addition, spatiotemporal distribution along the routes showed that 2007/2008 was a temporal separation point for the infection of different host species; specifically, wild birds were the main host in 2005–2007/2008 and poultry was responsible for the genetic mutation in 2009–2011. “Global-to-local” and “high-to-low latitude” transmission footprints have been observed.
Our results suggest that both wild birds and poultry play important roles in the transmission of the H5N1 virus clade, but with different spatial, temporal, and genetic dominance. These characteristics necessitate that special attention be paid to countries along the transmission routes.
The hemagglutinin (HA) segment of H5N1 highly pathogenic avian influenza (HPAI) virus has been reported to have 10 distinguished clades ranging from 0 to 9 . Among these, the Qinghai-like lineage (clade 2.2) was the quickest to evolve, with a fourth-order sub-clade up to the present. Moreover, of the many genetic clades circulating in Asia, clade 2.2 is considered the only one that spread westward to the Middle East, Europe and Africa , reaching a dominant role with global range.
Much previous research has been completed and is of considerable significance to our study. For instance, some research works studying the virus transmission at nationwide and continent-wide spatial scope have been conducted based upon phylogenetic evolution theory, especially during 2005–2006 when the situation was severe [3, 4]. Moreover, new interdisciplinary efforts using a combination of geospatial informatics and bioinformatics approaches have been made to improve understanding of global H5N1 transmission [5, 6]. However, some limitations of these works remain to be solved. First, most research has focused on certain regions or small spatial scales. For example, some works explore the transmission pattern only within a single country, such as India [7, 8], Egypt , Mongolia , the Russian Federation , France , Germany , Switzerland , Burkina Faso , Hungary , Bangladesh , and Kuwait . Other studies have remained focused within a specific continent, such as Africa [18, 19]. The study of cross-continent spread  is even less common, not to mention examination of disease diffusion at a global scale, which probes the transmission footprints among H5N1-infected countries globally. In addition, other works concerning clade 2.2 routes are relatively approximate, and appear to have simplified the transmission pattern and underestimated the complexity of this clade.
Given the significance of the Qinghai-like lineage and lack of knowledge about its probable means of transmission, there is a need to investigate its geographic spread mechanisms through investigating its underlying genetic forcing. In this research, we examine potential global transmission routes during 2005–2011, and the spatiotemporal distribution features of outbreaks along these routes. We also explore genetic mutation features relative to the isolate A/goose/Guangdong/1996 (Gs/Gd/96), to enhance our understanding of clade 2.2 from a novel, macroscopic view.
Global time-location series of H5N1 outbreaks, containing 11750 records from January 2005 through September 2011, were extracted from official reports of the Food and Agriculture Organization of the United Nations (http://www.fao.org). The associated variables and parameters of H5N1 outbreaks are outbreak location, observation and reporting date, and some descriptive information about the corresponding host. Incomplete records, especially those without location information, were labeled with Google Earth, and the center of the available administrative region where the outbreak occurred was used instead.
Outbreak data were used mainly to identify transmission direction, especially when temporal information was unavailable for genetic sequences. These sequence records were mainly from Egypt, Nigeria and Germany. The data were also used to analyze the spatial and temporal host distribution pattern along transmission routes.
GenBank of the National Center for Biotechnology Information (NCBI) (http://www.ncbi.nlm.nih.gov) has collected and published HA and neuraminidase (NA) sequence data of HPAI clades. From the database website, we obtained global HPAI H5N1 full-length sequence data from 2005 to 2011. This selected time period for the sequence data was in accordance with that of the H5N1 outbreak data. In addition to the basic sequence information, year and location of the isolates and host data were also collected. In practice, many genetic fragments were incomplete. Therefore, only those segments with lengths greater than 1600 base pairs were selected, for a total of 873 sequence data records of HA clade 2.2. Accession numbers of the nucleotide sequences were provided (Additional file 1). Among the 873 sequences of the HA genes, 83 are from humans, 580 are from poultry, and 210 are from wild birds. The number of sequences obtained and used for each country is summarized in the Additional file 2.
Using Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0 , the collected sequence data were initially aligned and compared. They were then combined with the sequences in a small tree from the World Health Organization (WHO) (http://www.who.int/en). The phylogenetic tree was constructed by both the maximum-likelihood and neighbor-joining distance-based matrix algorithms to ensure the identity of tree topology. For most sequences, spatial and temporal information was available on a country and year basis, respectively. Exceptions were some isolates in geographically large countries such as China and Russia, which had province-level information. In this way, source articles of all sequences were referred to and checked, to implement their detailed descriptions in the sequence database. With this dataset of improved accuracy, it was easier to match sequences with geospatial data based on their spatial and temporal attributes; nevertheless, some temporal information was unavailable and required supplementation by outbreak data in the following analysis. In addition, sequence data were used to calculate genetic distances to the Gs/Gd/96 isolate.
The terms route and path are used to illustrate the overall and detailed transmission pattern of virus, respectively. The former depicts the broad transmission mode and the latter gives further regional spread information.
According to the geographic locations of the studied countries in each group, 10 paths were integrated into four possible routes, based on the presumption that there may be certain similarities among genetic sequences in paths within the same countries.
Genetic distances to the Gs/Gd/96 isolate were calculated with the Kimura two-parameter model  in MEGA version 4.0, using a gamma distribution (shape parameter = 1) to model rate variation among sites. Results were stored in matrix form.
The average genetic distance to Gs/Gd/96 along the i th route, D i , represented the mean genetic mutation among all regions on the i th route. The average genetic distance to the Gs/Gd/96 isolate in the j th region on the i th route, D ij , indicated mean genetic mutation within the j th region on the i th route. If D ij > D i , the j th region was considered to have a positive role in genetic mutation on the i th route, or be a “key” region, and its geographic location was marked for further analysis of the spatiotemporal distribution of genetic mutation along routes.
Detailed transmission paths
05.2 Russia (Asian) → 05.11-06 Russia (European) → 05 Romania → 06.2 Hungary & 05 Croatia
05 Russia (European) → 06 Nigeria → 06 Sudan & 06–07 West Africa
05 Romania & 05 Turkey → 06.2 Austria & 06.2 Slovenia → 06 France
05 Romania & 05 Turkey → 06 Nigeria
06.5 Mongolia → 06 Korea → 07 Japan
06.5 Mongolia → 06.6 Russia → 07.4 Qinghai → 08 India
India -- Iran -- Russia -- Italy
06.2 Italy → 06.2-3 Slovenia & Austria & 06.2 Hungary & 06.2 Slovakia → 06.4 Czech → 06 Germany
07 Kuwait & 07 Saudi Arabia → 07-11Bangladesh → 08–10 India
07 Kuwait & 07 Saudi Arabia → 07 Germany → 07–08 other European countries
The first route is probably the most complex, and consists of four paths. This route began from the Asian part of Russia in 2005 and initially spread westward within that country, reaching its European part in late 2005. After this within-country spread, transmission split in two directions, westward and southwestward. The former trajectory reached France through the southern part of Austria, Slovenia, Croatia and Hungary in 2006. In the latter course, viruses from Romania and Turkey combined in 2005 and spread to Africa in 2006 or, more precisely, to Nigeria in February 2006. Transmission continued west and east, affecting western African nations and Sudan in the east. Thus, western Asia and eastern Europe may be sources for Africa.
The source of the second route is viruses isolated in Mongolia during May 2005, which then followed two distinct paths. One of these impacted Korea and Japan, and the other path spread westward to the Asian part of Russia, and then southward to Qinghai, China, and India over the two years, constructing links among Mongolian, Russian, Qinghai, and Indian viruses during May 2005 through 2008.
In the third course, the direction of transmission between countries in the first half is unclear, but shows potential relationships between viruses in southwest Asian and European countries. However, in the second half of this route, viruses originated in Italy in February 2006, affecting central and southern European countries such as Hungary, Slovakia, Slovenia and Austria almost simultaneously. Several months later, the Czech Republic and Germany were affected. Based on this transmission route, Italy may be an origin for those central and southern European countries.
The last footprint shows a distinct time and source feature relative to the aforementioned ones, i.e., many viruses in this pathway were isolated in 2007, with a source in southwest Asia (Kuwait and Saudi Arabia). This route has two sections; specifically, one spread east to Bangladesh and India and another transect crosses western Europe, affecting France, Austria, Germany, Poland, the Czech Republic, and the UK. Thus, southwest Asia may be a source for the viruses in Europe.
Based upon the description above, we found three main sources; namely, the Asian part of Russia, Mongolia, and southwest Asia (Kuwait and Saudi Arabia), which have clear spatial and temporal features. In particular, the first two were responsible for the 2005–2006 transmission and are in the northern part of Asia, whereas the third and fourth in southwest Asia were related to spread in 2007 and 2008. European countries and India had at least two virus incursions along multiple routes, producing a mixed situation.
The key regions of genetic mutation are illustrated in Figure 2. It is clear that nearly all are located at the end of each path, which means that with spatiotemporal transmission along the route, genetic distances relative to Gs/Gd/96 are becoming much greater. This indicates an ongoing evolution of viruses, causing a precarious situation in certain regions such as Korea, India, France, Germany, Nigeria, and Sudan.
We summarized four major transmission routes of the clade 2.2 virus with three sources; namely, Russia, Mongolia and the Middle East (Kuwait and Saudi Arabia). Based on the proposed criteria, we were able to establish clear transmission directions between countries on three routes, not just the similarities indicated by previous works such as Ducatez et al. .
The east-spreading section to Bangladesh and India in the last route may indicate that viruses in India during 2008 may be more precisely from Kuwait rather than Bangladesh, as previously assumed . In addition, European countries had at least two virus incursions with distinct spatiotemporal characteristics. Previous findings that suggest differences between viruses isolates before and after 2007 [13, 22–24], even in the same country, may confirm the present result that European viruses may have had two separate sources and routes in different years.
In the foregoing analysis, India has two different sources and Italy is recognized as one source for Europe. Nevertheless, given the uncertainty of transmission direction in the first half of the third route, there may be underlying relationships between southwest Asian and European countries. In this case, India may have been the source or pool of viruses in 2006, but strong evidence of this is unavailable. It is likely that one or more countries on the first half of this route may have acted as the source in a more exact sense. Thus, whether Italy should be treated as a source requires more consideration.
It was found that Egypt had only a single virus introduction and was declared endemic in 2008. The origin or source of these Egyptian isolates has not been identified, although some research has indicated that they are of Eurasian origin . The present focus is on transmission routes, therefore the sources of these isolates are important. Thus, the Egyptian isolates are not included in any path, but the lack of these isolates in our four routes does not mean that they are not crucial to maintenance and mutation of the virus.
As seen in Figure 3, isolates from along the four transmission routes are not completely mutually independent. There is some intersection, especially among viruses isolated in Nigeria, Turkey, Austria, Slovenia, and France in the first path, and those in European countries in the second half of the third route. Further, both routes eventually entered the European countries at nearly the same time. This spatial and temporal similarity between the two may be responsible for the intersection between isolates on the two paths.
There are some similarities between the transmission mode of the first clade 2.2 route and the global spread process of H5N1 outbreaks . That is, routes of clade 2.2 that spread westward to Europe and southward to Africa may be identical to outbreak trajectories in these two directions. This may be because that the west- and south-spreading viruses from eastern Europe during 2005–2008 all belong to the 2.2 clade, or that this clade is the only one that appears during this period and along those transmission directions.
As mentioned above, the first route covers the largest geographic area relative to the other three. This large spatial extent was generally facilitated by high infectivity and a long infectious period . In our study, chickens, ducks and geese were the primary victims and they had a wider spatial distribution than other host species, which indicates greater infectivity. Moreover, migration connects many bird populations in time and space along major flyways, either at common breeding areas during migration, or at stopover and wintering sites. This provides ideal places for many species to aggregate at high local density, causing cross-species infection. Also, virus-infected birds may transmit pathogens to other populations that may subsequently move to new areas . This forms a “transmission relay” with a long and continuous infectious period, which may reflect the importance of migratory birds in the perpetuation and spread of H5N1 viruses. The combination of experimental exposure data and telemetry information has been used in previous works to describe the intercontinental virus dispersion by a series of successively infected migratory birds . In addition, experimental studies have shown that several bird species survive infection and shed the H5N1 virus without apparent disease signs. Even worse, HPAI may retain high pathogenicity in chickens, in contrast to the lower pathogenicity found in infected ducks . This asymptomatic spread in birds and pathogenicity differences in hosts may partly explain the circulation of viruses and high death rate among chickens in Nigeria.The emergence timeline of clade 2.2 (Figure 5) shows only the emergence time of the clade in each country during 2005–2011, and is distinguished by poultry and wild birds. However, the number of outbreaks in each month is not shown. Therefore, the difference in seasonality of outbreak intensity between countries or hosts (poultry and wild birds) is not reflected.
Analysis of spatiotemporal distribution of various hosts along routes is done at the wild bird and poultry level. Moreover, roles are addressed in light of amounts and relative ratios of the two hosts in various spaces. These analyses have limitations. First, humans are not taken into consideration. H5N1 clade 2.2 human cases are common, especially in Egypt, and sporadic viruses have been isolated in Turkey, Azerbaijan, Iraq and other places. Despite barriers to transmission from donors to recipients, factors like human population growth, travel, and the legal and illegal poultry trade gradually weaken these geographic and behavioral barriers . This increases direct avian-to-human transmission with no recombination or apparent mutation , generating many human H5N1 cases such as in the severe situation of Hong Kong in 1997. However, this transmission does not mean that human-to-human transmission is also common. This transmission is likely limited to relatives, with no evident geographic clusters . The characteristics of not only donors and recipients, but also emerging viruses, should be considered regarding human transmission patterns . There has been no more detailed classification of host species, which may neglect the expanding host range of H5N1 viruses and newly infected host species. Pigs, for instance, are considered candidates for generating reassortant strains and a subsequent pandemic cause, because they have both avian and human influenza sialic acid residue receptors in their respiratory tracts . Given data imperfections at the host species level, we cannot address the issue of whether the host range has expanded and pig-infecting viruses of clade 2.2 have occurred. Finally, as mentioned by Andrew and Glass , new host populations in which the virus is endemic represent new risks of pandemic influenza in human beings, since the avian strain is present year-round and may coincide with any human influenza season. Based upon the above analysis, Egypt declared the virus endemic in 2008, after having had a large number of H5N1 clade 2.2 human cases during 2005–2011 (a new host population was isolated in 2006 for the first time). Therefore, we infer that there may be a new risk of pandemic influenza in humans in Egypt.
Owning to the under-reporting and general bias in the database considered, the four major global transmission routes presented here are possible scenarios based on evidence provided by our analysis. Thus, a well-established and shared database of H5N1 outbreaks and sequences is vital for further study and surveillance.
We located four major transmission routes of the clade 2.2 H5N1 virus with three sources; namely, Russia, Mongolia, and the Middle East (Kuwait and Saudi Arabia) in three consecutive years of 2005, 2006, and 2007. With spatiotemporal transmission footprints along each route, genetic distances to A/goose/Guangdong/1996 virus are becoming significantly larger, leading to a more challenging situation in certain regions such as Korea, India, France, Germany, Nigeria, and Sudan. Europe and India have had at least two incursions along multiple routes, causing a mixed virus situation. In addition, spatiotemporal distribution along the routes showed that 2007/2008 was a temporal separation point for the infection of different host species; specifically, wild birds were the main host from 2005 to 2007/2008 and poultry was responsible for the genetic mutation in 2009–2011. “Global-to-local” and “high-to-low latitude” transmission footprints have been observed. Thus, both wild birds and poultry play important roles in transmission of the virus, but with different spatial, temporal, and genetic dominance. Special attention should be paid to countries along these transmission routes.
This research was supported by the Ministry of Science and Technology, China, National Research Program (2010CB530300, 2012CB955501, 2012AA12A407), and the National Natural Science Foundation of China (41271099).
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.