Ecological analysis of social risk factors for Rotavirus infections in Berlin, Germany, 2007–2009

Background Socioeconomic factors are increasingly recognised as related to health inequalities in Germany and are also identified as important contributing factors for an increased risk of acquiring infections. The aim of the present study was to describe in an ecological analysis the impact of different social factors on the risk of acquiring infectious diseases in an urban setting. The specific outcome of interest was the distribution of Rotavirus infections, which are a leading cause of acute gastroenteritis among infants and also a burden in the elderly in Germany. The results may help to generate more specific hypothesis for infectious disease transmission. Methods We analysed the spatial distribution of hospitalized patients with Rotavirus infections in Berlin, Germany. The association between the small area incidence and different socio-demographic and economic variables was investigated in order to identify spatial relations and risk factors. Our spatial analysis included 447 neighbourhood areas of similar population size in the city of Berlin. We included all laboratory-confirmed cases of patients hospitalized due to Rotavirus infections and notified between 01/01/2007 and 31/12/2009. We excluded travel-associated and nosocomial infections. A spatial Bayesian Poisson regression model was used for the statistical analysis of incidences at neighbourhood level in relation to socio-demographic variables. Results Altogether, 2,370 patients fulfilled the case definition. The disease mapping indicates a number of urban quarters to be highly affected by the disease. In the multivariable spatial regression model, two risk factors were identified for infants (<4 year olds): Rotavirus incidence increased by 4.95% for each additional percent of unemployed inhabitants in the neighbourhood (95% credibility interval (CI): 3.10%-6.74%) and by 0.53% for each additional percent of children attending day care in the neighbourhood (95% CI: 0.00%-1.06%). We found no evidence for an association with the proportion of foreign residents, population density, the residential quality of accommodations and resident changes in the neighbourhood. Conclusions Neighbourhoods with a high unemployment rate and high day care attendance rate appear to be particularly affected by Rotavirus in the population of Berlin. Public health promotion programs should be developed for the affected areas. Due to the ecological study-design, risk pathways on an individual patient level remain to be elucidated.


Rotavirus epidemiology
Rotavirus infections can be repeatedly acquired in persons from birth to old age, although infants in their first years are mostly affected [1]. The virus is highly infectious, and it can be assumed that after one year about 2/ 3 of the children have experienced at least one Rotavirus infection and 1/3 a second Rotavirus infection. After two years almost all children have undergone a Rotavirus infection, 2/3 a second and 1/3 a third Rotavirus infection [2]. The appearance of most of these infections is either asymptomatic or an episode of mild enteric symptoms, which an infected person may not notice. Despite this, a large number of infections lead to severe disease courses with diarrhoea, need for hospitalization and, in rare cases death [3]. There are considerable differences, however, in disease severity between primary and subsequent infections [2].
On a population-level Rotavirus infections are the leading cause of acute gastroenteritis among infants and young children in Germany [4], in Europe [5] and worldwide [6]. The global disease burden of Rotavirus infections for the health care systems can be regarded as high in developing countries, as well as in countries with more advanced economies. The overall worldwide annual mortality of Rotavirus infection is estimated approximately 440,000 deaths, mostly in infants. Rotavirus associated deaths occur almost exclusively in developing countries [7].
Family structures and social facilities such as day care centres for infants as well as nursing homes for the elderly can facilitate the transmission of the virus, and repeatedly lead to Rotavirus outbreaks. The community spread also seems to be influenced by the climate, leading to a characteristic seasonality with peaks in springtime [8,9]. In addition, nosocomial Rotavirus infections are also frequent [10].
In 2006, two Rotavirus vaccines (Rotarix and RotaTeq) were licensed and introduced in Germany [11,12]. Although currently under consideration by the German Standing Committee on Vaccination, the vaccines are not yet integrated in the routine childhood immunization schedule in Germany. Given the potential for a change in community spread due to expected differences in vaccine coverage in the population, there may thus be a prevalence of groups at special risk that are not currently in focus, but should be. Further understanding of risk factors for Rotavirus distribution on a population level is therefore needed in order to guide future decisions.

Social environments and health outcomes
The importance of links between the social environment and infectious diseases has long been of focus in the developing world, since these regions are burdened with issues that increase the burden of infectious diseases, such as poverty and suboptimal living conditions. Although less pronounced, issues like income inequality are important and relevant in Europe as well as in other advanced economic regions [13]. A part of these groups contain individuals from a marginalized and segregated subpopulation that may not be fully integrated into the general population. In Germany, where poverty and poor housing conditions are also important issues, health inequalities are on the rise due to changes in, e.g., economy and migration [14,15]. Recent position papers have urged the importance of identifying vulnerable groups on an ecological level [16] complementing strictly individual-based epidemiology on risk factors with population-based approaches, where socioeconomic factors are linked to health outcomes on a spatial aggregate level becomes important [17,18].
Due to its eventful history Berlin is a city with highly diverse communities in distinctive neighbourhoods, which makes it an ideal prototype to study these influences on a spatial level. In the future these neighbourhoods could form a new level of surveillance, mitigation and containment of infectious diseases in Berlin and elsewhere.
While people often seek out their place of residence based on their social and financial (socioeconomic) status, the social characteristics in those places of residence might reflexively influence people's social status [19]. These two forcesseeking out and being influenced by socioeconomic status might contribute to the formation of distinct living environments, which reflect the social diversity of societies especially in metropolitan areas. This can have an impact on the transmission of infectious diseases, which was shown for North American cities and which might also be true for European cities [20].

Spatial regression analysis of surveillance data
There has been some recent work in the literature that uses mathematical modeling to describe the dynamics of rotavirus at the population level [21]. Models are calibrated using time series data at the national or state level. Since the effect of socio-demographic variables is expected to happen at a much more detailed spatial scale, a more spatial oriented analysis approach is needed for such investigation. However, a good spatial resolution of health data is often in conflict with data privacy issues and small scale spatial analysis of infectious disease data is accordingly underdeveloped in practice. Another hindrance is that advanced statistical methodology is needed in order to properly take the spatial nature of such data into account: if one falsely relies on the common statistical assumption of observations being independent and identically distributed, one is overrating the value of information in the data and might obtain wrong conclusions in statistical significance tests. Thus, statistical methodology and corresponding software implementations should go beyond the independence assumption for such data. Spatial Bayesian regression models are the most common analysis tool in such situations [22]. These models have been successfully applied, e.g. for cancer incidence data [23], mortality data [24], but also for worldwide malaria incidence [25], nationwide surveillance data on Shiga toxinproducing Escherichia coli in Germany [26] and tuberculosis in urban setting in Brazil [27]. Recently, the integrated nested Laplace approximation (INLA) approach together with its implementation has provided a new and efficient method to perform statistical inference for spatial regression models including Gaussian random effects [28]. For example, the method has already been successfully applied for infectious disease surveillance data in veterinary medicine in Switzerland [29,30]. We employ this technique for the analysis of Rotavirus incidencesee methods section for details. Thus, our analysis covers all aspects of small area infectious disease epidemiology while simultaneously employing advanced statistical methods for the analysis.

Objective
The objective of this study was to describe the smallarea spatial distribution of the incidence of hospitalized Rotavirus cases in Berlin. A further aim was to explain parts of the spatial distribution pattern using socioeconomic and socio-demographic variables.

Descriptive analysis
More than one third (36.9%) of the analysed hospitalized Rotavirus cases were in the age group <1 year with an average yearly incidence of 969 in 100,000 population. There was a decreasing trend of incidence with increased age until adulthood with 2 cases in 100,000 (Table 1). This trend was reversed in the elderly (≥60 years) with an incidence of 14 cases in 100,000. The age group of ≥60 years shows the longest median duration of hospitalization. However, the size of the Interquartile range (IQR) as well as application of a Kruskal-Wallis rank test (p = 0.07) indicates that there is only weak evidence for differences between the lengths of hospitalization between the age groups. Overall, the sex distribution of cases was almost even. Furthermore, 15.3% of the hospitalized cases were reported as part of a larger outbreak, although there were large differences between age groups. Altogether, one fatal outcome in an infant less than one year old was reported.

Disease mapping
Visually, the age standardized absolute excess of hospitalized Rotavirus cases in Berlin were spatially unevenly distributed and show some degree of spatial association ( Figure 1). There was a concentration of neighbourhoods with high values in the southeast and most parts of the central east and northeast. Here, most spatial units were in the two upper quintiles. Some neighbourhoods in the northwest and southwest also show high disease burden as well as a much more heterogeneous distribution. Most neighbourhoods in the central parts of Berlin were in the lowest quintile.

Regression analyses
Regression analysis were performed separately for infant (<4 years) and elderly (≥60 years) as these are the two age groups particularly at risk. The results of the univariable spatial regression analyses of the infant model (<4 years) showed that, as expected, the age categoryvariable exhibits the by far lowest excess risk ratios ( Table 2). Ecological variables where the credibility intervals for the relative risk did not contain the one were: unemployment and day care attendance rate. Low influence was detected for the variables migration volume, population density, residential quality and foreign residents. This multivariable model identified two risks for the occurrence of hospitalized Rotavirus cases: incidence increased by 4.95% for each percent increase in unemployed inhabitants in the neighbourhood (95% credibility interval (CI): 3.10%-6.74%). Furthermore, incidence increased by 0.53% for each percent increase of children attending day care (95% CI: 0.00%-1.06%). The other variables in the multivariable model had effects where the credibility intervals for the relative risk contained the one, i.e. given the available data the influence of these variables cannot be distinguished from chance effects when adjusting for all other variables.
The results for the univariable and multivariable regression in elderly (≥60 years) are shown in Table 3. The only confirmed risk factor was population density of the neighbourhoods. In contrast the proportion of infants (<4 years) in the population of the neighbourhoods was not a risk factor for hospitalized Rotavirus infections in the elderly. However, it has to be noted that for the elderly, 64% of the neighbourhoods have zero cases, 21% have one case and the remaining have up to a maximum of 19 cases. As a consequence, results are more variable than in the infant model.

Analysis of spatial effects
The plotting of the spatial effects of the neighbourhoods reveals residual variation in the relative risk after accounting for measured covariates. Figure 2 shows the combination of structured and unstructured spatial effects on a relative risk scale, i.e. exp(ψ i+ υ i ). Neighbourhoods with negative residual values are in central and western parts of Berlin. The areas with negative residual values cover a crescent-shaped area around the city centre to the north, east and south. These could be caused by missing explanatory variables, which exhibit their own spatial structure, i.e. mean education level in the neighbourhood or can be caused by endogenous effects on the incidence of hospitalized Rotavirus cases within the city of Berlin, e.g. due to different diagnostic procedures in hospitals. Alternatively this reflects regional differences in disease prevalence (i.e.: small pockets of exceptional higher and lower vaccination coverage, respectively). A plot of the unstructured spatial heterogeneity due to the 12 health districts, i.e. exp(α j(i) ), in the infant model is shown in Figure 3. The results can be explained by health-district specific reporting artefacts or different regional awareness.

Key findings
We identified associations between socioeconomic factors as risk factors and the incidence of severe Rotavirus infections leading to hospitalization in an urban setting  in Germany. In addition, it is the first time that sociodemographic and socio-economic factors are analysed to explain variations of infectious diseases in Berlin at such a detailed spatial scale. This study suggests that socioeconomic risk factors may affect the distribution of infectious diseases in Berlin, and that spatially detailed information about infectious disease cases can provide more specific information about these effects. Both the occurrence and perhaps the severity of disease could also be explained by spatial variations in vaccination coverage. WHO recommends that a surveillance system for severe Rotavirus infections should be in place to monitor the effect of universal childhood rotavirus vaccination once implemented [31]. The German surveillance system is recognized as being able to monitor the impact of the vaccines [4]. The expected changes in burden of disease could be monitored more specifically if information on socioeconomic determinants for specific areas and subgroups would be assignable and accessible for analysis. Unfortunately, data on vaccination coverage with the required spatial precision as in our study is not available. In our study, unemployment is identified as a strong risk factor for disease. The investigation of the underlying causality of this socioeconomic factor requires the setup of appropriate analytical strategies involving multiple interrelated variables [17,18]. These include proximal biologic precursors (e.g. nutrition) and more distal and contextual risk factors like behaviour or unemployment as psychological burden. Unemployment in families is related to child poverty and maybe connected to poor access to medical care (i.e. Rotavirus vaccination is not generally covered by health insurance programs). Alternatively, unemployment could act as a cofactor for educational background or different risk behaviour regarding hygiene and oral rehydration.

Limitations and strength
The interpretation of the results is limited by the ecological design of the study. The statistical association of unemployment can be explained by the influence of the social situation of the neighbourhoods on the frequency and severity of the disease in all children in the neighbourhoods with high levels of unemployment. This would be a  true ecological effect. Alternatively, the effect on the frequency and severity of the disease was observed in the underprivileged sub-group only. Furthermore, it is conceivable that the likelihood of hospitalization is increased in children of families burdened by unemployment, due to differential access to hospital services or, in cases of gastroenteritis, to different routines for referral by general practitioners. Although an analysis with non-hospitalized Rotavirus cases included shows similar associations (data not shown). Similarly, the association of day care attendance rate can be explained as an increased risk in all children attending day care of the respective neighbourhood or only fraction attending day care in the respective neighbourhood.
Altogether, the issue of bias remains a major limitation of any ecological analysis such as our [32]. As a  Figure 2 Relative risk due to combined structured spatial effects ψ i and unstructured spatial heterogeneity υ i from the final multivariable regression model in 447 neighbourhoods in Berlin, Germany. In quintiles.  Figure 3 Relative risk due to unstructured spatial heterogeneity α j(i) in the final multivariable regression model in 12 health districts in Berlin, Germany.
consequence, the results of our analysis should be interpreted with carenot only because of possible ecological bias, but also due to possible confounding due to location, i.e. changes in covariate effects due to the addition of a location based structured random variable in the model [33].
However, despite its limitations ecological analyses remain an important hypothesis generating tool where already readily available registry data can be used to provide first answers to questions of public health importance. A confirmation of the results by an individual-level based analyses, e.g. through a case-control study, would be the natural next step.
Reporting of laboratory-notified Rotavirus infections in Berlin is mandatory with continuous case definition; however, eligibility for testing is not defined and therefore case ascertainment could vary inside the study area. Under the German reimbursement system for hospitalization charges, hospitals receive higher payments for cases of acute gastrointestinal enteritis with a confirmed infectious agent and therefore have a high incentive for the commission of Rotavirus diagnostic. Further steps in quantifying possible consequences of reporting artefacts could be to perform a joint modelling together with another gastrointestinal disease such as, e.g. Norovirus, as done in Held et al. [34]. However, we believe that our analysis restricted to hospitalized cases only provides results that reduce ascertainment bias.
The results of our study emphasize the relevance of spatial regression models including spatially structured effects and unstructured spatial heterogeneity for the detection of risk factors.

Conclusions
Appropriate targeting of public health promotion programs could be achieved by combining the criteria of neighbourhood with the risk groups identified in this study. The influence of the day care attendance rate indicates that high requirements for hygiene in child day-care facilities should be regularly checked and routinely maintained. Due to different policies in Germany, the attendance rates for day care centres of infants are unequal. Since 2008, a federal law supports the expansion of day care centres, and realizes the legal claim of all parents throughout Germany to a place their child in a day care centre in 2013 [35]. Thus, this difference is of future relevance for public health services in regions in West-Berlin and Western Germany with expected increase of children attending day care in the next years.
Subsequent small-area studies of Rotavirus or other infectious diseases could provide further insight into spatial effects on disease risk. Individual based risk factors have to be refined and integrated in the future.

Case data
Rotavirus cases were notified by laboratories to the 12 local public health offices in Berlin, anonymised and transmitted via the State Office of Health and Social Affairs, Berlin, to the Robert Koch Institute [36]. We restricted our analysis to patients who were hospitalized in order to avoid case ascertainment bias, since eligibility for a laboratory diagnostic test is unclear in outpatients and may have led to differences in notification rates in Berlin neighbourhoods (Figure 4). The case definition included all clinical and laboratory-confirmed patients with Rotavirus infection, which were hospitalized due to Rotavirus between 01 January 2007 and 31 December 2009 with residence in Berlin (n = 3270). Patients having no date or a date of hospitalization prior to the date of onset of illness were excluded as definite or probable nosocomial infections (n = 846). Furthermore, we excluded 14 travel-related patients as well as 40 with unknown residence. All in all, 2370 hospitalized Rotavirus cases were analysed in this study. Transmitted data included information on age, sex, laboratory results, onset of illness, start and end date of hospitalization.

Geo-referencing and disease mapping
Case location is given in a geographical reference system used for the urban infrastructure in Berlin [37,38]. This consists of 447 spatial units with a median population of 6631 inhabitants per unit (Interquartile range (IQR): 3960 -9974). We refer to these units as neighbourhoods.
Descriptions of the polygons for mapping were obtained from the Statistical Bureau Berlin-Brandenburg. The Rotavirus cases were geo-referenced to their neighbourhoods at the local health departments according to their home addresses. We estimated the expected number of cases e i in each neighbourhood i that would have been observed if the incidence of disease had had the same age structure as the incidence in the whole study area: where y i,a is the observed number of cases of the disease in the age strata a of the i'th neighbourhood and n i,a is the number of residents in the strata a of the i'th neighbourhood. In our case, a denotes the 13 age groups (<1 year, 1-< 2 years, 2-< 3 years, 3-< 4 years, 4-< 6 years, 6-< 10 years, 10-< 20 years, 20-< 30 years, 30-< 40 years, 40-< 50 years, 50-< 60 years, 60-< 70 years, ≥70 years).
For the comparison of expected and observed counts we produced values for the absolute disease excess (z i ) in a neighbourhood i as z i = y i -e i , see Figure 1. We choose to illustrate absolute excess in this descriptive figure, since absolute excess immediately contains information about the relevance of the differences (as opposite to relative risk), which is favourable when communicating results on a purely descriptive basis. Note that the subsequent modelling happens on a relative risk scale.

Spatial Bayesian Poisson regression model
When mapping the incidence of a rare and noninfectious disease, a common model is the spatial Poisson model, where the number of disease cases per spatial unit is assumed to follow a Poisson distribution with expectation λ i . The unadjusted number of hospitalized Rotavirus cases in our study, however, exhibited extra-Poisson distribution and had an asymmetrical statistical distribution ( Figure 5). One of the explanations for this effect is the infectious character of rotavirus which can cause clustering. To address this infectious disease aspect we include a spatial unstructured random effect ψ ie N 0; σ 2 ð Þ to allow for extra-Poisson variation while staying within a Poisson likelihood framework. Furthermore, a spatial structured effect υ i was incorporated for each of the 447 neighbourhoods using a Gaussian Markov random field in order to address spatial dependence between the neighbourhoods [39]. Finally, a hierarchical unstructured random effect was included for each of the 12 health districts that each neighbourhood i was located in, i.e.α j i ð Þ e N 0; τ 2 ð Þ, which accounts for reporting artefacts at local health district level and thus also allows for a hypothesized east-west difference originating from pre-reunification times.
Independent variables include factors, which were presumably related to spatial variation of disease risk in Berlin and are incorporated as fixed effects inside the model. A full description of the explanatory variables is given in table 4 and maps illustrating their spatial distribution in the study area are shown in figure 6.
Altogether, the total number of cases for age group a in neighbourhood i is in our spatial regression model given by: Here, n i,a denotes the population in neighbourhood i in age group a and z i denotes the vector of explanatory variables which contains the age groups as factor variable. The intercept μ was the estimated value when all predictors were zero (continuous covariates) or at their baseline values (discrete covariates), respectively. For the Bayesian inference, we took the integrated nested Laplace approximation (INLA) approach as introduced by [28] and implemented in the R package R-INLA [40,41]. Altogether, our statistical approach resembles the one used before by Schrödle et al. [29,30]. Deviance information criterion (DIC) [42] was used as a measure for comparing Bayesian models, since it provides a trade-off between model fit and model complexity. Thus, we searched for the model with the lowest DIC.
Because we assumed different risk profiles for infants and elderly, the regression analysis was performed in two separate models: One for infants, i.e. the <4 year old containing four age groups (<1, 1-< 2, 2-< 3 and 3-< 4 years) and one for elderly, i.e. the ≥60-years-old containing a single age group. In the infant model the age group specific population was included as an offset variable to control for the strong effect of age in the infant model. Following the suggestions in Rothman et al. [43] Figure 6 (See legend on next page.) we decided to start the investigation by performing a univariable spatial modelling of all potential risk factors adjusting only for the strong effect of age using the above model. The actual underlying multiple-inference question of identifying relevant risk factors while adjusting for the joint conglomerate of risk factors was addressed by fitting a single multivariable model containing all potential risk factors. Thus, we abandon any type of step-wise model selection, which is known to have low efficiency and poor power. Based on DIC, a health-district effect was only included in the infant model. We used Bayesian inference and report the resulting excess risk ratios as point estimate (posterior mean) and 95% credibility intervals as a quantification of parameter uncertainty. Residual relative risk of the combined structured and unstructured spatial effects, i.e. the posterior median of exp(ψ i + υ i ), and of the 12 health districts, i.e. exp(α j(i) ), for the final multivariable regression model of the infants <4 years were mapped. In a sensitivity analysis (not-shown), possible non-linearity of the covariates was investigated; see Natário and Knorr-Held [44] for a discussion of the method and the consequences of ignoring non-linearity. However, our analyses showed that linear effects of the covariates were adequate for our purposes.
All computations and map visualizations were done in R v. 2.15.0 [41].
(See figure on previous page.) Figure 6 Spatial distribution of the six covariates in the 447 neighbourhoods of Berlin. For daycare attendance the proportion of <4 year olds which attend daycare is mapped. Note that in the regression modeling daycare attendance enters as the proportion of kids in the corresponding age group (<1, 1-< 2, 2-< 3, 3-< 4).