A multilevel non-hierarchical study of birth weight and socioeconomic status

Background: It is unclear whether the socioeconomic status (SES) of the community of residence has a substantial association with infant birth weight. We used multilevel models to examine associations of birth weight with family- and community-level SES in the Cape Cod Family Health Study. Data were collected retrospectively on births to women between 1969 and 1983 living on Cape Cod, Massachusetts. The sample included siblings born in different residences with differing community-level SES.
Methods: We used cross-classified models to account for multiple levels of correlation in a non-hierarchical data structure. We accounted for clustering at family- and community-levels. Models included extensive individual- and family-level covariates. SES variables of interest were maternal education; paternal occupation; percent adults living in poverty; percent adults with a four year college degree; community mean family income; and percent adult unemployment.
Results: Residual correlation was detected at the family- but not the community-level. Substantial effect sizes were observed for family-level SES while smaller magnitudes were observed for community-level SES. Overall, higher SES corresponded to increased birth weight, though neither family- nor community-level variables had significant associations with the outcome. In a model applied to a reduced sample that included a single child per family, enforcing a hierarchical data structure, paternal occupation was found to have a significant association with birth weight (p = 0.033). Larger effect sizes for community SES appeared in models applied to the full sample that contained limited covariates, such as those typically found on birth certificates.
Conclusions: Cross-classified models allowed us to include more than one child per family even when families moved between births. There was evidence of mild associations between family SES and birth weight. Stronger associations between paternal occupation and birth weight were observed in models applied to reduced samples with hierarchical data structures, illustrating consequences of excluding observations from the cross-classified analysis. Models with limited covariates showed associations of birth weight with community SES. In models adjusting for a complete set of individual- and family-level covariates, community SES was not as important.


Background
Birth weight is a measure of overall infant health and a predictor of death and long-term disability [1]. Previous research has shown that family socioeconomic status (SES) is positively associated with birth weight [2,3]; however, after adjusting for this and other known predictors, substantial variability in birth weight remained. Increased rates of low birth weight (< 2500 g) have been observed in impoverished areas [4-6], leading to the hypothesis that community-level socioeconomic factors also influence birth weight [7,8]. Contextual effects on health by community of residence may be mediated by health service availability, shared attitudes towards health care, and sources of chronic stressors and social supports [6]. To understand these complex relationships, previous studies examined associations of birth weight with community-level SES measures such as unemployment rate, per capita or family income, percent adults with lower levels of education, and percent adults living in poverty [3,4,8].
Researchers turned to multilevel models to examine the impact of family- and community-level SES on birth weight while accounting for residual community-level correlation between subjects [3,4,9,10]. In these studies families were "nested" within neighborhoods and samples included one child per family. Two levels were examined: the child-/family-level, including child and family characteristics, and the community-level, including community SES. Previous literature measured varied magnitudes of association, and variability in child- and family-level confounders in multivariate analyses led to inconsistent conclusions. Studies finding effects of community SES tended to adjust for limited covariates available from birth certificate or registry reviews [3,4,8]. Noticeably absent from these analyses were adjustments for gestational duration and maternal smoking status during pregnancy. Morenoff et al. found that the effects of community variables observed in crude analyses were reduced when adjusting for an extensive list of child and family characteristics [9], suggesting that observed community-level effects may be a consequence of unmeasured confounding.
Traditional multilevel models were limited by the requirement of a single child per household. If siblings born in the same community were included in the sample a third level would be introduced to the multilevel analysis: the child-level, family-level, and community-level. The model would assume children were nested within families and families were nested within communities [11]. Models of this structure are straightforward to consider and can be applied in a number of software programs.
However, it is not reasonable to assume that the family lived in the same community for all births. Relocation is relatively common as 47% of United States residents reported relocation between 1965 and 1970 [12] and, in 1988, 74% of a nationally representative sample of children in the United States had moved residential location at least once in their lifetime [13]. Different family residences may lead to sibling differences in community SES and other characteristics at birth. Nested hierarchical models cannot adequately represent the study population. While they are uncommon in current birth weight research, cross-classified models are useful in analyzing data with non-hierarchical structures [14]. Cross-classified multilevel statistical models should be used to consider families that may have moved from one community to another between child births. These models allow for correlation between siblings as well as between children born in the same community. Cross-classified models can be expressed similarly to hierarchical models in mathematical notation [14] and can be analyzed through available statistical software such as SAS [15].
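Whether a data set is nested or cross-classified can be checked directly from the family and community identifiers. The following is a minimal Python sketch, not code from the study; the record layout `(child_id, family_id, community_id)` and the example records are assumptions made purely for illustration.

```python
from collections import defaultdict


def families_by_structure(births):
    """Split family ids into those seen in one community and those seen in several.

    `births` is an iterable of (child_id, family_id, community_id) tuples;
    this layout is a hypothetical one chosen for the illustration.
    """
    communities = defaultdict(set)
    for _child, family, community in births:
        communities[family].add(community)
    nested = sorted(f for f, c in communities.items() if len(c) == 1)
    crossed = sorted(f for f, c in communities.items() if len(c) > 1)
    return nested, crossed


def is_hierarchical(births):
    """True when every family appears in exactly one community."""
    _nested, crossed = families_by_structure(births)
    return not crossed


# Hypothetical records: family "F2" moved between its two births.
example_births = [
    ("c1", "F1", "ED10"),
    ("c2", "F1", "ED10"),
    ("c3", "F2", "ED10"),
    ("c4", "F2", "ED22"),
    ("c5", "F3", "ED22"),
]
```

If `is_hierarchical` returns `False`, at least one family contributes births to more than one community and a strictly nested model would misrepresent the data.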
The purpose of this paper is three-fold. The first objective is to evaluate the importance of family- and community-level measures of SES in predicting birth weight in the Cape Cod Family Health Study, a retrospective cohort study conducted on Cape Cod, Massachusetts. This study provides an example of a non-hierarchical data structure with siblings born in different communities. To the best of our knowledge, no previous research has examined the multilevel association between family- and community-level SES and birth weight allowing for a non-nested data structure. The second objective is instructive as we illustrate how such non-hierarchical data can be appropriately analyzed and give recommendations for the analysis of multilevel data in general. As we collected extensive individual- and family-level information about our study participants, our third objective is to compare the results of multilevel models using full confounder information with those of models including limited data typically found in birth records. From these analyses we are able to assess confounding of birth weight associations with SES in our study population.

Study Population
The Cape Cod Family Health Study is a retrospective cohort study designed to investigate the relationship between environmental PCE (perchloroethylene, tetrachloroethylene) exposure from contaminated drinking water and reproductive health outcomes on Cape Cod, Massachusetts. Children were eligible for the study if they were born between 1969 and 1983 to a mother living in one of eight Cape Cod towns [16]. We restricted the present analyses to full term infants (gestation ≥ 37 weeks) to eliminate the impact of prematurity on birth weight. The Institutional Review Boards at Boston University Medical Center and the Massachusetts Department of Public Health approved this research.
These data provided an opportunity to examine predictors of birth outcomes at child-, family-, and community-levels. Current analyses focused on birth weight, obtained from birth certificates. We selected continuous birth weight as it provides an opportunity to apply cross-classified models to a continuous outcome. Child- and family-level data were ascertained through birth certificate review and self-administered questionnaires sent to the mothers in 2002-2003. For community-level variables, study mothers' residential addresses at time of birth were linked to census data from 1980. These data were closest to the birth years of children in the study: 89% were born between 1974 and 1983. Community-level data were purchased from GeoLytics (East Brunswick, NJ) at the enumeration district level. Enumeration districts are roughly the same size as census tracts and were used to designate communities as Cape Cod was not divided into census tracts until after 1980 [17].
Predictors of interest included family- and community-level SES variables. At the family-level, SES was measured by maternal education (less than high school, high school graduate, or some college) and paternal occupation (white collar, blue collar, or other) at time of birth. Community-level SES measures included percent adults living in poverty, mean family income, percent adults graduated from a four-year college, and percent unemployment in the enumeration district of residence. Mean family income and percent adults graduated from a four-year college were parameterized by quartiles, while percent adults living in poverty and percent unemployment were dichotomized at 0 and 10%, respectively, based on evidence of a clear cut point from an unadjusted locally-weighted running-line smoothing (LOESS) plot [18].
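The parameterization described above can be sketched in a few lines of Python. This is an illustrative sketch only: the helper names and the district values are hypothetical, the quartile cut points use the standard library's "inclusive" method (the study does not state which quantile definition was used), and dichotomization treats values strictly above the cut point as the exposed category.

```python
import statistics


def quartile_cut_points(values):
    """Three cut points splitting `values` into quartiles (inclusive method, an assumption)."""
    return statistics.quantiles(values, n=4, method="inclusive")


def quartile_category(value, cut_points):
    """Return 1-4: one plus the number of cut points the value exceeds."""
    return 1 + sum(value > c for c in cut_points)


def dichotomize(value, cut_point):
    """Return 1 when the value is strictly above the cut point, else 0."""
    return int(value > cut_point)


# Hypothetical enumeration-district mean family incomes (US dollars).
district_income = [12000, 16000, 19500, 23000, 30000]
income_cuts = quartile_cut_points(district_income)

# Hypothetical district percentages for the two dichotomized measures.
any_poverty = dichotomize(2.5, 0)          # some adults living in poverty
high_unemployment = dichotomize(8.0, 10)   # not above the 10% cut point
```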
Other confounding variables were selected based on known relevance to birth weight. From birth certificate review we included child gender, gestational duration (≥ 37 weeks), birth order, year of birth, maternal age, and adequacy of prenatal care. From survey questionnaires we obtained information on the adequacy of maternal weight gain, maternal history of cervical incompetence, and smoking and alcohol consumption during pregnancy. Family-level covariates included maternal race from birth certificate review, maternal history of diabetes or hypertension before or during pregnancy, and history of prior low birth weight (birth weight < 2500 g) or premature (gestation < 37 weeks) infants from the questionnaire.
Differences between family-level SES measures could occur between siblings due to changes in paternal occupation, maternal education, or paternity. Differences between community-level SES measures could occur due to family relocation between child births.

Statistical Analysis
Smooth LOESS plots [18] were used to examine unadjusted relationships between continuous group-level SES measures and birth weight. Cut points identified in these plots were used to create categories for analyses.
We expected residual correlation within families and enumeration districts as siblings likely have correlated birth weights after adjusting for other predictors due to genetic similarities. Children within enumeration districts may also have comparable birth weights due to similar parental behaviors or environmental exposures. We used a mixed model with random intercepts for family and enumeration district to account for these correlations.
The mixed model was non-hierarchical due to cross-classification of sibling births between enumeration districts. The applied model was a mixed linear regression model as follows:
$$Y_{ijk} = \alpha + \beta X_k + \lambda W_i + \gamma V_j + d_i + f_j + \varepsilon_{ijk}$$
Here $Y_{ijk}$ represented the birth weight of the $k$th child from the $j$th family living in the $i$th enumeration district. The term $\alpha$ represented the intercept, $X_k$ the child- and family-specific demographic characteristics, and $W_i$ and $V_j$ the community- and family-level SES values, respectively. The corresponding regression coefficients were $\beta$, $\lambda$, and $\gamma$. The terms $d_i$ and $f_j$ were random intercepts for enumeration district and family [14], and $\varepsilon_{ijk}$ denotes the residual error. Formally, this allowed correlation within families and enumeration districts, resulting in variation in mean birth weight at two levels.
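To make the structure of this model concrete, the sketch below simulates birth weights as the sum of fixed effects, a random intercept shared by all births in an enumeration district, a random intercept shared by all births in a family, and a residual. It is a hedged illustration rather than the study's code: covariates are reduced to single numbers, the random intercepts are assumed to be independent and normally distributed, and all parameter values and records are hypothetical.

```python
import random


def simulate_birth_weights(records, alpha, beta, lam, gamma,
                           sd_district, sd_family, sd_residual, seed=0):
    """Simulate outcomes from a cross-classified random-intercept model.

    Each record is a dict with keys "district", "family", "x", "w", "v"
    (a hypothetical layout). Returns the simulated weights plus the drawn
    district and family intercepts.
    """
    rng = random.Random(seed)
    district_effect = {}
    family_effect = {}
    weights = []
    for rec in records:
        # One intercept per district and per family, drawn on first sight.
        if rec["district"] not in district_effect:
            district_effect[rec["district"]] = rng.gauss(0.0, sd_district)
        if rec["family"] not in family_effect:
            family_effect[rec["family"]] = rng.gauss(0.0, sd_family)
        fixed = alpha + beta * rec["x"] + lam * rec["w"] + gamma * rec["v"]
        weights.append(fixed
                       + district_effect[rec["district"]]
                       + family_effect[rec["family"]]
                       + rng.gauss(0.0, sd_residual))
    return weights, district_effect, family_effect


# Hypothetical records: family F1 has births in two different districts.
example_records = [
    {"district": "ED1", "family": "F1", "x": 0.0, "w": 1.0, "v": 1.0},
    {"district": "ED2", "family": "F1", "x": 0.0, "w": 1.0, "v": 1.0},
    {"district": "ED1", "family": "F2", "x": 0.0, "w": 1.0, "v": 1.0},
]
sim_weights, sim_d, sim_f = simulate_birth_weights(
    example_records, alpha=3500.0, beta=10.0, lam=5.0, gamma=20.0,
    sd_district=20.0, sd_family=300.0, sd_residual=0.0)
```

With the residual standard deviation set to zero, the two siblings in family F1 differ only through their district intercepts, which is exactly the cross-classification a nested model cannot represent.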
For simplicity, the model ignored any possible spatial correlation due to proximity of enumeration districts.
Three mixed linear regression models were compared to evaluate the necessity of random intercepts. Model I included random intercepts for enumeration district only, Model II had random intercepts for family only, and Model III allowed for variation between both family and enumeration district means. It was assumed that family and enumeration district effects were distinct and the variances of each level could be identified (Table 1). Hypothesis tests were performed for each of the models to evaluate whether the variances of the random intercepts were non-zero. Akaike's Information Criterion (AIC), a measure of model fit, was compared [19].
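As a reminder of how an AIC comparison works, the sketch below uses the standard definition AIC = 2k - 2 log L, where k is the number of estimated parameters and smaller values indicate a better trade-off between fit and complexity. The log-likelihoods and parameter counts are made-up numbers for illustration, not values from this study, and statistical packages may count parameters differently (for example under restricted maximum likelihood).

```python
def aic(log_likelihood, n_parameters):
    """Akaike's Information Criterion: 2k - 2 log L (smaller is better)."""
    return 2 * n_parameters - 2 * log_likelihood


def best_by_aic(models):
    """Return the name with the smallest AIC.

    `models` maps a model name to a (log_likelihood, n_parameters) pair.
    """
    return min(models, key=lambda name: aic(*models[name]))


# Hypothetical fits of three random-intercept specifications.
candidate_models = {
    "district_only": (-100.0, 2),
    "family_only": (-90.0, 2),
    "district_and_family": (-89.9, 3),
}
preferred = best_by_aic(candidate_models)
```

In this invented example the extra variance component improves the log-likelihood by only 0.1, which does not offset the penalty for the additional parameter.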
We applied two additional cross-classified models to evaluate possible confounding of family- and community-level SES measures. Model IIIa excluded family-level SES variables while Model IIIb excluded community-level SES variables (Table 1). Two models were fitted using restricted samples to impose a hierarchical data structure and avoid the need for cross-classified models. For Model IV, the sample was restricted to the eldest child per family. Random intercepts for enumeration district were included. Model V allowed multiple children per family but required that all children be born in the same enumeration district, creating a nested hierarchy. In this sub-sample, later-born children were considered eligible if they were born in the same enumeration district as the eldest child. Random intercepts were included for both family and enumeration district. Predictors of primary interest were family- and community-level SES measures. All analyses were adjusted for relevant covariates. We tested for confounding by PCE exposure but found no association with the outcome. As a result, the models presented do not control for PCE exposure.
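The two restricted sub-samples can be constructed with simple filters. The Python sketch below is an illustration under stated assumptions: each birth is a dict with hypothetical keys `child`, `family`, `district`, and `birth_order`, and the eldest child is taken to be the one with the lowest birth order among a family's births in the sample.

```python
def eldest_per_family(births):
    """Keep one record per family: the birth with the lowest birth order."""
    eldest = {}
    for birth in births:
        current = eldest.get(birth["family"])
        if current is None or birth["birth_order"] < current["birth_order"]:
            eldest[birth["family"]] = birth
    return list(eldest.values())


def same_district_as_eldest(births):
    """Keep eldest children plus siblings born in the eldest child's district."""
    eldest_district = {b["family"]: b["district"] for b in eldest_per_family(births)}
    return [b for b in births if b["district"] == eldest_district[b["family"]]]


# Hypothetical births: family F1 moved between births, family F2 did not.
example_sample = [
    {"child": "c1", "family": "F1", "district": "ED1", "birth_order": 1},
    {"child": "c2", "family": "F1", "district": "ED2", "birth_order": 2},
    {"child": "c3", "family": "F2", "district": "ED1", "birth_order": 1},
    {"child": "c4", "family": "F2", "district": "ED1", "birth_order": 2},
]
single_child_sample = eldest_per_family(example_sample)
nested_sample = same_district_as_eldest(example_sample)
```

Both filters discard observations (here `c2`, and for the single-child sample also `c4`), which is the cost of forcing a hierarchical structure onto cross-classified data.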
We analyzed a birth certificate covariates only model (Model VI) to test for association between community-level SES and infant birth weight adjusting only for variables selected from birth certificate review. Variables reflected those used in the previous research that found significant associations between community SES and birth outcomes: child birth order, gender, birth year, adequate prenatal care, maternal age, maternal race, and maternal education [3,4,8]. To evaluate confounding, Model VI was compared to two models including either gestational duration (Model VIa) or maternal smoking status during pregnancy (Model VIb) in addition to the variables listed above. We chose these comparisons because both are known important risk factors for low birth weight: gestation was often excluded from prior studies [3,4,8] and, at the time these data were collected, maternal smoking status during pregnancy was not available from Massachusetts birth certificates [20].
All analyses were performed using SAS Version 9.1 [15]. (Example SAS code to apply cross-classified models is available in Additional file 1.) Beta coefficients were used to assess the magnitude of the effect of SES on birth weight and p-values were used to assess the precision of the coefficients. All hypothesis tests were performed with a significance level of 0.05.

Results
The Cape Cod Family Health Study consisted of 2,144 subjects. Subjects included in this analysis were restricted to full term singletons born after 1975 without congenital anomalies whose mothers responded to the self-administered questionnaire (n = 1,689). Subjects with at least one missing covariate were excluded from analyses (n = 240). No single covariate had more than 5% missing values. The final sample included 1,449 children from 1,252 families living in one of 170 enumeration districts. Of those families, 175 had between two and four children in the study and 34 families moved between enumeration districts at least once between births. On average, enumeration districts included 9 children, with a range of 1 to 79. There was evidence of variation in birth weight: the sample mean was 3524.0 g and the standard deviation was 466.4 g. The standard deviation of family mean birth weight was 460.0 g while the standard deviation of mean birth weight in enumeration districts was smaller (247.8 g). This is a crude illustration that enumeration districts accounted for less variation in birth weight than families. Infants were nearly 50% female with an average gestational duration of 40.2 weeks. Fewer than 5% of mothers had a history of giving birth to a low birth weight infant, 8% had a history of hypertension, and over 95% of mothers were white. Over one-fourth of mothers smoked during pregnancy and nearly 40% consumed alcoholic beverages; both behaviors have been shown to decrease birth weight [21,22] (Table 2).
Over 60% of mothers had at least a high school degree and half of fathers had white collar occupations (Table 2). Fourteen percent of enumeration districts had some adults living in poverty. The mean enumeration district family income was over $19,500 with a standard deviation of over $6,000. An enumeration district-level mean of 14% of adults had a four-year college degree, with an average of 8% unemployed (Table 3). Of families moving from one enumeration district to another, 7% of mothers increased their education between births and 30% of fathers changed occupation categories. For community-level SES, 13% moved to areas with some adults living in poverty and 23% moved from these communities; 76% changed mean family income quartile, and 80% changed quartile of percent of adults with a college degree. Seventeen percent of families moved into enumeration districts with more than 10% adult unemployment and 13% moved away from these areas.

Random Effects
In Model III, the residual correlation was 0.5507 for siblings born in the same enumeration district, 0.5504 for siblings born in different enumeration districts, and 0.0016 for unrelated subjects born in the same enumeration district. Unrelated subjects born in different enumeration districts were assumed to be independent.
Based on tests of non-zero variance for random intercepts, the best fit to the data was observed in Model II, a conclusion supported by comparison of AIC statistics. In circumstances where a random effect is not statistically significant, researchers typically select the most parsimonious model for further analyses. Here there was no difference in interpretation as residual correlation within enumeration districts was near zero. For instructive purposes we interpreted Model III, the cross-classified model including random intercepts for family and enumeration district.

Family-and Community-Level SES
In Model III, mothers with some college education gave birth to infants weighing 128 g more than mothers with less than a high school diploma, while smaller effect sizes were observed for other SES variables (Table 5). To understand whether the family- and community-level SES measures were independent of one another, we excluded the family-level SES measures from the analysis (Model IIIa). The estimates for some community-level SES measures changed by more than 10%, suggesting confounding of the community-level effects by the two family-level measures; however, the magnitudes of the effects remained small. When the community-level SES variables were excluded from the model (Model IIIb) there were small changes to the coefficients of the family-level SES measures, indicating little confounding by the community-level measures (Table 5).
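The change-in-estimate comparison used here can be written as a short calculation. The sketch below is illustrative only: the function names are hypothetical, the percent change is taken relative to the coefficient from the model that includes the other set of SES measures, and the coefficient values are invented rather than taken from Table 5.

```python
def percent_change(adjusted, unadjusted):
    """Percent change in a coefficient when the other SES measures are dropped."""
    if adjusted == 0:
        raise ValueError("percent change is undefined for a zero adjusted coefficient")
    return 100.0 * (unadjusted - adjusted) / abs(adjusted)


def suggests_confounding(adjusted, unadjusted, threshold=10.0):
    """True when the coefficient changes by more than `threshold` percent."""
    return abs(percent_change(adjusted, unadjusted)) > threshold


# Hypothetical coefficients (grams) with and without the other SES measures.
large_shift = suggests_confounding(adjusted=20.0, unadjusted=25.0)    # 25% change
small_shift = suggests_confounding(adjusted=100.0, unadjusted=104.0)  # 4% change
```

A relative change above the threshold can still correspond to a small absolute change when the coefficient itself is small, which is why the magnitudes are reported alongside the percent change.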

Comparison to Sub-Samples with Nested Hierarchies
Model IV was applied to the eldest eligible child per family, excluding 262 later-born children from the analysis. Model V was applied to eligible eldest children and later-born children born in the same enumeration district as the eldest child (n = 1,399, 50 subjects excluded). Increased maternal education effect sizes were observed for Models IV and V when compared to Model III. Most notably, the effect size was increased considerably for the comparison of women without a high school diploma to those who have attended some college for the models applied to sub-samples. In Model IV, paternal occupation was significantly associated with birth weight (p = 0.033). Fathers with blue collar occupations had children with significantly lower birth weights than fathers with white collar occupations (p = 0.024), while no difference was detected between white collar and "other" occupations (p = 0.651). Paternal occupation in Model V had a p-value just over the 5% cut-off (p = 0.057), indicating a possible trend in the same direction as Model IV. Similar effect sizes were observed for all community-level SES variables in Models III, IV, and V (Table 6).

Birth Certificate Covariate Only Models
In Model VI, containing child gender, birth order, year of birth, adequacy of prenatal care, maternal age, and maternal race in addition to family- and community-level SES measures, percent of adults with a four year college education and maternal educational level had the strongest associations with birth weight (Table 7). Inclusion of gestational duration, a confounder even after restricting the sample to full term infants, produced large changes in the coefficients for group-level SES. Similar results were observed when we added maternal smoking status during pregnancy to the model (Table 7). In none of these models was it important to account for residual correlation at the community-level.

Discussion
A positive trend of increased child birth weight for maternal completion of higher education was observed, consistent with previous research associating maternal education with birth outcomes [3,4]. For models applied to the full sample, paternal occupation did not have a significant association with birth weight. A large variety of positions held at the military base on Cape Cod, full time students, and other occupations were classified as "other", complicating the interpretation of this variable. Community-level SES showed little association with birth weight in the full model. This suggests little contextual effect of SES in this study population.
Strengths of this study include the relatively large sample size and collection of birth weight data from birth certificates and confounding variables from birth records and parental report. Due to the retrospective study design, subjects may have made errors in recalling and reporting past behaviors. While this may have affected responses to risk factors such as smoking and alcohol consumption during pregnancy, community-SES measures were obtained from 1980 census information and both birth weight and family-SES measures were obtained from birth certificates. The effects of community- and family-SES measures on birth weight should be unaffected by recall errors.
Community SES measures were based on enumeration district of residence at approximate time of child birth. The boundaries of enumeration districts may not adequately reflect groupings of families with latent similarities. Finer geographic resolution may produce groups of families that are more similar than those available from enumeration districts. Although we previously found contextual effects of SES on risk of breast cancer on Cape Cod [17], the area may be too homogeneous to show strong effects of community SES on birth weight. Further studies in more diverse geographical areas are needed. SES measures selected for this analysis may not adequately capture SES at the community-level; however, there are no standard methods for community SES assessment.
A study examining birth outcomes in the Chicago area in the mid-1990s, where communities were defined as one or more geographically contiguous census tracts, also found minimal community-level SES effects on birth weight after adjustment for individual- and family-level characteristics, including maternal education, marital status, and smoking and alcohol consumption during pregnancy as well as biomedical characteristics such as anemia, diabetes, previous low birth weight infants, and previous pregnancy terminations. Community SES measures such as percent of families in poverty and residential stability (a composite measure of owner occupied housing and residents living in a home for extended periods of time) had coefficient magnitudes of only 3.15 and 3.29 g per one unit increase in the corresponding variables when predicting birth weight. In analyses adjusted for fewer covariates these variables had coefficient magnitudes greater than 20 and 10 g per one unit increase, respectively. Interestingly, community-level variables capturing stress in the neighborhood, violent crime rate and reciprocated mutual support, were more important predictors than percent of families living in poverty or residential stability, with coefficient magnitudes greater than 10 g per unit increase in a model with limited covariates and magnitudes greater than 6 g per unit increase in a full model [9]. When predicting very low birth weight (< 1500 g) among U.S. born non-Hispanic white and African American women who delivered in New York City, there were no meaningful effects of community poverty level but individual-level factors were influential. There was an important effect of community poverty when predicting moderately low birth weight (1500-2500 g) [10]. Luo et al. found that community-level SES measures had only moderate association with birth outcomes (preterm birth, small for gestational duration, stillbirth, neo- and post-natal deaths) while individual SES factors had far greater effects [4]. Other studies found independent effects of community SES and individual factors on child birth weight [3,8]. When stratified by ethnicity, the directions and magnitudes of association varied considerably between groups [8].
When we compared the individual-level factors included in previous studies, we found that studies with moderate to large associations between community-level SES and birth outcomes adjusted for a small number of individual-level covariates. These papers focused on community variables and did not control for known risk factors and possible mediators such as gestational duration and smoking status during pregnancy [3,4,8]. We analyzed comparable models including birth certificate covariates only and found similar results. Prior to adjusting for gestational duration or maternal smoking status, community SES had a significant association with birth weight (p = 0.019). Once adjusted for either one of these additional confounders, the magnitudes of coefficients decreased for two of three categories and there was no longer evidence of an association. Morenoff et al. also found reduced effects of community SES measures predicting low birth weight when adjusting for more individual characteristics than are available in birth records [9]. Rauh et al. found no meaningful association between community SES and very low birth weight when controlling for maternal smoking status, substance use during pregnancy, marital status, maternal education, and child birth order, though some associations were seen for moderately low birth weight [10]. Associations observed between birth weight and community SES may have resulted from residual confounding from individual-level characteristics.
No previous study allowed or accounted for multiple children per family. Statistical models assumed either nested data [3,8-10] or independent observations after determination of negligible correlation at the community-level [4]. Data in this study included multiple children per family and changes of residential address between births. When we restricted this sample to a single child per family we observed increased effect sizes for family-level SES variables. While inferences from Models III, IV, and V were otherwise very similar, it is noteworthy that conclusions based on family-level SES would have differed had we restricted our sample. The cross-classified model allowed us to include all observations, an advantage that may affect inferences, particularly for highly mobile populations. It may also improve representativeness of the sample when compared to the entire population. As was appropriate for a non-nested data structure, we adjusted for family- and community-level SES and accounted for multiple sources of correlation using a cross-classified statistical model fitted with standard software.
When drawing samples from populations with a primary interest in community-level variables, subjects often fall into subgroups within communities. In the present study, subgroups were families. We found that sibling birth weights were highly correlated and between-family variation accounted for much of the overall birth weight variation. Incorrectly assuming that subjects are independent may lead to incorrect variability estimates and test statistics.
Analysts often assume random components are needed without considering alternate models. In the present study we found that adjustments for within-family correlations were important; however, there was little variability between enumeration districts and random components for enumeration district would be excluded in the most parsimonious model. Similar results indicating small community-level correlations have been found in other studies [4,17] emphasizing the importance of testing the necessity of random components to identify the best model for the data. As hypothesis tests and goodness of fit criteria such as the AIC statistic are straightforward to apply, there is little reason to disregard this advice when using multilevel models to account for multiple sources of correlation.

Conclusion
Multilevel models are a common tool for examining associations between outcomes and exposures at individual- and group-levels. While hierarchical models are often assumed, many populations do not have a nested data structure. Analysts can employ cross-classified models to allow for non-hierarchical, multilevel data. Researchers can evaluate whether random components are necessary and compare goodness of fit statistics to select a parsimonious model.
Using cross-classified multilevel models we analyzed the association of birth weight with family and community SES while adjusting for numerous potential confounders. The flexibility of cross-classified models allowed us to sample more than one child per family and to include families who moved between births. Results from a model restricted to individual-level variables generally found on birth records showed an association of birth weight with community SES. When we adjusted for a more complete set of individual- and family-level covariates, community SES was not as important. Residual confounding may explain some associations previously reported between community SES and birth weight.