Spatial distribution and cluster analysis of sexual risk behaviors reported by young men in Kisumu, Kenya

Background: The well-established connection between HIV risk behavior and place of residence points to the importance of geographic clustering in the potential transmission of HIV and other sexually transmitted infections (STIs).
Methods: To investigate the geospatial distribution of prevalent sexually transmitted infections and sexual behaviors in a sample of 18-24 year-old sexually active men in urban and rural areas of Kisumu, Kenya, we mapped the residences of 649 men and conducted spatial cluster analysis. Spatial distribution of the study participants was assessed in terms of the demographic, behavioral, and sexual dysfunction variables, as well as laboratory diagnosed STIs. To test for the presence and location of clusters we used Kulldorff's spatial scan statistic as implemented in the SaTScan program.
Results: The results of this study suggest that sexual risk behaviors and STIs are evenly distributed in our sample throughout the Kisumu district. No behavioral or STI clusters were detected, except for condom use. Neither urban nor rural residence significantly impacted risk behavior or STI prevalence.
Conclusion: We found no association between place of residence and sexual risk behaviors in our sample. While our results cannot be generalized to other populations, the study shows that geospatial analysis can be an important tool for investigating study sample characteristics; for evaluating HIV/STI risk factors; and for development and implementation of targeted HIV and STI control programs in specifically defined populations and in areas where the underlying population dynamic is poorly understood.


Introduction
The well-established connection between HIV risk behavior and place of residence points to the importance of geographic clustering in the potential transmission of HIV and other sexually transmitted infections (STIs) [1][2][3]. Rothenberg et al. hypothesized that one of the elements that plays an important role in maintaining endemicity of HIV in high prevalence settings is the geographic range of persons in a social network [1,2]. The geographical proximity of persons connected within a network, but not necessarily known to each other [2], was found to be an important factor in maintenance of HIV endemicity due to a high probability of partner selection from within the network. Latkin et al. found an association between the characteristics of neighborhood and the prevalence of sexually transmitted infections and sexual risk behaviors [3]. However, there have been few recent studies evaluating the spatial distribution of STIs, HIV, and sexual behavior. Most studies collect and analyze spatial variables on a relatively crude scale, such as neighborhood, census tract, zip code, or by classifying the participants' residences as urban or rural.
The current study represents a continuation of earlier work focusing on the identification of "high transmission areas". Such areas are locations where social mixing is combined with commercial activity, such as trading centers, truck stops, and places with high concentrations of migrants [4,5]. These high transmission locations are believed to be concentrated in urban areas where HIV seroprevalence is higher than in neighboring rural areas [6][7][8][9][10]. However, HIV prevalence in rural areas is not homogeneous, with large differences in prevalence between different settlements [11][12][13][14]. Previous research on rural-urban differences in HIV prevalence and sexual behavior has led to two opposing predictions. Some researchers suggest that established epidemics may involve urban and rural communities interacting in a complex manner, with in- and out-migration patterns leading to eventual equalization of HIV prevalence in urban and rural locations [7]. An opposing opinion, that rural HIV prevalence will always be lower than in comparable urban areas, maintains that urban settings provide increased opportunity for commercial and casual sex and degrade protective cultural traditions regulating sexual relationships [15,16]. Most HIV research comparing risky sexual behaviors in rural and urban settings [7,17,18,19] has observed behavioral differences between these populations, with higher HIV risk behaviors more frequently found in urban areas. However, in some places in rural India, it has been shown that sexual networks are such as to maintain HIV transmission independently of urban locations, with HIV prevalence in some rural areas higher than in comparable urban locations [11,12]. A study in Kenya also found that HIV risk behavior was more prevalent among women in rural compared to urban areas of Nyanza Province [20].
The objective of our study was to investigate a geospatial dimension of the distribution of STI and sexual behaviors in a sample of 18-24 year-old sexually active men in Kisumu, Kenya. Mapping participants' residences allowed for cluster analysis based on home location. Additionally, by incorporating spatial measurements with demographic, behavioral, and STI data it was possible to assess associations between residence location, sexual behavior, and prevalence of STIs.

Study location
The study took place in Nyanza Province, Kisumu district (presently Kisumu East and Kisumu West districts), located in western Kenya. The study area primarily includes the Kisumu East district and covers approximately 240 km². Kisumu East district consists almost entirely of the Municipality of Kisumu, the third largest city in Kenya, with a population of approximately 500,000 residents [21]. The district is located on Lake Victoria, 10 km south of the equator, and approximately 1,100 meters above sea level.

Population and study design
To investigate the safety and effectiveness of male circumcision as a potential HIV prevention strategy, a randomized controlled trial (RCT) of male circumcision to reduce HIV incidence in young men was conducted in Kisumu, Kenya [22]. To explore the determinants of the socio-economic status, access to healthcare, and spatial characteristics of the trial participants, including the geographic distribution of sexually transmitted infections (STIs) and risk behavior, we developed a nested sub-study by recruiting participants active in the trial between June 30, 2005 and March 13, 2006. In this paper, we present a spatial analysis of the baseline characteristics of these trial participants.
Respondents were trial participants who at their baseline RCT visits were uncircumcised, HIV-negative, aged 18-24, sexually active within the last 12 months and residing in Kisumu district (presently Kisumu East and Kisumu West districts). Consenting of participants and interviews were conducted by a trained, experienced male interviewer in the participants' language of choice (English, DhoLuo, or Kiswahili). This study received ethical approval from the institutional review boards at the University of Nairobi, the University of Illinois at Chicago, the University of Manitoba, and the Research Triangle Institute.

Variables
All consenting participants were required to complete a single study visit during which they were asked to provide information on their household utilities and possessions, the healthcare facility they visited last, and geographic coordinates of their residences. To obtain geographic coordinates, participants located the residence where they lived at the time of their enrollment in the trial on a georeferenced satellite image, with assistance from a research assistant with extensive knowledge of Kisumu geography and trained in geographic information systems (GIS). Geographic coordinates were verified for 25% of participants by traveling to identified locations and mapping them with a Global Positioning System (GPS) device. All participants provided permission to link these data with their demographic, behavioral, sexual health and STI data.
Spatial distribution of the study participants was assessed in terms of the demographic, behavioral, sexual dysfunction and laboratory diagnosed STI variables recorded at the baseline RCT visit. Demographic variables included age, education, income, importance of religion, and marital status. Behavioral variables included sex after or while drinking any time in the past or at last sex, age at first sex, number of partners in the last 12 months, exchanging money or gifts for sex in the past 6 months, sex on the same day as meeting a partner, condom use in the past 6 months and at last sex, insertive anal sex with a woman, and preference for dry or wet sex. Education, preference for dry or wet sex, and number of partners in the last 12 months were analyzed separately as nominal or ordinal variables as appropriate. All other variables were considered dichotomous. Sexual dysfunction lasting for two or more weeks in the last six months served as an indirect indicator of potential recent history of STIs and was assessed through six questions: lack of interest in sex, inability to climax, coming to a climax too quickly, pain during intercourse, lack of pleasure in sex, and trouble achieving or maintaining erection. Laboratory diagnosis of the following STIs was obtained from the trial baseline visit data: herpes simplex virus type 2 (HSV-2), syphilis, gonorrhea, chlamydial infection and trichomoniasis. The STI testing procedures are described in detail elsewhere [23,24].
Geo-referencing was carried out using a GPS device with 3-meter accuracy (Magellan eXplorist 100 receiver), satellite imagery and ArcGIS 9.0 software (Environmental Systems Research Institute, Inc., Redlands, California, USA, 2004).
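The field verification of image-derived coordinates against GPS readings can be illustrated with a great-circle distance check. The following is only a sketch: the function name, the spherical Earth radius, and the example coordinates are assumptions for illustration, and the study does not report the tolerance or the comparison procedure it used.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (spherical approximation)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

# Hypothetical pair: a residence located on the satellite image and the GPS
# reading taken at the same house (coordinates invented for illustration).
image_point = (-0.0917, 34.7680)
gps_point = (-0.0918, 34.7681)
offset_m = haversine_m(*image_point, *gps_point)
print(f"offset between image and GPS location: {offset_m:.1f} m")
```

Offsets of this kind could then be compared with whatever tolerance an analyst considers acceptable given the receiver accuracy and the resolution of the imagery.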

Statistical analysis
To test for the presence and location of clusters we used Kulldorff's spatial scan statistic as implemented in the SaTScan program [25][26][27][28][29]. SaTScan has been widely applied in public health research [30][31][32][33][34], partially due to the advantage of using a simple statistic for identifying spatial clusters based on geographic coordinates. The method uses a circular window of varying radius, centered at each location in turn, that moves across the map so that at any given position the window includes different sets of neighboring residences. At each position, the radius of the circular window varies repeatedly from zero up to a set maximum radius, chosen so that the window never contains more than 50 percent of the total study population. This method allows the circular window to continuously vary in both location and size, thereby creating a large number of distinct potential clusters.
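The window construction described above can be sketched in a few lines. This is a minimal illustration, not the SaTScan implementation: the function name is hypothetical, coordinates are assumed to be already projected to a planar system, and identical member sets reached from different centers are not de-duplicated.

```python
import math

def candidate_windows(points, max_fraction=0.5):
    """Yield (center_index, radius, member_indices) for circular scan windows.

    points: list of (x, y) planar coordinates (assumed projected, e.g. metres).
    max_fraction: cap on the share of all points a window may contain.
    """
    max_size = int(len(points) * max_fraction)
    for i, (xi, yi) in enumerate(points):
        # Rank every point by its distance from the window center.
        by_dist = sorted(
            (math.hypot(xj - xi, yj - yi), j) for j, (xj, yj) in enumerate(points)
        )
        for k in range(1, max_size + 1):
            radius = by_dist[k - 1][0]
            # A circle of this radius also contains any point tied at the same
            # distance, so skip sizes that would split a tie.
            if k < len(by_dist) and by_dist[k][0] == radius:
                continue
            yield i, radius, [j for _, j in by_dist[:k]]

# Toy example with four residences on a line.
toy_points = [(0, 0), (1, 0), (2, 0), (10, 0)]
toy_windows = list(candidate_windows(toy_points))
```

Each yielded window is one potential cluster; a test statistic is then evaluated on the cases and non-cases falling inside it.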
The detection of clusters was performed by comparing the number of cases within the window with the number expected if cases were randomly distributed in space, using a Bernoulli model. The Bernoulli model provided the advantage of conducting a cluster analysis in the absence of data on the underlying general population density by considering the study population as a combination of cases and controls represented as a 0/1 variable investigated for clustering [26,35]. The unit of space was defined by the coordinates of the households. For ordinal and nominal variables, ordinal [36] and multinomial models [37] were used. We scanned for high rate clusters (i.e., observed number of cases exceeds the expected number of cases) and low rate clusters (i.e., expected number of cases exceeds the observed number of cases) for all dichotomous, ordinal, and nominal variables.
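For readers unfamiliar with the Bernoulli model, the statistic evaluated for each window is a log-likelihood ratio comparing the proportion of cases inside the window with the proportion outside. The sketch below follows the standard form of Kulldorff's Bernoulli likelihood; the function names and the example counts are illustrative assumptions and are not taken from the study data.

```python
import math

def _x_log_ratio(a, b):
    """Return a * ln(a / b), using the convention 0 * ln(0) = 0."""
    return 0.0 if a == 0 else a * math.log(a / b)

def bernoulli_llr(c, n, C, N, high=True):
    """Log-likelihood ratio for a window with c cases among n people,
    given C cases among N people in the whole sample.

    high=True scans for high rate clusters, high=False for low rate clusters.
    """
    if n == 0 or n == N:
        return 0.0
    inside, outside = c / n, (C - c) / (N - n)
    # Windows whose rate differs in the wrong direction contribute nothing.
    if (high and inside <= outside) or (not high and inside >= outside):
        return 0.0
    alternative = (
        _x_log_ratio(c, n) + _x_log_ratio(n - c, n)
        + _x_log_ratio(C - c, N - n) + _x_log_ratio((N - n) - (C - c), N - n)
    )
    null = _x_log_ratio(C, N) + _x_log_ratio(N - C, N)
    return alternative - null

# Hypothetical counts: 10 of 10 people in the window are cases,
# against 50 cases among 100 people overall.
example_llr = bernoulli_llr(10, 10, 50, 100)
```

The window with the largest value over all candidate windows would be the most likely cluster.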
The likelihood was maximized over all windows, identifying the most likely or primary cluster. In addition to primary clusters, the software identified multiple secondary clusters, ordering them according to the likelihood ratio. In this paper, we report primary clusters for all analyzed variables, as well as secondary clusters that do not overlap with primary clusters and have a p-value of less than or equal to 0.1. High and low rate clusters are reported in separate tables.
The significance of the identified clusters was tested with a likelihood ratio test with p-value based on Monte Carlo simulations. For Monte Carlo inference, 9,999 replications were performed for dichotomous variables and 999 replications for ordinal or nominal variables. The null hypothesis of no clusters was rejected when the p-value was less than or equal to 0.05. The rate ratio (RR) was defined as the ratio of observed to expected cases.
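The Monte Carlo step can be sketched as follows, assuming the usual rank-based p-value in which the observed statistic is ranked among the statistics from the simulated data sets. Under that assumption the smallest attainable p-value is 1/10,000 = 0.0001 with 9,999 replications and 1/1,000 = 0.001 with 999. The function names and the `statistic` callable (which would return the maximum likelihood ratio over all windows) are hypothetical.

```python
import random

def rank_p_value(observed, simulated):
    """Rank-based Monte Carlo p-value for an observed test statistic."""
    at_least_as_large = sum(1 for s in simulated if s >= observed)
    return (at_least_as_large + 1) / (len(simulated) + 1)

def permutation_p_value(labels, statistic, replications=999, seed=0):
    """Shuffle 0/1 case labels across fixed locations and rank the observed statistic.

    labels: list of 0/1 values, one per residence (order tied to location).
    statistic: callable taking a label list and returning a number
               (an assumption; e.g. the maximum likelihood ratio over windows).
    """
    rng = random.Random(seed)
    observed = statistic(labels)
    shuffled = list(labels)
    simulated = []
    for _ in range(replications):
        rng.shuffle(shuffled)  # random relabelling under the null hypothesis
        simulated.append(statistic(shuffled))
    return rank_p_value(observed, simulated)
```

Holding the residence locations fixed and permuting only the case labels keeps the spatial layout of the sample intact, which is what allows the test to proceed without data on the underlying general population.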
All GIS manipulations and cartographic displays were performed in ArcGIS 9.0 software.

Study population
Approximately 1,800 trial participants were eligible for this sub-study. Men were passively recruited through information available at the clinic reception, posted fliers, and word of mouth. Between June 30, 2005 and March 13, 2006, we were able to recruit 1,040 men to the sub-study. This analysis is restricted to 649 of the 1,040 men for whom residence coordinates were collected. Of these 649 men, 185 (29%) were recruited at the baseline RCT visit, 218 (34%) at the 6-month, 140 (22%) at the 18-month visit, and 104 (14%) at the 24-month RCT visit. This sub-sample represents 23% of the RCT total sample (2,784 men). Participants were asked to locate the houses where they lived at the time of their enrollment into the trial. At the 6-month, 12-month, and 24-month RCT visits, 80%, 73%, and 57% of the sub-study participants, respectively, indicated that they had not moved since enrolling in the trial.
Baseline characteristics of the 649 men are shown in Table 1. The majority of these men were single (95%) and unemployed (59%). Equal numbers of men had been randomized into circumcision and control groups. Engaging in insertive anal intercourse, paying for sex, exchanging gifts for sex, and drinking alcohol with sex were reported infrequently (4%, 7%, 8%, and 11%, respectively). Condom use was measured by three different variables: in the past 6 months 66% reported ever using condoms; 33% used condoms more than half the time; and 53% used condoms the last time they had sex. Twenty-five percent reported having sexual dysfunction for a period of 2 weeks or longer in the past 6 months and 29% were diagnosed with one or more STIs at the baseline RCT visit. Men who joined this sub-study were less likely to be married (5% vs. 7%, p = 0.01), more likely to have completed secondary school (61% vs. 55%, p = 0.001), more likely to report using condoms the last time they had sex (53% vs. 47%, p = 0.01), more likely to have no preference between wet and dry sex (χ² = 10.92, p = 0.03), more likely to initiate sex before the age of 15 (34% vs. 30%, p = 0.03), and less likely to report any of the sexual dysfunction items (p < 0.001 for all items) than those who did not enroll. There were no significant differences in the age of the participants (p = 0.49), randomization group (p = 0.88), importance of religion (p = 0.81), having sex on the same day as meeting a partner (p = 0.32), paying (p = 0.07) or giving gifts (p = 0.07) for sex, using condoms, whether always (p = 0.08) or half the time (p = 0.20), drinking alcohol the last time they had sex (p = 0.74), having anal sex with a woman (p = 0.67), and number of sex partners in the last 12 months (p = 0.25). There were no differences in prevalent sexually transmitted infections at baseline between those who enrolled in this sub-study and those who did not.

Distribution of high rate spatial clusters
Formal cluster analysis identified two statistically significant high rate clusters for education and employment variables (Table 2, Figures 1 and 2). The employment cluster was small (39 men, 31 of them employed, whereas only 15.78 were expected to be employed based on random distribution of cases) with a high cluster rate ratio of 1.96. This cluster was located in a rural area. The education cluster was larger, with a population of 95 participants, 88 of whom had secondary or higher level of education, compared to the expected number of 68.42. Most participants in the education cluster came from a middle-class neighborhood. The only statistically significant cluster identified in our sexual behavior variables was a cluster of men who used condoms less than half the time in the past 6 months (Table 2 and Figure 3). This cluster included 134 men, of whom 73 reported using condoms less than half the time with 49.55 expected, resulting in a cluster rate ratio of 1.47. This geographically large cluster with a radius of 2 km covered a great part of central Kisumu, with a high concentration of participants residing in a sprawling low-income neighborhood.
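As a worked check, the rate ratios quoted for the high rate clusters follow from the definition given under Statistical analysis (observed divided by expected cases). The short sketch below uses only the observed and expected counts reported in the text; the dictionary and function names are illustrative.

```python
# (observed, expected) case counts for the high rate clusters reported above.
high_rate_clusters = {
    "employment": (31, 15.78),
    "education (secondary or higher)": (88, 68.42),
    "condom use less than half the time": (73, 49.55),
}

def rate_ratio(observed, expected):
    """Rate ratio as defined in the text: observed / expected cases."""
    return observed / expected

for name, (observed, expected) in high_rate_clusters.items():
    print(f"{name}: RR = {rate_ratio(observed, expected):.2f}")
# employment: RR = 1.96
# education (secondary or higher): RR = 1.29
# condom use less than half the time: RR = 1.47
```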

No statistically significant high rate clusters were identified for other sexual behaviors, sexual dysfunction, or STI diagnosis variables.

Distribution of low rate spatial clusters
Cluster analysis identified two statistically significant low rate clusters for education and employment variables (Table 3, Figures 1 and 2). The cluster for employment included 171 men, 39 employed and 69.20 expected to be employed based on simulations. This cluster was located in the northeast with a concentration of participants in a lower-income neighborhood of Kisumu town. The education cluster included 17 men, with only two of them reporting a higher level of education, compared to 12.24 expected, and was again located in a lower-income neighborhood.
No statistically significant low rate clusters were identified for any of the sexual behaviors, sexual dysfunction, or STI diagnosis variables.

Results of ordinal and multinomial models
The results of ordinal and multinomial cluster analysis for education, preference for dry or wet sex, and number of partners in the last 12 months are presented in Table 4. When treated as a categorical variable, the cluster analyses for education resulted in the identification of three clusters. Cluster #1 had high rates of post-secondary education (RR = 2.07), and low rates of primary (RR = 0.62) and secondary education (RR = 0.98). Cluster #2 automatically collapsed the two higher education levels, resulting in high rates of secondary and above education (RR = 1.29) and low rates of primary education (RR = 0.26). Cluster #3 identified high rates of primary education (RR = 3.30) and low rates of secondary and post-secondary education (RR = 0.13 and RR = 0.00, respectively). No statistically significant clustering was detected for preference of dry versus wet sex and number of partners in the last 12 months.

Discussion
Several studies have assessed differences in sexual behaviors and HIV prevalence in rural and urban settings in Kenya. A national level study based on the Kenya Demographic and Health Survey (KDHS) in 2003 found rural residence to be a protective factor for HIV infection in men, but not in women [38]. These national results may not reflect the situation in Nyanza province, however, where the HIV prevalence is the highest in the country and the epidemiology of HIV is likely unique. Voeten et al. [20] conducted a study examining the urban and rural distinction in Nyanza province, finding that rural female residents reported higher levels of risky behavior than female respondents residing in urban settings. No differences were observed in males. The relative lack of spatial clustering noted in our sample of young men supports this finding.
However, while Voeten and colleagues carefully constructed their sample to include 15-29 year old men and women representative of the corresponding population of one urban (Kisumu township) and two rural (Bondo and Siaya districts) areas, the sexually active young men constituting our sample cannot be considered representative of a specific population. It is difficult, with the limitations of our study sample, to draw any conclusions based on our findings or directly compare the results of the two studies.
Analytically, our method of cluster detection differed from both studies noted above, allowing for the detection and definition of clusters based on individual-level spatial characteristics rather than a static designation of "urban" and "rural" areas. This methodology establishes the extent of heterogeneity across an entire study area, including intra-urban and intra-rural spatial clustering, and defines geographically localized groups of participants by measured characteristics. Consequently, areas of higher concentration of risky behaviors or HIV/STI cases are identified independent of administrative boundaries or classifications, such as urban or rural designation, that are designed to serve as "proxy" measures for underlying geospatial characteristics. Individual-level spatial measurements enrich the spatial characterization of a sample and can provide valuable information for programmatic implementations.
For example, the education clustering found in our study provides understanding of the study sample's educational and spatial heterogeneity beyond answering the question of whether rural-residing and urban-residing respondents differed in their educational attainment. One can see that education is not uniformly distributed by residence, and further that areas of educational clustering, both with more and less education than expected, extend across previously described urban and rural classification. Considering employment along with education, one can visualize that while there may be some spatial linkage, it is weaker than one might expect, with employment falling more clearly within the known urban and rural divide. With the limitations in our study sample, any conclusions or interventions based on these groupings are not warranted; however, our results do highlight that geospatial analytic tools can be of value. This usefulness increases in settings where administrative structures are not clearly defined, long periods have passed since populations were grouped and defined, or where population data are simply not available.
Considering our findings for this non-representative at-risk group, the general lack of clustering, other than condom use, over the geographical area of the study indicates that there is little to no differentiation in sexual risk behavior based on home location in our sample. The lack of clustering of prevalent STIs is likely an extension of this relative equality in high-risk behavior in this sample. The noted localized clustering by education, but again not by sexual behaviors or STIs, suggests that education may not significantly contribute to variations in risk behaviors and STIs in this geographic context. Conducting additional analysis (results not shown), we observed that while education was associated with a number of sexual behaviors among the 649 men in our sub-study, we found no association between educational attainment and sexual behaviors among the men living within our two education clusters. This may be a sign that spatial distribution significantly impacts the relationship between education, sexual behaviors and STI prevalence. Extending this, we can describe a group more likely to choose a home location based on factors other than sexual risk (i.e., socio-economic factors) and, once chosen, a home location that has little impact on individual risk behavior.
Appreciating the uniqueness of our sample, we do note a single, geographically large area where men reported condom use distinct from that of men outside the cluster. Condom use was measured by three different variables: 1) using condoms at last sexual encounter, 2) ever using condoms in the past 6 months, and 3) using condoms more or less than half the time in the past 6 months. Condom use was localized only for 3) using condoms more or less than half the time, a question designed to gauge general propensity toward condom use with less polarity than the other two measures. If our sample were more representative of a defined population of interest, this finding of low condom use might be of interest, as the area identified is west and north of independent population centers with well-established HIV/STI programs. A possible gap in services could be considered.
As discussed, there were several limitations to this study. Because the study was restricted to a convenience sample of healthy, HIV-negative, sexually active 18-24 year-old men participating in the male circumcision trial, the sample cannot be considered representative of the general male population of Kisumu district. In addition, because the trial targeted high-risk young men, we were unable to detect behavioral and STI clustering across or within any other group (e.g. women or those at lower risk). Small numbers of cases for each STI may be responsible for our inability to detect clustering based on prevalence. We were unable to perform a meaningful analysis of homosexual behaviors due to the extremely small number of study participants reporting a history of having sexual relations with men. Spatial analysis was limited to participants' residences and did not include geographic sites of socialization. As with any study of sexual behaviors, the validity of the self-reported responses may be affected by social desirability and reporting biases. RCT interviewers were highly skilled and extensively trained in techniques that may help limit such information biases, such as establishing rapport with clients prior to conducting the interview and ensuring privacy and confidentiality. Asking men who enrolled into this sub-study during one of the later follow-up (i.e., 6, 12, 18 or 24 months) RCT visits about their residence at the time of their enrollment in the trial may result in recall bias. However, the majority of participants reported maintaining the same residence since trial enrollment. Lastly, Kisumu East district has a mainly urban population and a rural environment that is relatively densely populated and closely tied to the urban center. This was, naturally, reflected in our study sample and may have affected our ability to detect and define "rural" clusters.
Nonetheless, this remains one of the first studies to evaluate the spatial distribution of STIs and behaviors on an individual rather than aggregate level.
In conclusion, collecting data on the geographic location of study participants' residences, or other important locations, may be useful in the careful characterization of a study sample. Geospatial analysis can be an important tool in the investigation of HIV/STI risk factors in specif-