Patient addresses in EMRs can be linked to institutional address data in standard database programs using SQL queries that require exact agreement between fields (e.g., ABC = ABC), but a geocoding approach with GIS software offers at least two advantages. First, geocoding returns geographic coordinates for each address it successfully matches. These coordinates can support visual or analytic interpretation of spatiotemporal patterns of infection; previous studies have used geocoded data to identify clusters of CA-MRSA in Chicago, IL, Springfield, MA, and West Midlands, UK. Second, geocoding incorporates a standardization process that partitions an address into its components (street name and number, street prefix and suffix, city, state, and ZIP code) and standardizes spelling and abbreviations. Even if an address does not geocode to a geographic location, standardization can improve record-to-record matching between EMRs and institutional address data. In our sample, address standardization more than doubled the number of patient addresses that we were able to associate with an institutional facility.
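The effect of standardization on exact matching can be illustrated with a minimal Python sketch. It is not the geocoder used in this study: the abbreviation tables, the function names `standardize` and `match_records`, and the tuple layouts are hypothetical. The sketch parses only the street line into components, then applies the same exact-agreement matching described above, so that "123 Main Street" and "123 main st." resolve to the same key.

```python
import re

# Illustrative abbreviation tables (an assumption for this sketch); production
# geocoders rely on much larger standard tables.
SUFFIXES = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD", "DRIVE": "DR", "BOULEVARD": "BLVD"}
DIRECTIONS = {"NORTH": "N", "SOUTH": "S", "EAST": "E", "WEST": "W"}


def standardize(street_line, zip_code):
    """Split a street line into (number, prefix, name, suffix, zip5) with standard spellings."""
    # Upper-case and replace punctuation with spaces so "St." and "ST" compare equal.
    tokens = re.sub(r"[^\w\s]", " ", street_line.upper()).split()
    # Abbreviate directionals and street-type words.
    tokens = [DIRECTIONS.get(t, SUFFIXES.get(t, t)) for t in tokens]
    number = tokens.pop(0) if tokens and tokens[0].isdigit() else ""
    prefix = tokens.pop(0) if tokens and tokens[0] in DIRECTIONS.values() else ""
    suffix = tokens.pop() if tokens and tokens[-1] in SUFFIXES.values() else ""
    # Keep only the 5-digit ZIP so ZIP+4 values still match.
    return (number, prefix, " ".join(tokens), suffix, zip_code.strip()[:5])


def match_records(patients, facilities):
    """Exact-match patients (id, street, zip) to facilities (name, street, zip) after standardization."""
    index = {standardize(street, z): name for name, street, z in facilities}
    matches = {}
    for pid, street, z in patients:
        key = standardize(street, z)
        if key in index:
            matches[pid] = index[key]
    return matches
```

For example, `match_records([("p1", "123 Main Street", "60601")], [("ABC Nursing Home", "123 main st.", "60601-1234")])` returns `{"p1": "ABC Nursing Home"}`, whereas a raw string comparison of the two street lines would fail.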
We used institutional name and address data that are openly accessible on the Internet from federal, state, and local agencies. The process used in this study should therefore be extensible at little or no additional data cost. The HHS Nursing Home Compare database accounted for the majority of patients associated with a nursing home, but a small percentage of nursing home matches were to facilities listed only in state health department databases. Conversely, although hospital matches were less frequent, about half of the patient records matched to a hospital were associated with facilities listed only in state health department databases. This is expected, given that the HHS Hospital Compare database focuses on acute care and critical access facilities, whereas state databases also include long-term care hospitals.
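Classifying matched records by data source, as in the comparison above, reduces to set operations on facility keys. The sketch below assumes a hypothetical key made of the whitespace-normalized, upper-case facility name plus the 5-digit ZIP code; the function names and any sample facilities are invented for illustration.

```python
from collections import Counter


def facility_key(name, zip_code):
    # Hypothetical matching key: normalized facility name plus 5-digit ZIP code.
    return (" ".join(name.upper().split()), zip_code.strip()[:5])


def source_breakdown(matched, federal, state):
    """Count matched (name, zip) records by whether the facility is in the federal list, state list, or both."""
    fed = {facility_key(n, z) for n, z in federal}
    st = {facility_key(n, z) for n, z in state}
    counts = Counter()
    for name, zip_code in matched:
        k = facility_key(name, zip_code)
        if k in fed and k in st:
            counts["both"] += 1
        elif k in fed:
            counts["federal_only"] += 1
        elif k in st:
            counts["state_only"] += 1
        else:
            counts["unlisted"] += 1
    return counts
```

The share of matches attributable to facilities found only in state databases is then `counts["state_only"] / sum(counts.values())`.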
Knowing that a patient resides, or has recently resided, in a healthcare-associated facility can affect decisions about antibiotic therapy and infection control precautions. Given the higher incidence of multidrug-resistant organisms in such facilities, it would be advantageous to know a patient's residential location when initiating antibiotic therapy for a presumed infection. For instance, the rate of MRSA pneumonia is known to be higher in patients residing in long-term care facilities, as is the rate of C. difficile diarrhea. Clinicians might treat both pneumonia and diarrhea more aggressively if they knew that a patient resides, or has recently resided, in a healthcare-associated facility. Likewise, a physician aware of such a residence may conservatively implement infection control precautions until the infectious organism is identified.
Patient residence information may also be useful in biosurveillance for recognizing community outbreaks of infectious organisms. Early detection of an evolving epidemic could provide an opportunity for intervention to mitigate the spread of infection. Such information could also be used to allocate educational resources more efficiently to facilities and neighborhoods identified as "hot spots" for particular multidrug-resistant organisms.
The methods presented have several limitations. Some matches between patient records and healthcare facilities required manual editing because facility names were entered inconsistently in the EMR. For example, a company may operate multiple nursing homes with similar names (e.g., "ABC Nursing Home South", "ABC Nursing Home North"). If the EMR address contained only "ABC Nursing Home", matching to a specific facility required additional information, such as ZIP code. Misspelled facility names presented similar problems that required manual editing. All of the automated matches we achieved were based on exact agreement between the institutional and EMR patient databases. Future implementations could investigate algorithms that allow some flexibility in matching (e.g., fuzzy matching), but results should be reviewed to ensure that less-than-perfect matches do not produce false positives arising from similar facility names. In addition, previous studies have documented geocoding biases that lower match rates in rural areas and among disadvantaged and minority populations. These potential biases should be investigated before geocoded data are used for analytical purposes such as cluster analysis. Despite these limitations, our process identified patient risk factors that were not readily apparent in existing EMR data and enabled us to identify a cohort of CA-MRSA patients more accurately.
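The fuzzy matching suggested above, combined with the review step, might look like the following minimal sketch using the Python standard library's `difflib.SequenceMatcher`. The function name, the (name, ZIP) tuple layout, and the 0.8 similarity threshold are illustrative assumptions, not values evaluated in this study. Rather than accepting the best-scoring name automatically, the function returns candidates for manual review and uses ZIP code to narrow branches with similar names, mirroring the "ABC Nursing Home" example.

```python
from difflib import SequenceMatcher


def normalize(name):
    # Upper-case and collapse whitespace before comparing names.
    return " ".join(name.upper().split())


def fuzzy_candidates(emr_name, emr_zip, facilities, threshold=0.8):
    """Return (score, name, zip) candidates resembling emr_name, best first, for manual review.

    facilities: list of (name, zip) tuples. threshold is an illustrative cutoff.
    Candidates in the same 5-digit ZIP are preferred; if none share the ZIP,
    all candidates above the threshold are returned for a reviewer to resolve.
    """
    target = normalize(emr_name)
    scored = []
    for name, zip_code in facilities:
        ratio = SequenceMatcher(None, target, normalize(name)).ratio()
        if ratio >= threshold:
            scored.append((ratio, name, zip_code))
    same_zip = [c for c in scored if c[2][:5] == emr_zip[:5]]
    pool = same_zip or scored
    return sorted(pool, reverse=True)
```

With `facilities = [("ABC Nursing Home South", "60601"), ("ABC Nursing Home North", "60605"), ("Oak Manor", "60601")]`, the misspelled entry `fuzzy_candidates("ABC Nursng Home", "60605", facilities)` returns only the North branch (similarity of roughly 0.81). A reviewer would still confirm the match, since a lower threshold or a missing ZIP code could admit similarly named but unrelated facilities.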
Among the lessons learned from this study is that EMR patient addresses can contain substantial errors introduced at the point of collection (e.g., when a patient provides a home address at a physician's office or when the address is entered into the EMR). As EMRs are increasingly integrated with GIS, there is potential to improve address data at the point of capture. For example, just as a spell checker highlights potential spelling errors in real time, patient data entry programs that integrate geographic information could highlight potential address errors while the data are being entered. Similarly, cross-checking patient addresses against geographic databases after entry could flag potential errors and prompt requests for updated address data at follow-up contacts with the healthcare system.
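A spell-checker-style address check at the point of entry could be sketched as follows. This is a minimal illustration, not a feature of any particular EMR: the reference table `REFERENCE_STREETS`, the small abbreviation map, and the function `address_warnings` are hypothetical, and a real system would draw valid street names from a street centerline or address point database rather than a hard-coded dictionary.

```python
import re

# Hypothetical reference data: street names known to exist in each ZIP code.
REFERENCE_STREETS = {"60601": {"MAIN ST", "LAKE ST", "N STATE ST"}}
# Minimal abbreviation map (an assumption) so "Street" and "St." compare equal.
ABBREVIATIONS = {"STREET": "ST", "AVENUE": "AVE", "NORTH": "N", "SOUTH": "S"}


def address_warnings(street_line, zip_code):
    """Return warnings for an address as it is entered; an empty list means no issues found."""
    warnings = []
    zip_code = zip_code.strip()
    if not re.fullmatch(r"\d{5}(-\d{4})?", zip_code):
        warnings.append("ZIP code is not in 5-digit or ZIP+4 format")
    zip5 = zip_code[:5]
    # Upper-case, drop punctuation, and abbreviate common words.
    tokens = [ABBREVIATIONS.get(t, t) for t in re.sub(r"[^\w\s]", " ", street_line.upper()).split()]
    if tokens and tokens[0].isdigit():
        street = " ".join(tokens[1:])
    else:
        warnings.append("No house number found")
        street = " ".join(tokens)
    known = REFERENCE_STREETS.get(zip5)
    if known is None:
        warnings.append(f"ZIP code {zip5} not found in reference data")
    elif street not in known:
        warnings.append(f"Street '{street}' not found in ZIP {zip5}")
    return warnings
```

Here `address_warnings("123 Mian Street", "60601")` would flag `MIAN ST` as unknown in that ZIP code, prompting the person entering the data to confirm the address before saving; the same function could be run in batch after entry to mark records for follow-up.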