We found that a multi-stage geocoding method implemented by the WA DOH achieved a match rate 4% higher than that achieved by a single-stage method. Most addresses were matched by both methods, but they were not geocoded to exactly the same coordinates by each method: 10% of addresses were assigned locations at least 180 meters apart by the multi-stage and single-stage methods, and 2% of addresses were assigned locations at least one kilometer apart. Locations assigned by the two methods were closer together in high density and high poverty areas, and in areas where reference data sources were most similar for the two methods. The results for area-level poverty, which were contrary to our hypothesis, were not explained by density or availability of local reference files. The associations of area-level poverty and density with discrepancy-distance were strongest where the two methods used similar reference files.
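As an illustration of how a discrepancy-distance of this kind can be computed, the following minimal sketch measures the great-circle distance between the coordinates assigned to the same address by two methods, and then the share of addresses at or beyond a threshold such as 180 meters or one kilometer. The function names, the haversine formula, and the mean Earth radius are assumptions made for this example; the text above does not specify which distance calculation was used.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters (assumed constant)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def share_at_least(distances_m, threshold_m):
    """Fraction of discrepancy distances that are at least threshold_m meters."""
    return sum(d >= threshold_m for d in distances_m) / len(distances_m)
```

Applied to one pair of coordinates per matched address, `share_at_least(distances, 180)` and `share_at_least(distances, 1000)` would give summaries comparable in form to the 10% and 2% figures reported above.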
Previous studies have compared different single-stage geocoding methods or geocoding vendors [2–4], automated versus interactive geocoding methods, or a single-stage geocoding method with a gold standard [7, 8]. In addition, McElroy described and recommended the use of a multi-stage geocoding process, despite added costs. Our study contributes to this literature by (1) providing further information on geocoding results of a multi-stage process as compared to a single-stage process, (2) confirming previous findings that geocoding methods may have better agreement in densely populated areas [4, 7–9], and (3) suggesting that geocoding methods may also have better agreement in high poverty areas, after controlling for population density. Geocoding discrepancies in low poverty areas could be due to differences in address quality, reference file quality, or other determinants of geocoding error (such as recent redevelopment, street length, or lot size). If the association with poverty is confirmed, further research will be needed to distinguish among these possibilities.
Our study investigated whether single-stage geocoded address coordinates were systematically shifted relative to the multi-stage address coordinates. We found that the single-stage coordinates were shifted north-south and east-west relative to the multi-stage coordinates more often than would be expected by chance alone. This may have been due to different assumptions about how addresses are spaced along a street, since WA streets are more likely to be oriented in the cardinal directions than would be expected by chance. In Washington State, we estimated that 42 percent of street segments are within five degrees of being oriented directly north-south or east-west (compared with only 11 percent expected by chance). This directional shift finding may be most relevant to areas where urban planners played an active role in establishing N-S and E-W roadway grids.
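The 11 percent chance expectation is consistent with a simple calculation: if street segment orientations were spread uniformly over the 180 degrees of possible undirected orientations, the two cardinal axes would each capture a window of twice the tolerance. The sketch below reproduces that arithmetic and shows one way to classify a segment bearing; the function names and the uniform-orientation assumption are introduced here for illustration and are not a description of how the estimate above was produced.

```python
def chance_fraction_near_cardinal(tolerance_deg):
    """Expected share of segments within tolerance_deg of N-S or E-W,
    assuming orientations are uniform over [0, 180) degrees.
    Valid for tolerances up to 45 degrees."""
    # Two cardinal axes, each with a window of 2 * tolerance_deg degrees.
    return (2 * 2 * tolerance_deg) / 180.0


def is_near_cardinal(bearing_deg, tolerance_deg=5.0):
    """True if a segment bearing lies within tolerance_deg of a cardinal direction."""
    # Cardinal directions repeat every 90 degrees, so fold the bearing into [0, 90).
    folded = bearing_deg % 90.0
    return min(folded, 90.0 - folded) <= tolerance_deg
```

With a five-degree tolerance the expected share is 20/180, or about 11 percent, against which the observed 42 percent can be compared.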
Since the accuracy of address geocoding depends on address quality, preprocessing, program settings and reference maps, further research is needed to understand the effects of each component. While our study and others have controlled for address quality by using the same addresses for both methods, we simultaneously examined differences in preprocessing, geocoding software, and reference maps. A limitation of our study is that we could not discern which elements contributed most to the difference between the two geocoding methods under investigation, and so cannot use these data to project differences between other approaches. We considered only one of many possible contrasts between geocoding methods, by comparing one multi-stage process to one single-stage process. Given that these two methods were implemented independently, using different software packages and reference files, this contrast may provide an upper bound for how much geocoding methods in large research studies would be expected to differ for a state-wide administrative data set. Also, we had no gold standard with which to evaluate the relative accuracy of the two geocoding methods; however, a third geocoding method using satellite images (implemented using Google Earth Pro, as described below in the Methods section on Supplemental geocoding) agreed more closely with the multi-stage geocoding method. Rather than focusing on comparisons with a gold standard, we evaluated which area characteristics predicted larger discrepancies between two geocoding methods.
Another limitation was that there may have been unmeasured or residual confounding by address or area characteristics in this study, interfering with our ability to assess which characteristics predicted geocoding discrepancies. Finally, the geographic scope and distribution of our study addresses limit the generalizability of this Washington-based study. These data were statewide and may be similar to other health department address data; however, the geocoded addresses were all numbered street addresses with ZIP codes and did not include Post Office boxes.
The importance of the differences between any two methods depends on the context and purpose of geocoding. Both the level of analysis and hypothesized exposure effects will influence the cost of geocoding errors. The available data or confidentiality protections may constrain some researchers to work with data at the ZIP code or census tract level, or even with "jittered" address locations containing deliberately introduced error. Some researchers might find our 96% concordance at the census tract level encouraging. Single-stage geocoding using street addresses may be adequate for some research purposes. However, for a study of small-scale environmental exposures, such as radiation, the ability to detect or replicate an association may depend on the geocoding method selected, and even multi-stage geocoding may place addresses far from their actual locations. The importance of relative geocoding precision may also vary across areas. For example, the commonly observed pattern of decreased geocoding accuracy in sparsely populated areas may be of little concern if an exposure, such as air pollution, is less variable across small distances in a rural context. Geocoding match rates, which vary among geocoding methods, can also affect the power and external validity of spatial epidemiology studies; subjects with unmatched addresses may not be representative and are generally excluded from further analyses.
We refer those choosing a geocoding method for a particular study, research group or health department to previously published reviews and recommendations [1, 3, 6, 14]. Based on our experience, even a group with limited resources and time can incorporate geocoding through a low-cost, single-stage method, like the one described here. This is likely to be adequate when (1) the addresses are relatively free of spelling and formatting errors, as may be the case with billing addresses; (2) the addresses of interest are mainly in high density or high poverty areas; and (3) the exposure of interest varies only gradually with distance. For organizations like the Washington State DOH, the initial investment in setting up and validating a multi-stage system may facilitate a variety of projects by improving match rates and using local geographic files when available. Another option, not evaluated here, would be to use a commercial geocoding vendor [2, 4, 7].