At the 2009 URISA GIS in Public Health Conference, a workshop organised by Ellen Cromley and Andrew Curtis focused on the issue of location privacy in health research. Among the topics covered by panellists and attendees were methods of spatial data protection, the need to "educate" IRBs, challenges facing data owners and custodians wishing to visualise and disseminate data, how published maps continue to violate confidentiality, some general cartographic guidelines and "fixes", and new methods of spatial data masking. In addition, the participants spent considerable time discussing the ethical and legal challenges researchers now face as HIPAA (US Health Insurance Portability and Accountability Act) regulations change, placing more responsibility on the data user (researcher). Although the majority of attendees to the meeting were data owners or custodians, this article is written mainly from the perspective of the data user, especially a social science/geographic information science researcher. As researchers, our usual role is to spatially analyse data, collect new spatial data with health implications, and visualise results in multiple forums, especially academic journals.
In 2006 Curtis et al. published a paper in International Journal of Health Geographics highlighting the potential for point level data to be reengineered from published maps through a process of digital scanning and georeferencing, even with only limited geographical features [11]. By heads-up digitising these points, coordinates could be used to direct field teams to actual homes. This conceptual approach had previously been impossible to replicate with real data, but by using this case from Hurricane Katrina, the map of mortality locations, and search and rescue markings that actually identified where bodies were found, validation was possible. Concurrent to this article, other reengineering approaches appeared using simulated data and a more systematic approach to identifying homes from a low resolution map [12]. Both papers revealed that published maps, even of low resolution and with limited geographical information, could still be reengineered back to an exact address, or so close to the 'real world' location that even without resorting to use other quasi identifiers, the spatial confidentiality of those being mapped was violated.
As researchers specialising in geographic information, we need to be proactive in setting guidelines for the display of confidential data, in policing our own actions, and in educating those sitting in positions of data power, especially our IRBs. Critics of the presentation usually focus on the data source–"this is a newspaper map so there is no confidentiality violation". However, there have been at least two maps appearing in journals that have also published the same Katrina mortality point locations. But irrespective of this, the real message is, any point level map can be reengineered back in the same way. As academics where does our ethical path lie with these secondary sources obtained from the media? We may not legally be violating confidentiality, but does that give us the right to use non-official sources, apply our geospatial skills and create sensitive layers in other outlets?
Now in mid-2009, what has changed? Are maps still being published in academic journals that violate spatial confidentiality? And where are we on the issue of cartographic guidelines? Unfortunately it is still too easy to find similar map violations appearing after 2006. One can find examples of maps with point level mortality locations, pregnancies, at-risk pregnancy programme participants, and people suffering from different respiratory ailments – indeed we challenge the reader to see how many point level health maps they can find. Of particular concern are those sub-disciplines which have just discovered the value of GIS–we cannot expect that confidentiality violation through cartographic design is uppermost in the minds of those effusing over the wonders of buffering.
What else has changed? Paradoxically, the attention currently being paid to geocoding accuracy – which is important from a health research perspective, and which has received considerable attention in International Journal of Health Geographics – also has a detrimental side in terms of making published source maps both more accurate and precise. This means the chance of successful reengineering in terms of being closer to the actual address has increased. In effect, this previously unintentional form of masking has been reduced. Secondly, smoothing approaches, such as density surfaces, are being used to preserve confidentiality in maps (and stated as such by the authors). On one hand this is good news in terms of researchers' understanding that there is a confidentiality issue, but on the other hand this quick-fix is problematic due to a reliance on techniques that do not achieve this goal. The combination of window/kernel/filter size, the underlying grid cell resolution, and especially if there is no option for a minimum denominator, may result in "bulls-eyes" for areas of the map with relatively few residential alternatives, otherwise known as the 'geographic area population size' [47]. It should also be remembered that less dense geographies are not necessarily rural; many urban areas also contain physical features (inlets, lakes, even hills) can remove alternative possible locations. By referring to high resolution aerial photography (now found easily in applications such as Google Earth [35]), it is relatively easy to identify the cause of the intensity. On this subject, geospatial Internet applications in general have made the reengineering process even easier for those with and without a working knowledge of GIS.
From a data users' perspective, we are still limited by data being released at an aggregation that is limited for research, the standard for HIPPA being a zip code with 20,000 individuals. A group of Canadian researchers showed that this is an archaic approach and that minimum denominators should vary when taking into account the underlying geography and the number of quasi identifiers [47]. Similar papers written for researchers in other countries, possibly even providing a series of size guidelines for different urban areas, would be invaluable. It would also help the job of IRBs.
And on the subject of IRBs, from our experience there is still a disconnection in terms of understanding exactly where the risks lie in geospatial output and confidentiality. This is understandable given the confusion even amongst geospatial researchers. What would benefit all concerned would be a well-respected body in the field of public health to commission a "guidelines" paper. This could become the reference in terms that researchers, IRBs, and even research subjects could understand and cite, along with other existing key texts, such as [48]. These reference guidelines should include clear visual examples of what is not acceptable, including the pitfalls of common "fixes" such as smoothing. They could also provide guidance for appropriate aggregation denominator sizes. This is important as researchers seek IRB approval in the use of mobile geospatial devices for collecting health and built environment data. We cannot expect IRBs to understand where such cartographic risks lie. Finally, language should be included that would help IRBs and be required in any letter gaining subject permission. In other words, "if we ask for an address (or a street intersection... or a zip code...) this is the only way we will display it on a map". This simple approach would mean that IRB, researcher and subject would all have the same understanding of what will happen with these data. (Ellen Cromley has vested considerable time on spatially appropriate language for informed consent as a guideline for IRBs. She disseminated examples of this language at the URISA workshop.)
This 'best practice guide' should be circulated to all journals who publish maps, clearly stating the risks involved in accepting point level maps. At least this would enlighten editors [49] and hopefully force them to ask submitting authors about 'what steps have been taken to preserve confidentiality?'
Until we have such a universally accepted document, self-policing is the main option, and with this in mind, we have a few issues a researcher should ponder before publishing any map. Most importantly, is a point-level (or smoothed, or small aggregation) map necessary? As a geographer this last statement certainly hurts, but unless a map is really needed to help frame a paper's content, or improve the understanding of the reader regarding a spatial process, and especially if it is not even specifically referred to in the text, then it is better to err on the side of caution.
We fully realise that some point-level maps will still need to be published; it is often easier to explain a spatial process through a graphic, but if this is the case then is the underlying geography needed? If we are overlaying points against output from a spatial analysis, do we need political boundaries or street networks? If geographical references are necessary in the map, then data masking is essential.
There is some good news though, as we have noticed more researchers referencing steps taken to preserve confidentiality during recent presentations.
Emerging issues
There are three emerging spatial confidentiality topics of concern. The first involves Google Street View [50], an excellent research tool that allows us to "see" areas that are described or mapped in publications. The implication this has for reengineering is the ability to see potential candidates within an area. If we again think of the "bulls eye" effect within a smoothed surface, if this area has been driven by the Google Street View team (and thankfully at this point areas of sparse geography also tend to be the least covered), we could literally view each option within the central pixel until a house match is found. Even with multiple alternatives, it might be possible to spatially prioritise the potential buildings based on characteristics of the health conditions, or other information gleaned from the paper. For example, is the disease more typically associated with a multi-family unit than a single residence?
The second area of concern involves the use of biometric sensors synched to a GPS (Global Positioning System) unit. This field of research offers great potential in terms of linking health outcomes to the fine-scale built environment. However, a fear expressed at the URISA workshop was that output from these devices, usually shown as a series of dots on an aerial photograph, will begin to accompany research papers. Sure enough, within one day of the workshop a new issue of a GIS journal published this exact output. The underlying aerial photograph makes reengineering from the image extremely easy, and the point concentrations from the GPS unit correspond to areas of highest activity, including the home. This is not a good situation, especially when the participants are part of a vulnerable population, such as children.
Finally, we are worried about the current trend by social scientists of including spatial data in their research, especially those who use mixed methods. A mixed method approach combines both qualitative and quantitative data. For example, spatial video data of the recovering neighbourhoods of New Orleans, LA, USA, are currently being collected. These data are extracted from the video as three-dimensional surfaces that can be mapped or analysed for recovery or abandonment. At the same time, videos of the narrative of the neighbourhood participants add further commentary to the surfaces, such as why a building has not been returned to. Many of these comments contain sensitive information such as the health of an owner. If we map this information, others could easily disseminate it through online consumer geoinformatics services like Google Earth and Google Map, and even link it using suitable geo-mashups [51] to other readily available online information about the individuals concerned (e.g., on social networking sites), thus revealing a more detailed picture about them. Do our subjects really know all what could be exposed through such mapping? (But one should also consider the difference between what is technically possible and what is practically likely to happen, i.e., will there really be someone with the motive, will and ability to do these privacy threatening Web inferencing and mapping exercises in each and every case? A risks-costs-benefits assessment might help in such situations.) Although these situations may not fall foul of any HIPAA standard, nor probably concern an IRB, we are now at a point where changing geospatial technologies must stimulate debate that goes beyond the normal community participatory ethical standards used by researchers [35].
Because of the widespread adoption of GIS-light Internet applications, and cheap and easy-to-use mobile mapping devices (for example, ones which can tag pictures with coordinates), health related spatial confidentiality is now no longer the concern of only geographic information scientists, or even GIS users, but also of a far broader range of academics and other people.