Example | Attribute association | Circumstance | How AAEP was estimated |
---|---|---|---|
1 | 1. Patient identifying fields | Patient lacks government issued ID and address, and patient names and date of birth match to three individuals in external data | The patient could be three different persons matched to in various sources. The AAEP is calculated as 1 − (1/3) or 0.666 |
2 | 2. Patient-date of diagnosis | Patient diagnosis year known, but month and day unknown | One day out of 365 is chosen, thus the probability of choosing the correct day is 1/365. AAEP = 1 − (1/365) = 0.997 |
3 | 3. Patient: date of diagnosis-address | Patient address is missing house number | Patient address matches to the address features<sup>a</sup> of 20 residences on one street. AAEP = 1 − (1/20), or 0.95 |
4 | 3. Patient: date of diagnosis-address | Patient address is missing prefix direction. To confirm that address is valid, it is matched to USPS ZIP + 4 database | Patient address matches to 2 addresses in USPS ZIP + 4 database. AAEP = 1 − (1/2) or 0.5 |
5 | 3. Patient: date of diagnosis-address | Error suspected in more than one component of patient address (‘multimatch’ address). Patient address can be matched to 12 different address features in geographic reference data depending on which address component(s) are edited | AAEP = 1 − (1/12) or 0.916 |
6 | 3. Patient: date of diagnosis-address | Address at diagnosis cannot be geocoded. Patient address history unknown or incomplete. Patient address identified via linkage to external source on patient name and date of birth, and used to match to geographic reference data with one to one match. Date of diagnosis was not spanned by duration of address validity in external data source | AAEP estimated at 0.25 based on best available information about error rate of external data source |
7 | 3. Patient: date of diagnosis-address | Patient has PO Box address. Patient address history unknown or incomplete. Patient names and PO Box address match to owner names and mailing addresses of 4 parcels, whose sale dates precede the date of diagnosis | There are 4 possible addresses and only one is chosen. AAEP = 1 − (1/4) = 0.75 |
8 | 3. Patient: date of diagnosis-address | Patient year of diagnosis known. Patient day and month of diagnosis are unknown. Patient address history for year of diagnosis is known. During that time patient lived at 3 addresses in sequence for proportions 0.4, 0.1, and 0.5 of the year; the first address is chosen | AAEP = 1 − 0.4 = 0.6 |
9 | 4. Patient: date of diagnosis-address-geocode | Patient address matches to a street in geographic reference data with 21 address features that are missing house numbers | Patient address matches to address features of 21 residences on one street. AAEP = 1 − (1/21), or 0.952 |
10 | 4. Patient-date of diagnosis-address-geocode | Patient street address could not be matched to street level geographic reference data. Patient postal code matched to postal code area centroid | Postal code encompasses 13,500 address features. AAEP = 1 − (1/13,500) = 0.999 |
11 | 5. Patient-date of diagnosis-address-geocode-enumeration area | Patient address lacks a house number. Street to which patient address is geocoded is contained within 1 enumeration area | Because all potential matches are contained within the chosen enumeration area, AAEP = 0 |
12 | 5. Patient-date of diagnosis-address-geocode-enumeration area | Patient address lacks a house number; there are 70 address feature matching candidates. The area of uncertainty that contains the potential matches spans 2 enumeration areas. These contain 20 and 50 candidate address features; the latter enumeration area is chosen | AAEP = 1 − (50/70) = 0.285 |
13 | 5. Patient-date of diagnosis-address-geocode-enumeration area | Patient street address could not be matched to street level geographic reference data. Patient postal code matched to postal code area centroid. Postal code area spans 4 enumeration areas, which contain 2160, 1620, 1620 and 6750 address features respectively; the last enumeration area is chosen | Postal code encompasses 12,150 address features. AAEP is 1 − (6750/12,150) = 0.444 |
14 | 5. Patient-date of diagnosis-address-geocode-enumeration area | Both patient address and postal code are unmatched in geographic reference data. County centroid is assigned as geocode | AAEP is 1 − (1/395,909 address features in county) or 0.999 |
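
Most of the estimates in the table follow one of two patterns: one candidate is chosen from n equally plausible candidates (AAEP = 1 − 1/n), or one unit is chosen that holds a known share of the candidates or of the time period (AAEP = 1 − share). The Python sketch below illustrates that arithmetic only. The function names `aaep_equal_candidates` and `aaep_chosen_share` are hypothetical and do not come from the source, and the sketch assumes all candidates are equally likely, which the table's examples appear to assume. Example 6, where AAEP is taken from an external error rate, is not covered.

```python
from fractions import Fraction


def aaep_equal_candidates(n_candidates: int) -> float:
    """AAEP when one of n equally likely candidates is chosen: 1 - 1/n.

    Hypothetical helper; assumes every candidate is equally plausible.
    """
    if n_candidates < 1:
        raise ValueError("n_candidates must be at least 1")
    return float(1 - Fraction(1, n_candidates))


def aaep_chosen_share(chosen_weight: float, all_weights: list) -> float:
    """AAEP when the chosen unit holds chosen_weight out of sum(all_weights).

    Weights may be counts of address features per enumeration area, or
    proportions of the year spent at each address. Hypothetical helper.
    """
    total = sum(all_weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    if not 0 <= chosen_weight <= total:
        raise ValueError("chosen_weight must lie between 0 and the total")
    return 1 - chosen_weight / total


# Example 1: three matching individuals, one chosen (about 0.667)
example_1 = aaep_equal_candidates(3)

# Example 3: 20 residences on one street (0.95)
example_3 = aaep_equal_candidates(20)

# Example 8: first of three addresses, occupied for 0.4 of the year (0.6)
example_8 = aaep_chosen_share(0.4, [0.4, 0.1, 0.5])

# Example 12: enumeration area with 50 of 70 candidates chosen (about 0.286)
example_12 = aaep_chosen_share(50, [20, 50])

# Example 13: enumeration area with 6750 of 12,150 features chosen (about 0.444)
example_13 = aaep_chosen_share(6750, [2160, 1620, 1620, 6750])
```

Under this framing, Example 11 is the special case in which the chosen enumeration area contains every candidate, so the share is 1 and the AAEP is 0.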