Table 1 Comparison of mainstream location inferring algorithms

From: Location inference for hidden population with online text analysis

| | The Gazetteer-based Method | Part-of-speech (POS) Tagging | Named Entity Recognition (NER) |
| --- | --- | --- | --- |
| Features | Identifying geographical names according to external location knowledge (e.g., a dictionary containing names of cities and states) | Recognizing geographical terms in a corpus based on the part of speech of their component words, according to both their definitions and contexts | Identifying and classifying words mentioned in an unstructured corpus into pre-defined entity classes (e.g., persons, locations, organizations), based on hidden Markov models (HMMs) |
| Strengths | A popular approach when looking for locations in Web text [45]; the algorithm is simple and easy to implement | Part-of-speech information is a prerequisite for many NLP (Natural Language Processing) algorithms | The algorithm is fast and suitable for processing large-scale datasets |
| Limitations | Relies heavily on the gazetteer and is easily affected by external geographic databases [46,47,48] | Vulnerable to linguistic errors and idiosyncratic style [38]; algorithm accuracy is relatively low | Cannot identify names of local streets or buildings, non-standard place abbreviations, or misspellings, all of which are common in microtext |
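
To make the gazetteer-based column concrete, the following is a minimal Python sketch of dictionary lookup over word n-grams. It is an illustration only, not the implementation used in the paper: the `GAZETTEER` entries, the `find_locations` function, and the `max_words` parameter are all hypothetical names introduced here, and a real system would load its place names from an external geographic database.

```python
import re

# Hypothetical toy gazetteer (assumption): lowercase surface form -> canonical name.
# A real gazetteer would be loaded from an external geographic database.
GAZETTEER = {
    "new york": "New York, NY",
    "san francisco": "San Francisco, CA",
    "california": "California",
    "boston": "Boston, MA",
}


def find_locations(text, gazetteer=GAZETTEER, max_words=3):
    """Return canonical names of gazetteer entries found in text, in order of appearance."""
    tokens = re.findall(r"[a-z]+", text.lower())
    found = []
    i = 0
    while i < len(tokens):
        # Try the longest n-gram starting at position i first,
        # so "san francisco" is matched as one place name.
        for n in range(min(max_words, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in gazetteer:
                found.append(gazetteer[candidate])
                i += n
                break
        else:
            # No entry starts at this token; move on.
            i += 1
    return found


print(find_locations("Moved from Boston to San Francisco last year"))
print(find_locations("stuck in san fransisco traffic"))
```

The second call returns an empty list because the misspelled "fransisco" has no exact entry. Exact lookup of this kind is simple to implement, but its coverage is bounded by whatever the gazetteer contains.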