Residential scene classification for gridded population sampling in developing countries using deep convolutional neural networks on satellite imagery

Table 4 Model evaluation metrics for the Nigeria and Guatemala test sets

Model	Type	Accuracy	Precision	Recall	F1
Nigeria
Baseline CNN	Deep	88.9%	89.2%	88.9%	89.0%
VGG16 with ImageNet weights	Deep	93.4%	93.4%	93.4%	93.3%
InceptionV3 with ImageNet weights	Deep	93.6%	93.6%	93.6%	93.6%
VGG16 and InceptionV3 ensemble	Deep	94.5%	94.5%	94.5%	94.5%
Decision tree	Shallow	80.3%	80.9%	80.3%	78.9%
Gradient boosting	Shallow	80.3%	80.9%	80.3%	79.0%
AdaBoost	Shallow	80.6%	81.8%	80.6%	79.2%
Random forest	Shallow	80.1%	80.7%	80.1%	78.8%
Logistic regression	Shallow	80.6%	81.8%	80.6%	79.2%
Support vector machine	Shallow	79.9%	81.5%	79.9%	78.1%
K-nearest neighbors	Shallow	75.6%	81.3%	75.6%	71.3%
Human benchmark	Human	91.0%*	–	–	–
Guatemala
Baseline CNN	Deep	93.3%	93.3%	93.3%	93.3%
VGG16 with ImageNet weights	Deep	96.4%	96.7%	96.4%	96.5%
InceptionV3 with ImageNet weights	Deep	95.6%	95.9%	95.6%	95.6%
VGG16 and InceptionV3 ensemble	Deep	96.4%	96.7%	96.4%	96.5%
Decision tree	Shallow	93.8%	94.1%	93.8%	93.8%
Gradient boosting	Shallow	93.8%	94.1%	93.8%	93.8%
AdaBoost	Shallow	92.9%	93.1%	92.9%	93.0%
Random forest	Shallow	93.8%	94.1%	93.8%	93.8%
Logistic regression	Shallow	93.8%	94.1%	93.8%	93.8%
Support vector machine	Shallow	93.8%	94.6%	93.8%	93.9%
K-nearest neighbors	Shallow	92.4%	93.7%	92.4%	92.6%
Human benchmark	Human	97.1%*	–	–	–
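
The table does not state how precision, recall, and F1 were averaged across classes. In every row, recall equals accuracy, which is what support-weighted averaging produces, so the sketch below assumes that convention. The function name `weighted_metrics` and the toy labels are hypothetical and are not taken from the study's data; this is a minimal illustration of the four columns, not the authors' evaluation code.

```python
from collections import Counter

def weighted_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall, F1), with the last three
    averaged over classes using class support as weights (assumed convention)."""
    n = len(y_true)
    support = Counter(y_true)  # number of true examples per class
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / n

    precision = recall = f1 = 0.0
    for c in sorted(support):
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        p_c = tp / (tp + fp) if tp + fp else 0.0
        r_c = tp / (tp + fn) if tp + fn else 0.0
        f_c = 2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0
        w = support[c] / n  # weight = share of true examples in class c
        precision += w * p_c
        recall += w * r_c
        f1 += w * f_c
    return accuracy, precision, recall, f1

# Toy labels (hypothetical): 1 = residential, 0 = non-residential
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1]
acc, prec, rec, f1 = weighted_metrics(y_true, y_pred)
```

Under this weighting, recall always equals accuracy, while precision and F1 can differ from it, which matches the pattern in the rows above.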