TY - JOUR
T1 - Less is more
T2 - optimizing classification performance through feature selection in a very-high-resolution remote sensing object-based urban application
AU - Georganos, Stefanos
AU - Grippa, Tais
AU - Vanhuysse, Sabine
AU - Lennert, Moritz
AU - Shimoni, Michal
AU - Kalogirou, Stamatis
AU - Wolff, Eleonore
N1 - Publisher Copyright: © 2017 Informa UK Limited, trading as Taylor & Francis Group.
PY - 2018/3/4
Y1 - 2018/3/4
N2 - This study evaluates the impact of four feature selection (FS) algorithms in an object-based image analysis framework for very-high-resolution land use-land cover classification. The selected FS algorithms, correlation-based feature selection, mean decrease in accuracy, random forest (RF) based recursive feature elimination, and variable selection using random forest, were tested on the extreme gradient boosting, support vector machine, K-nearest neighbor, RF, and recursive partitioning classifiers, respectively. The results demonstrate that the selection of an appropriate FS method can be crucial to the performance of a machine learning classifier in terms of accuracy but also parsimony. In this scope, we propose a new metric to perform model selection named classification optimization score (COS) that rewards model simplicity and indirectly penalizes for increased computational time and processing requirements using the number of features for a given classification model as a surrogate. Our findings suggest that applying rigorous FS along with utilizing the COS metric may significantly reduce the processing time and the storage space while at the same time producing higher classification accuracy than using the initial dataset.
AB - This study evaluates the impact of four feature selection (FS) algorithms in an object-based image analysis framework for very-high-resolution land use-land cover classification. The selected FS algorithms, correlation-based feature selection, mean decrease in accuracy, random forest (RF) based recursive feature elimination, and variable selection using random forest, were tested on the extreme gradient boosting, support vector machine, K-nearest neighbor, RF, and recursive partitioning classifiers, respectively. The results demonstrate that the selection of an appropriate FS method can be crucial to the performance of a machine learning classifier in terms of accuracy but also parsimony. In this scope, we propose a new metric to perform model selection named classification optimization score (COS) that rewards model simplicity and indirectly penalizes for increased computational time and processing requirements using the number of features for a given classification model as a surrogate. Our findings suggest that applying rigorous FS along with utilizing the COS metric may significantly reduce the processing time and the storage space while at the same time producing higher classification accuracy than using the initial dataset.
KW - OBIA
KW - extreme gradient boosting
KW - feature selection
KW - land cover classification
KW - machine learning
UR - http://www.scopus.com/inward/record.url?scp=85035758443&partnerID=8YFLogxK
U2 - 10.1080/15481603.2017.1408892
DO - 10.1080/15481603.2017.1408892
M3 - Article
AN - SCOPUS:85035758443
SN - 1548-1603
VL - 55
SP - 221
EP - 242
JO - GIScience and Remote Sensing
JF - GIScience and Remote Sensing
IS - 2
ER -