Automatic recommendation of feature selection algorithms based on dataset characteristics

2021
Feature selection in real-world data mining problems is essential to make the learning task efficient and more accurate. Identifying the best feature selection algorithm, among the many available, is a complex activity that still relies heavily on human experts or some random trial-and-error procedure. Thus, the automated machine learning community has taken some steps towards the automation of this process. In this paper, we address the metalearning challenge of recommending feature selection algorithms by proposing a novel meta-feature engineering model. Our model considers a broad collection of meta-features that enable the study of the relationship between the dataset properties and the feature selection algorithm performance in terms of several criteria. We arrange the input meta-features into eight categories: (i) simple, (ii) statistical, (iii) information-theoretical, (iv) complexity, (v) landmarking, (vi) based on symbolic models, (vii) based on images, and (viii) based on complex networks (graphs). The target meta-features emerge from a multi-criteria performance measure, based on five individual performance indexes, that assesses feature selection methods grounded in information, distance, dependence, consistency, and precision measures. We evaluate our proposal using a recently developed framework that extracts the input meta-features from 213 benchmark datasets, and ranks the assessed feature selection algorithms, to fill in the target meta-features in meta-bases. This evaluation uses five state-of-the-art classification methods to induce recommendation models from meta-bases: C4.5, Random Forest, XGBoost, ANN, and SVM. The results showed that it is possible to reach an average accuracy of up to 90% by applying our meta-feature engineering model. This work is the first to use an extensive empirical evaluation to provide a careful discussion of the strengths and limitations of more than 160 meta-features.
These meta-features, while designed to aid the task of feature selection algorithm recommendation, can readily be employed in other metalearning scenarios. Therefore, we believe our findings are a valuable contribution to the fields of automated machine learning and data mining, as well as to the feature extraction and pattern recognition communities.
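As a rough illustration of what input meta-features from the simple, statistical, and information-theoretical categories can look like, the following Python sketch computes a handful of dataset characteristics with NumPy. The function name `extract_meta_features`, the dictionary keys, and the particular measures chosen are assumptions made for this example; they are not taken from the paper's framework, which covers more than 160 meta-features.

```python
import numpy as np


def extract_meta_features(X, y):
    """Compute a few illustrative meta-features for a labeled dataset.

    Hypothetical helper: the measures below are common examples of the
    simple, statistical, and information-theoretical categories, not the
    paper's actual feature set.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n_instances, n_features = X.shape

    # Information-theoretical: Shannon entropy (in bits) of the class labels.
    _, class_counts = np.unique(y, return_counts=True)
    class_probs = class_counts / class_counts.sum()
    class_entropy = float(-(class_probs * np.log2(class_probs)).sum())

    # Statistical: mean absolute Pearson correlation over all feature pairs.
    if n_features > 1:
        corr = np.corrcoef(X, rowvar=False)
        upper = corr[np.triu_indices(n_features, k=1)]
        mean_abs_corr = float(np.nanmean(np.abs(upper)))
    else:
        mean_abs_corr = 0.0

    return {
        # Simple: basic counts and their ratio.
        "n_instances": int(n_instances),
        "n_features": int(n_features),
        "n_classes": int(len(class_counts)),
        "dimensionality": n_features / n_instances,
        "class_entropy": class_entropy,
        "mean_abs_corr": mean_abs_corr,
    }
```

In a metalearning setup of the kind the abstract describes, a vector like this would be computed for each benchmark dataset and stored in a meta-base alongside the target label (the best-ranked feature selection algorithm), on which a classifier is then trained. Constant feature columns yield undefined correlations; this sketch simply ignores them through `np.nanmean`.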
EXPERT SYSTEMS WITH APPLICATIONS
Volume: 185
ISSN:0957-4174
Indexing type
SSCI
Publication date
2021
Subject area
Evidence-based management
Country
Brazil
Language
English
DOI
10.1016/j.eswa.2021.115589
EISSN
1873-6793
Funding agencies
Brazilian National Council for Scientific and Technological Development / Conselho Nacional de Desenvolvimento Cientifico e Tecnologico (CNPQ) [140159/2017-7, 142050/2019-9]; Araucaria Foundation / Fundacao Araucaria de Apoio ao Desenvolvimento Cientifico e Tecnologico do Estado do Parana FA [028/2019]
Funding information
This work was financed in part by the Brazilian National Council for Scientific and Technological Development [grant numbers 140159/2017-7 and 142050/2019-9], and the Araucaria Foundation for the Support of the Scientific and Technological Development of Parana through a Research and Technological Productivity Scholarship for H. D. Lee [grant number 028/2019].
Times cited (WOS)
1
Citation count updated
2022-01
Source institutions
Universidade de Sao Paulo; Centro Universitario La Salle; Universidade Estadual de Campinas
Keywords
Feature engineering; Characterization measures; Algorithm selection; Recommendation system; Filter; Wrapper