Hybrid Feature Selection Methods for High-Dimensional Multi-Class Datasets
Journal / Book Title
International Journal of Data Mining, Modelling and Management
Hybrid methods play an important role in feature selection for the classification of high-dimensional datasets. In this paper, we propose two hybrid methods that combine filter-based feature selection with a genetic algorithm and with sequential random search. The first method hybridises information gain and a genetic algorithm: the features are first ranked by information gain, and a user-defined number of top-ranked features is retained. A genetic algorithm is then applied to these retained features to select an optimal feature subset, using two types of fitness function, one single-objective and one multi-objective. The second method hybridises information gain and sequential random K-nearest neighbour (SRKNN). Here, information gain is again used to rank the features, and a user-defined number of top-ranked features is retained. A population of binary strings over these retained features is generated, and a sequential search is applied to each member of the population to maximise classification accuracy. Both methods are applied to 21 high-dimensional multi-class datasets. The results show that neither method dominates: the first performs better on some datasets and the second on others. The results are also compared with those reported for other methods.
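The two-stage idea in the abstract (rank by information gain, keep a user-defined number of features, then search binary masks over them for higher accuracy) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' implementation: the function names, the toy data, the use of discrete features, the leave-one-out 1-NN scorer with Hamming distance, and the simple bit-flip search, which stands in for SRKNN rather than reproducing it.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """Entropy reduction obtained by splitting labels on a discrete feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def top_k_features(X, y, k):
    """Filter stage: indices of the k features with the highest information gain."""
    n_features = len(X[0])
    gains = [information_gain([row[j] for row in X], y) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: gains[j], reverse=True)[:k]


def loo_1nn_accuracy(X, y, features):
    """Leave-one-out accuracy of 1-NN (Hamming distance) on the chosen features."""
    if not features:
        return 0.0
    correct = 0
    for i, row in enumerate(X):
        nearest = min(
            (j for j in range(len(X)) if j != i),
            key=lambda j: sum(row[f] != X[j][f] for f in features),
        )
        correct += y[nearest] == y[i]
    return correct / len(X)


def sequential_flip_search(X, y, candidates, n_starts=5, seed=0):
    """Wrapper stage (hypothetical stand-in for SRKNN): start from random binary
    masks over the candidates and flip one bit at a time, keeping a flip only
    when the accuracy strictly improves."""
    rng = random.Random(seed)
    best_subset, best_acc = [], -1.0
    for _ in range(n_starts):
        mask = [rng.random() < 0.5 for _ in candidates]
        acc = loo_1nn_accuracy(X, y, [f for f, m in zip(candidates, mask) if m])
        for pos in range(len(candidates)):
            mask[pos] = not mask[pos]
            trial = loo_1nn_accuracy(X, y, [f for f, m in zip(candidates, mask) if m])
            if trial > acc:
                acc = trial
            else:
                mask[pos] = not mask[pos]  # revert a flip that did not help
        if acc > best_acc:
            best_subset = [f for f, m in zip(candidates, mask) if m]
            best_acc = acc
    return best_subset, best_acc


# Toy three-class data (made up): feature 0 determines the class,
# feature 3 is partly informative, features 1 and 2 are noise.
X = [[0, 1, 0, 1], [0, 0, 1, 1], [1, 1, 0, 0],
     [1, 0, 1, 0], [2, 1, 1, 1], [2, 0, 0, 0]]
y = [0, 0, 1, 1, 2, 2]

ranked = top_k_features(X, y, k=2)  # [0, 3] on this toy data
subset, accuracy = sequential_flip_search(X, y, ranked)
```

On this toy data the filter stage keeps features 0 and 3, and the search then works only within that reduced set. In the first proposed method, a genetic algorithm would take the place of `sequential_flip_search`, evolving the binary masks with either a single-objective or a multi-objective fitness function.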
MSU Digital Commons Citation
Saxena, Amit Kumar; Dubey, Vimal Kumar; and Wang, John, "Hybrid Feature Selection Methods for High-Dimensional Multi-Class Datasets" (2017). Department of Information Management and Business Analytics Faculty Scholarship and Creative Works. 83.
Saxena, A. K., Dubey, V. K., & Wang, J. (2017). Hybrid feature selection methods for high-dimensional multi-class datasets. International Journal of Data Mining, Modelling and Management, 9(4), 315-339.