Document Type

Article

Publication Date

1-1-2024

Journal / Book Title

International Journal of Data Warehousing and Mining

Abstract

This paper presents a two-stage feature selection scheme using machine learning techniques. In the first stage, a wrapper method is adopted to select various combinations of feature subsets from the original dataset. Model performance is evaluated with three classifiers: K-Nearest Neighbor (KNN), Support Vector Machine (SVM), and Random Forest (RF). In the second and final stage, a sequential backward feature selection method is applied. The proposed method is demonstrated on eighteen datasets, where it achieves average classification accuracies of 89.81%, 87.55%, and 89.82% with the KNN, SVM, and RF classifiers, respectively, using a reduced subset of at most ten features. Compared with eight other feature selection methods, the proposed method achieves better classification accuracy while selecting a smaller number of the most useful features.
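As a rough illustration of the two-stage idea described in the abstract, the sketch below runs a wrapper search over candidate feature subsets and then prunes the best subset with sequential backward selection. It is not the authors' implementation: the synthetic dataset, the random-subset wrapper in stage one, the 1-nearest-neighbour scorer with leave-one-out accuracy, and all function names are assumptions chosen to keep the example dependency-free.

```python
import random

def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier
    that only looks at the given feature indices."""
    correct = 0
    for i in range(len(X)):
        best_j, best_d = None, float("inf")
        for j in range(len(X)):
            if i == j:
                continue
            d = sum((X[i][f] - X[j][f]) ** 2 for f in features)
            if d < best_d:
                best_d, best_j = d, j
        correct += y[best_j] == y[i]
    return correct / len(X)

def wrapper_stage(X, y, n_features, subset_size, n_trials, rng):
    """Stage 1 (assumed form): score random feature subsets with the
    classifier and keep the best-scoring one."""
    best_subset, best_score = None, -1.0
    for _ in range(n_trials):
        subset = sorted(rng.sample(range(n_features), subset_size))
        score = loo_accuracy(X, y, subset)
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score

def sequential_backward_selection(X, y, features, min_features=1):
    """Stage 2: repeatedly drop the feature whose removal gives the best
    score, stopping as soon as every removal would lower accuracy."""
    current = list(features)
    best_score = loo_accuracy(X, y, current)
    while len(current) > min_features:
        candidates = [
            (loo_accuracy(X, y, [f for f in current if f != r]), r)
            for r in current
        ]
        score, removed = max(candidates)
        if score < best_score:
            break
        best_score = score
        current.remove(removed)
    return current, best_score

# Synthetic data (hypothetical): features 0 and 1 carry the class signal,
# features 2 to 5 are pure noise.
rng = random.Random(0)
X, y = [], []
for _ in range(40):
    label = rng.randint(0, 1)
    row = [label + rng.gauss(0, 0.3), -label + rng.gauss(0, 0.3)]
    row += [rng.gauss(0, 1) for _ in range(4)]
    X.append(row)
    y.append(label)

stage1, stage1_score = wrapper_stage(X, y, n_features=6, subset_size=4,
                                     n_trials=20, rng=rng)
final, final_score = sequential_backward_selection(X, y, stage1)
print("stage 1 subset:", stage1, "accuracy:", round(stage1_score, 3))
print("stage 2 subset:", final, "accuracy:", round(final_score, 3))
```

Because the backward pass only removes a feature when accuracy does not drop, the final subset is never larger and never scores lower than the stage-one subset. Swapping in SVM or RF as the scorer would follow the same pattern, with only `loo_accuracy` changing.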

DOI

10.4018/IJDWM.352041

Rights

This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Published Citation

Dag, A. Z., Johnson, M., Kibis, E., Simsek, S., Cankaya, B., & Delen, D. (2023). A machine learning decision support system for determining the primary factors impacting cancer survival and their temporal effect. Healthcare Analytics, 4, 100263.
