Generalized and Heuristic-Free Feature Construction for Improved Accuracy
Document Type
Paper
Publication Date
12-1-2010
Abstract
State-of-the-art learning algorithms accept data in feature vector format as input. Examples belonging to different classes may not always be easy to separate in the original feature space. One may ask: can transforming existing features into a new space reveal significant discriminative information that is not obvious in the original space? Three problems arise. First, since there can be an infinite number of ways to extend features, it is impractical to first enumerate them and then perform feature selection. Second, evaluating discriminative power on the complete dataset is not always optimal, because features that are highly discriminative on a subset of examples may not be significant when evaluated on the entire dataset. Third, feature construction ought to be automated and general, so that it does not require domain knowledge and its accuracy improvement holds across a large number of classification algorithms. In this paper, we propose a framework that addresses these problems through the following steps: (1) divide-and-conquer to avoid exhaustive enumeration; (2) local feature construction and evaluation within subspaces of examples where the local error is still high and the features constructed thus far still do not predict well; (3) a weighting-rule-based search that is free of domain knowledge and has a provable performance guarantee. Empirical studies indicate that significant improvement (as much as 9% in accuracy and 28% in AUC) is achieved using the newly constructed features over a variety of inductive learners evaluated against a number of balanced, skewed and high-dimensional datasets. Software and datasets are available from the authors.
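The motivating question in the abstract, whether a constructed feature can expose class structure that no original feature shows, can be illustrated with a toy sketch. This is not the authors' framework: the data, the product operator, and the helper `stump_accuracy` (a one-threshold rule used as a stand-in for discriminative power) are all assumptions chosen for illustration, and the sketch does not implement the divide-and-conquer, local-subspace, or weighting-rule steps.

```python
from itertools import combinations


def stump_accuracy(values, labels):
    """Best accuracy of a single-threshold rule on one feature (hypothetical helper)."""
    n = len(labels)
    best = 0.0
    for t in sorted(set(values)):
        pred = [1 if v >= t else 0 for v in values]
        acc = sum(p == y for p, y in zip(pred, labels)) / n
        best = max(best, acc, 1.0 - acc)  # allow either polarity of the rule
    return best


# Toy XOR-style data (assumed): the label is 1 when x1 and x2 share a sign.
X = [(-2.0, -1.0), (-1.0, -2.0), (1.0, 2.0), (2.0, 1.0),
     (-2.0, 1.0), (-1.0, 2.0), (1.0, -2.0), (2.0, -1.0)]
y = [1, 1, 1, 1, 0, 0, 0, 0]

# Score each original feature on its own.
original = {
    f"x{j + 1}": stump_accuracy([row[j] for row in X], y) for j in range(2)
}

# Construct pairwise product features and score them the same way.
constructed = {}
for i, j in combinations(range(2), 2):
    name = f"x{i + 1}*x{j + 1}"
    constructed[name] = stump_accuracy([row[i] * row[j] for row in X], y)

print("original:", original)
print("constructed:", constructed)
```

On this toy data neither original feature beats chance under a single threshold, while the product feature separates the classes exactly. Enumerating every pairwise operator like this does not scale, which is the enumeration problem the abstract's divide-and-conquer step is meant to avoid.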
Montclair State University Digital Commons Citation
Fan, Wei; Zhong, Erheng; Peng, Jing; Verscheure, Olivier; Zhang, Kun; Ren, Jiangtao; Yan, Rong; and Yang, Qiang, "Generalized and Heuristic-Free Feature Construction for Improved Accuracy" (2010). Department of Computer Science Faculty Scholarship and Creative Works. 300.
https://digitalcommons.montclair.edu/compusci-facpubs/300