Department of Computer Science Faculty Scholarship and Creative Works

Generalized and Heuristic-Free Feature Construction for Improved Accuracy

Wei Fan, Tencent
Erheng Zhong, Sun Yat-Sen University
Jing Peng, Montclair State University
Olivier Verscheure, Hong Kong University of Science and Technology
Kun Zhang, Xavier University of Louisiana
Jiangtao Ren, Sun Yat-Sen University
Rong Yan, IBM
Qiang Yang, Hong Kong University of Science and Technology

Document Type

Paper

Publication Date

12-1-2010

Abstract

State-of-the-art learning algorithms accept data in feature vector format as input. Examples belonging to different classes may not always be easy to separate in the original feature space. One may ask: can transforming existing features into a new space reveal significant discriminative information that is not obvious in the original space? Three problems arise. First, since there can be an infinite number of ways to extend features, it is impractical to first enumerate them and then perform feature selection. Second, evaluating discriminative power on the complete dataset is not always optimal, because features that are highly discriminative on a subset of examples may not be significant when evaluated on the entire dataset. Third, feature construction ought to be automated and general, so that it does not require domain knowledge and its accuracy improvement holds across a large number of classification algorithms. In this paper, we propose a framework to address these problems through the following steps: (1) divide-and-conquer to avoid exhaustive enumeration; (2) local feature construction and evaluation within subspaces of examples where local error is still high and the features constructed thus far still do not predict well; (3) weighting-rule-based search that is free of domain knowledge and has a provable performance guarantee. Empirical studies indicate that significant improvement (as much as 9% in accuracy and 28% in AUC) is achieved using the newly constructed features over a variety of inductive learners evaluated against a number of balanced, skewed and high-dimensional datasets. Software and datasets are available from the authors.
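
The abstract's second point, that a feature can be highly discriminative on a subset of examples yet look weak on the whole dataset, can be illustrated with a small sketch. This is not the authors' method or software. The toy rows, the `region` column used to split the data, the constructed feature `x1 > x2`, and the choice of information gain as the score are all assumptions made for illustration only.

```python
from collections import Counter
from math import log2

# Hypothetical toy rows: (region, x1, x2, y). Values are invented for illustration.
ROWS = [
    (0, 1, 2, 0), (0, 1, 3, 0), (0, 2, 3, 1), (0, 1, 4, 1),
    (0, 3, 1, 0), (0, 4, 2, 0), (0, 5, 1, 1), (0, 3, 2, 1),
    (1, 1, 2, 0), (1, 2, 5, 0), (1, 4, 1, 1), (1, 6, 3, 1),
]


def constructed(row):
    """Assumed constructed feature: 1 if x1 > x2, else 0."""
    _, x1, x2, _ = row
    return int(x1 > x2)


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def info_gain(rows, feature):
    """Information gain of `feature` with respect to the label (last field)."""
    labels = [row[-1] for row in rows]
    groups = {}
    for row in rows:
        groups.setdefault(feature(row), []).append(row[-1])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Score the same constructed feature globally and within each subset.
global_gain = info_gain(ROWS, constructed)
region0_gain = info_gain([r for r in ROWS if r[0] == 0], constructed)
region1_gain = info_gain([r for r in ROWS if r[0] == 1], constructed)

print(f"gain on all rows:  {global_gain:.3f}")
print(f"gain in region 0:  {region0_gain:.3f}")
print(f"gain in region 1:  {region1_gain:.3f}")
```

In this toy data the constructed feature separates the classes perfectly inside region 1 but carries little information when scored on all rows, which is the situation that motivates evaluating constructed features locally rather than only on the complete dataset.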

Montclair State University Digital Commons Citation

Fan, Wei; Zhong, Erheng; Peng, Jing; Verscheure, Olivier; Zhang, Kun; Ren, Jiangtao; Yan, Rong; and Yang, Qiang, "Generalized and Heuristic-Free Feature Construction for Improved Accuracy" (2010). Department of Computer Science Faculty Scholarship and Creative Works. 300.
https://digitalcommons.montclair.edu/compusci-facpubs/300

This document is currently not available here.
