Conference proceeding
Scaling a Neyman-Pearson subset selection approach via heuristics for mining massive data
The Institute of Electrical and Electronics Engineers, Inc. (IEEE) Conference Proceedings, 439
01 Dec 2014
Abstract
Conference Title: 2014 IEEE Symposium on Computational Intelligence and Data Mining (CIDM)
Conference Start Date: 2014, Dec. 9
Conference End Date: 2014, Dec. 12
Conference Location: Orlando, FL, USA
Feature subset selection is an important step towards producing a classifier that relies only on relevant features, while keeping the computational complexity of the classifier low. Feature selection is also used in making inferences on the importance of attributes, even when classification is not the ultimate goal. For example, in bioinformatics and genomics, feature subset selection is used to make inferences about the variables that best describe multiple populations. Unfortunately, many feature selection algorithms require the subset size to be specified a priori, but knowing how many variables to select is typically a nontrivial task. Other approaches require that a specific variable subset selection framework be used. In this work, we examine an approach to feature subset selection that works with a generic variable selection algorithm and provides statistical inference on the number of relevant features, which may be unknown to the generic variable selection algorithm. This work extends our previous implementation of a Neyman-Pearson feature selection (NPFS) hypothesis test, which acts as a meta-subset selection algorithm. Specifically, we examine the conservativeness of the NPFS approach by biasing the hypothesis test, and examine other heuristics for NPFS. We include results from carefully designed synthetic datasets. Furthermore, we demonstrate the ability of NPFS to perform on data of a massive scale.
Details
- Title
- Scaling a Neyman-Pearson subset selection approach via heuristics for mining massive data
- Creators
- Gregory Ditzler; Matthew Austen; Gail Rosen; Robi Polikar
- Publication Details
- The Institute of Electrical and Electronics Engineers, Inc. (IEEE) Conference Proceedings, 439
- Publisher
- The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
- Resource Type
- Conference proceeding
- Language
- English
- Academic Unit
- Electrical and Computer Engineering
- Identifiers
- 991019170462804721