Conference proceeding
A Novel Unsupervised Feature Selection Method for Bioinformatics Data Sets through Feature Clustering
2008 IEEE INTERNATIONAL CONFERENCE ON GRANULAR COMPUTING, VOLS 1 AND 2
01 Jan 2008
Featured in Collection : UN Sustainable Development Goals @ Drexel
Abstract
Many feature selection methods have been proposed, most of them in the supervised learning paradigm. Recently, unsupervised feature selection has attracted a lot of attention, especially in bioinformatics and text mining. So far, supervised and unsupervised feature selection methods have been studied and developed separately. A subset selected by a supervised feature selection method may not be a good one for unsupervised learning, and vice versa. In bioinformatics research, however, it is very common to perform clustering and classification iteratively on the same data sets, especially in gene expression analysis, so it is very desirable to have a feature selection method that works well for both unsupervised and supervised learning. In this paper we propose a novel feature selection algorithm through feature clustering. Our algorithm does not need the class label information in the data set and is suitable for both supervised and unsupervised learning. It groups the features into clusters based on feature similarity, so that the features in the same cluster are similar to each other. A representative feature is then selected from each cluster, which reduces feature redundancy. Our feature selection algorithm uses feature similarity for redundancy reduction but requires no feature search, and it works very well for high-dimensional data sets. We test our algorithm on some biological data sets for both clustering and classification analysis, and the results indicate that our FSFC algorithm can significantly reduce the original data sets without sacrificing the quality of clustering and classification.
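The abstract does not state which similarity measure or clustering procedure FSFC uses, so the following is only a minimal sketch of the general idea (cluster the features by similarity, then keep one representative per cluster), not the authors' algorithm. It assumes absolute Pearson correlation as the feature similarity, a simple greedy threshold clustering as a stand-in for the clustering step, and "highest total similarity to the other cluster members" as the rule for picking a representative. The names `pearson`, `select_features`, and `threshold`, and the toy data, are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def select_features(rows, threshold=0.9):
    """Cluster feature columns by |correlation| and pick one feature per cluster.

    No class labels are used. Assumption: a feature joins the first cluster
    whose founding feature it resembles with similarity >= threshold;
    otherwise it founds a new cluster.
    """
    columns = list(zip(*rows))  # one tuple of values per feature
    n = len(columns)
    sim = [[abs(pearson(columns[i], columns[j])) for j in range(n)]
           for i in range(n)]

    clusters = []
    for j in range(n):
        for cluster in clusters:
            if sim[j][cluster[0]] >= threshold:
                cluster.append(j)
                break
        else:
            clusters.append([j])

    # Representative: the member most similar, in total, to its own cluster.
    selected = [max(c, key=lambda f: sum(sim[f][g] for g in c)) for c in clusters]
    return sorted(selected), clusters


# Toy data: rows are samples, columns are features.
# Feature 1 is roughly twice feature 0, so the two are redundant.
rows = [
    [1.0, 2.1, 5.0, 0.3],
    [2.0, 4.0, 3.0, 0.1],
    [3.0, 6.2, 4.0, 0.4],
    [4.0, 7.9, 1.0, 0.2],
    [5.0, 10.1, 2.0, 0.5],
]
selected, clusters = select_features(rows)
print("clusters:", clusters)
print("selected features:", selected)
```

On this toy input the two nearly collinear features fall into one cluster and only one of them is kept, while the two weakly correlated features each form their own cluster. The reduced feature set can then be passed to either a clustering or a classification step, which is the dual use the abstract describes.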
Details
- Title
- A Novel Unsupervised Feature Selection Method for Bioinformatics Data Sets through Feature Clustering
- Creators
- Guangrong Li - Wuhan Univ, Sch Comp, Wuhan, Peoples R China
- Xiaohua Hu - Drexel University, Information Science
- Xiajiong Shen - Henan Univ, Coll Comp & Informat Engn, Henan, Peoples R China
- Xin Chen - Drexel University, Radiation Oncology (and Nuclear Medicine)
- Zhoujun Li - National University of Defense Technology
- Publication Details
- 2008 IEEE INTERNATIONAL CONFERENCE ON GRANULAR COMPUTING, VOLS 1 AND 2
- Publisher
- IEEE
- Number of pages
- 2
- Grant note
- CCF 0514679 / NSF; National Science Foundation (NSF) 239667 / PA Dept of Health IIS-0448023 / NSF Career; National Science Foundation (NSF); NSF - Office of the Director (OD)
- Resource Type
- Conference proceeding
- Language
- English
- Academic Unit
- Information Science; Radiation Oncology (and Nuclear Medicine)
- Web of Science ID
- WOS:000263829500019
- Scopus ID
- 2-s2.0-57949100571
- Other Identifier
- 991019167549404721
InCites Highlights
Data related to this publication, from InCites Benchmarking & Analytics tool:
- Collaboration types
- Domestic collaboration
- International collaboration
- Web of Science research areas
- Computer Science, Artificial Intelligence
- Computer Science, Interdisciplinary Applications
- Computer Science, Theory & Methods