Logo image
A Novel Unsupervised Feature Selection Method for Bioinformatics Data Sets through Feature Clustering
Conference proceeding

A Novel Unsupervised Feature Selection Method for Bioinformatics Data Sets through Feature Clustering

Guangrong Li, Xiaohua Hu, Xiajiong Shen, Xin Chen and Zhoujun Li
2008 IEEE INTERNATIONAL CONFERENCE ON GRANULAR COMPUTING, VOLS 1 AND 2
01 Jan 2008

Abstract

Computer Science, Artificial Intelligence Computer Science, Interdisciplinary Applications Computer Science, Theory & Methods Science & Technology Computer Science Technology
Many feature selection methods have been proposed and most of them are in the supervised learning paradigm. Recently unsupervised feature selection has attracted a lot of attention especially in bioinformatics and text mining. So far, supervised feature selection and unsupervised feature selection method are studied and developed separately. A subset selected by a supervised feature selection method may not be a good one for unsupervised learning and vice verse. In bioinformatics research, however it is very common to perform clustering and classification iteratively for the same data sets, especially in gene expression analysis, thus it is very desirable to have a feature selection method which works well for both unsupervised learning and supervised learning. In this paper we propose a novel feature selection algorithm through feature clustering. Our algorithm does not need the class label information in the data set and is suitable for both supervised learning and unsupervised learning. Our algorithm groups the features into different clusters based on feature similarity, so that the features in the same clusters are similar to each other. A representative feature is selected from each cluster, thus reduces the feature redundancy. Our feature selection algorithm uses feature similarity for feature redundancy reduction. but requires no feature search, works very well for high dimensional data set. We test our algorithm on some biological data sets for both clustering and classification analysis and the results indicates that our FSFC algorithm can significantly reduce the original data sets without scarifying the quality of clustering and classification.

Metrics

11 Record Views
36 citations in Scopus

Details

UN Sustainable Development Goals (SDGs)

This publication has contributed to the advancement of the following goals:

#3 Good Health and Well-Being

InCites Highlights

Data related to this publication, from InCites Benchmarking & Analytics tool:

Collaboration types
Domestic collaboration
International collaboration
Web of Science research areas
Computer Science, Artificial Intelligence
Computer Science, Interdisciplinary Applications
Computer Science, Theory & Methods
Logo image