A Novel Unsupervised Feature Selection Method for Bioinformatics Data Sets through Feature Clustering

Guangrong Li; Xiaohua Hu; Xiajiong Shen; Xin Chen; Zhoujun Li

doi:10.1109/GRC.2008.4664788

Conference proceeding

2008 IEEE INTERNATIONAL CONFERENCE ON GRANULAR COMPUTING, VOLS 1 AND 2

01 Jan 2008

DOI: https://doi.org/10.1109/GRC.2008.4664788

Featured in Collection : UN Sustainable Development Goals @ Drexel

Abstract

Computer Science, Artificial Intelligence

Computer Science, Interdisciplinary Applications

Computer Science, Theory & Methods

Science & Technology

Computer Science

Technology

Many feature selection methods have been proposed, most of them in the supervised learning paradigm. Recently, unsupervised feature selection has attracted considerable attention, especially in bioinformatics and text mining. So far, supervised and unsupervised feature selection methods have been studied and developed separately. A subset selected by a supervised feature selection method may not be a good one for unsupervised learning, and vice versa. In bioinformatics research, however, it is very common to perform clustering and classification iteratively on the same data sets, especially in gene expression analysis, so it is highly desirable to have a feature selection method that works well for both unsupervised and supervised learning. In this paper we propose a novel feature selection algorithm through feature clustering. Our algorithm does not need the class label information in the data set and is suitable for both supervised and unsupervised learning. It groups the features into clusters based on feature similarity, so that the features in the same cluster are similar to each other. A representative feature is then selected from each cluster, which reduces feature redundancy. Our feature selection algorithm uses feature similarity for feature redundancy reduction but requires no feature search, and it works very well for high-dimensional data sets. We test our algorithm on some biological data sets for both clustering and classification analysis, and the results indicate that our FSFC algorithm can significantly reduce the original data sets without sacrificing the quality of clustering and classification.
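
The general idea of selecting features through feature clustering can be illustrated with a minimal sketch. This is not the authors' FSFC algorithm: the abstract does not state the similarity measure, the clustering procedure, or the rule for picking a representative, so the choices below are assumptions made only for illustration. The sketch assumes absolute Pearson correlation as the similarity, clusters formed as connected components of a thresholded similarity graph, and the representative taken as the member most similar on average to its own cluster. The function names and the `threshold` parameter are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumes neither is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def cluster_features(features, threshold=0.8):
    """Group feature columns whose |correlation| reaches `threshold`.

    `features` is a list of columns (one list of values per feature).
    Clusters are the connected components of the thresholded similarity
    graph. No class labels are used. Returns (labels, sim).
    """
    n = len(features)
    sim = [[abs(pearson(features[i], features[j])) for j in range(n)]
           for i in range(n)]
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue  # already assigned to an earlier cluster
        labels[start] = current
        stack = [start]
        while stack:  # depth-first walk over similar, unassigned features
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and sim[i][j] >= threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels, sim


def select_representatives(labels, sim):
    """Pick, per cluster, the feature with the highest total similarity to its cluster."""
    selected = []
    for c in sorted(set(labels)):
        members = [i for i, lab in enumerate(labels) if lab == c]
        best = max(members, key=lambda i: sum(sim[i][j] for j in members))
        selected.append(best)
    return selected


# Synthetic example: 3 independent features plus a noisy copy of each.
random.seed(0)
base = [[random.gauss(0, 1) for _ in range(200)] for _ in range(3)]
noisy = [[v + random.gauss(0, 0.05) for v in col] for col in base]
features = base + noisy

labels, sim = cluster_features(features, threshold=0.8)
selected = select_representatives(labels, sim)
reduced = [features[i] for i in selected]  # one column kept per cluster
```

On this synthetic input each noisy copy falls into the same cluster as its source feature, so six features reduce to three. As in the approach the abstract describes, no search over feature subsets is performed; the only work is computing pairwise similarities and grouping features.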

Metrics

11 Record Views

19 citations in Web of Science

36 citations in Scopus

Details

Title: A Novel Unsupervised Feature Selection Method for Bioinformatics Data Sets through Feature Clustering
Creators: Guangrong Li - Wuhan Univ, Sch Comp, Wuhan, Peoples R China
Xiaohua Hu - Drexel University, Information Science
Xiajiong Shen - Henan Univ, Coll Comp & Informat Engn, Henan, Peoples R China
Xin Chen - Drexel University, Radiation Oncology (and Nuclear Medicine)
Zhoujun Li - National University of Defense Technology
Publication Details: 2008 IEEE INTERNATIONAL CONFERENCE ON GRANULAR COMPUTING, VOLS 1 AND 2
Publisher: IEEE
Number of pages: 2
Grant note: CCF 0514679 / NSF; National Science Foundation (NSF) 239667 / PA Dept of Health IIS-0448023 / NSF Career; National Science Foundation (NSF); NSF - Office of the Director (OD)
Resource Type: Conference proceeding
Language: English
Academic Unit: Information Science; Radiation Oncology (and Nuclear Medicine)
Web of Science ID: WOS:000263829500019
Scopus ID: 2-s2.0-57949100571
Other Identifier: 991019167549404721

InCites Highlights

Data related to this publication, from InCites Benchmarking & Analytics tool:

Collaboration types: Domestic collaboration; International collaboration
Web of Science research areas: Computer Science, Artificial Intelligence; Computer Science, Interdisciplinary Applications; Computer Science, Theory & Methods
