Towards effective document clustering: A constrained K-means based approach
Journal article   Peer reviewed

Guobiao Hu, Shuigeng Zhou, Jihong Guan and Xiaohua Hu
Information Processing & Management, vol. 44(4), pp. 1397-1409
2008

Abstract

Keywords: Clustering with prior knowledge; Document clustering; Semi-supervised learning; Spectral relaxation
Document clustering is an important tool for organizing and browsing document collections. In real applications, limited knowledge about the cluster membership of a small number of documents is often available, for example that certain pairs of documents belong to the same cluster. This kind of prior knowledge can serve as constraints for the clustering process. We integrate the constraints into the trace formulation of the sum-of-squared-Euclidean-distances criterion of K-means. The combined criterion function is then transformed into a trace maximization problem, which is optimized by eigen-decomposition. Our experimental evaluation shows that the proposed semi-supervised clustering method achieves better performance than three existing methods.
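
The pipeline outlined in the abstract (trace form of K-means, constraints folded into the criterion, trace maximization by eigen-decomposition) can be illustrated with a minimal sketch. This is not the authors' exact formulation: the reward matrix `W`, the `weight` parameter, the function name, and the final discretization by plain Lloyd iterations are all assumptions made for illustration. The only standard ingredient is the spectral relaxation of K-means, in which the sum of squared distances equals trace(G) - trace(Y^T G Y) for the Gram matrix G and a normalized cluster-indicator matrix Y.

```python
import numpy as np


def constrained_spectral_kmeans(X, k, must_link=(), weight=1.0, n_iter=50):
    """Hypothetical helper: X is (n_docs, n_features); returns n_docs labels."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # Gram matrix of the documents. Minimising the K-means criterion is
    # equivalent to maximising trace(Y^T G Y) over indicator matrices Y.
    G = X @ X.T

    # Assumption: every must-link pair (i, j) adds a symmetric reward,
    # so the combined criterion becomes trace(Y^T (G + W) Y).
    W = np.zeros((n, n))
    for i, j in must_link:
        W[i, j] += weight
        W[j, i] += weight

    # Relaxation: keep only Y^T Y = I. The maximiser is then given by the
    # eigenvectors of G + W with the k largest eigenvalues.
    _, vecs = np.linalg.eigh(G + W)  # eigenvalues in ascending order
    Y = vecs[:, -k:]

    # Assumption: recover discrete clusters with Lloyd iterations on the
    # rows of Y, using a deterministic farthest-point initialisation.
    centers = [Y[0]]
    for _ in range(1, k):
        d = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(d)])
    centers = np.array(centers)

    for _ in range(n_iter):
        dists = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = Y[labels == c].mean(axis=0)
    return labels


# Toy term-weight vectors for four documents (made-up illustration data).
docs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels = constrained_spectral_kmeans(docs, k=2, must_link=[(0, 1)])
```

In this sketch a larger `weight` makes the eigenvectors favor partitions that keep the listed pairs together; how the paper weights and normalizes its constraints is not stated in the abstract.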

Metrics

17 Record Views
55 citations in Scopus

Details

InCites Highlights

Data related to this publication, from InCites Benchmarking & Analytics tool:

Collaboration types
Domestic collaboration
International collaboration
Web of Science research areas
Computer Science, Information Systems
Information Science & Library Science