Conference proceeding
A comparative evaluation of different link types on enhancing document clustering
Proceedings of the 31st annual international ACM SIGIR conference on research and development in information retrieval, pp 555-562
20 Jul 2008
Abstract
With a growing number of studies using link information to enhance document clustering, a comparative evaluation of the impact of different link types on document clustering has become necessary. Various types of links between text documents, including explicit links such as citation links and hyperlinks, implicit links such as co-authorship links, and pseudo links such as content similarity links, convey topic similarity or topic transfer patterns, which are very useful for document clustering. In this study, we adopt a Relaxation Labeling (RL)-based clustering algorithm, which employs both content and linkage information, to evaluate the effectiveness of these link types for document clustering on eight datasets. The experimental results show that linkage is quite effective in improving content-based document clustering. Our experiments also yield a series of findings on how different link types affect document clustering.
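The abstract does not spell out the RL-based algorithm, so the following is only a minimal sketch of the general idea of combining content and linkage information through iterative label updates. It is not the authors' method. The function names, the mixing weight `alpha`, the averaging rule over neighbors, and the toy data are all assumptions made for illustration.

```python
# Illustrative sketch only: a generic relaxation-labeling-style update,
# NOT the algorithm from the paper. All names and parameters are hypothetical.

def relaxation_labeling(content_probs, links, alpha=0.5, iterations=10):
    """Blend content-based cluster estimates with neighbor estimates.

    content_probs: one cluster distribution per document (each sums to 1).
    links: dict mapping a document index to the indices it is linked to.
    alpha: assumed weight given to linked neighbors (0 = content only).
    """
    k = len(content_probs[0])
    probs = [row[:] for row in content_probs]
    for _ in range(iterations):
        updated = []
        for i, prior in enumerate(content_probs):
            neighbors = links.get(i, [])
            if neighbors:
                # Average the current estimates of the linked documents.
                neigh = [sum(probs[j][c] for j in neighbors) / len(neighbors)
                         for c in range(k)]
            else:
                # No links: fall back to the content-based estimate.
                neigh = prior
            mixed = [(1 - alpha) * prior[c] + alpha * neigh[c] for c in range(k)]
            total = sum(mixed)
            updated.append([m / total for m in mixed])
        # Synchronous update: every document sees the previous round's estimates.
        probs = updated
    return probs


def hard_labels(probs):
    """Assign each document to its most probable cluster."""
    return [max(range(len(row)), key=row.__getitem__) for row in probs]


# Toy data (made up): four documents, two clusters, two linked pairs.
content_probs = [[0.9, 0.1], [0.45, 0.55], [0.1, 0.9], [0.2, 0.8]]
links = {0: [1], 1: [0], 2: [3], 3: [2]}

print(hard_labels(content_probs))                              # content only
print(hard_labels(relaxation_labeling(content_probs, links)))  # content + links
```

In this toy input, document 1 is ambiguous on content alone, and its link to document 0 is what moves it into the same cluster. That is the kind of effect a link-type comparison would measure, here under the stated assumptions only.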
Metrics
28 citations in Scopus
Details
- Title
- A comparative evaluation of different link types on enhancing document clustering
- Creators
- Xiaodan Zhang - Drexel University; Xiaohua Hu - Drexel University; Xiaohua Zhou - Drexel University
- Publication Details
- Proceedings of the 31st annual international ACM SIGIR conference on research and development in information retrieval, pp 555-562
- Conference
- 31st annual international ACM SIGIR conference on research and development in information retrieval
- Series
- SIGIR '08
- Publisher
- Association for Computing Machinery (ACM)
- Number of pages
- 8
- Resource Type
- Conference proceeding
- Language
- English
- Academic Unit
- Information Science
- Scopus ID
- 2-s2.0-57549085945
- Other Identifier
- 991019173435904721