Conference proceeding
Extracting a website's content structure from its link structure
Proceedings of the 14th ACM international conference on Information and knowledge management, pp 345-346
31 Oct 2005
Abstract
Hierarchical models are commonly used to organize a Website's content. A Website's content structure can be represented by a topic hierarchy: a directed tree rooted at the Website's homepage, in which the vertices correspond to Web pages and the edges to hyperlinks. In this work, we propose an algorithm for extracting a Website's topic hierarchy from its link structure. The proposed algorithm consists of a construction stage and a refining stage, in which we analyze the semantic relationships between Web pages on the basis of link structure, page content, and directory structure. We conducted extensive experiments on several different Websites and obtained very promising results.
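The abstract names the two stages but does not specify how either one works. The sketch below is therefore a hypothetical illustration only, not the authors' algorithm. It assumes a breadth-first spanning tree over the hyperlink graph as a stand-in for the construction stage. It also assumes a simple directory-prefix heuristic as a stand-in for the refining stage and ignores page content entirely. The function names (`build_tree`, `refine_by_directory`, `dir_segments`, `shared_prefix`) and the example URLs are invented for illustration.

```python
from collections import deque
from urllib.parse import urlparse
import posixpath


def dir_segments(url):
    """Directory components of a URL path, e.g. '/news/a.html' -> ['news']."""
    path = urlparse(url).path
    return [s for s in posixpath.dirname(path).split("/") if s]


def shared_prefix(a, b):
    """Number of leading directory components two paths have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def build_tree(links, root):
    """Hypothetical construction stage: BFS spanning tree of the link graph,
    rooted at the homepage. Returns parent and depth maps."""
    parent, depth = {root: None}, {root: 0}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in parent:
                parent[target] = page
                depth[target] = depth[page] + 1
                queue.append(target)
    return parent, depth


def refine_by_directory(links, parent, depth):
    """Hypothetical refining stage: re-parent each page to the in-linking page
    one BFS level up whose directory shares the longest prefix with its own.
    Keeping parents exactly one level up guarantees the result stays a tree."""
    inlinks = {}
    for src, targets in links.items():
        for t in targets:
            inlinks.setdefault(t, set()).add(src)
    refined = dict(parent)
    for page, par in parent.items():
        if par is None:  # the homepage stays the root
            continue
        # Sorted for deterministic tie-breaking; the BFS parent is always a candidate.
        candidates = sorted(
            c for c in inlinks.get(page, ()) if depth.get(c) == depth[page] - 1
        )
        target_dir = dir_segments(page)
        # Longest shared directory prefix wins; ties favour the original BFS parent.
        refined[page] = max(
            candidates,
            key=lambda c: (shared_prefix(dir_segments(c), target_dir), c == par),
        )
    return refined


if __name__ == "__main__":
    links = {
        "http://ex.com/": ["http://ex.com/sports/", "http://ex.com/news/"],
        "http://ex.com/sports/": ["http://ex.com/news/story.html"],
        "http://ex.com/news/": ["http://ex.com/news/story.html"],
    }
    parent, depth = build_tree(links, "http://ex.com/")
    print(parent["http://ex.com/news/story.html"])  # prints http://ex.com/sports/
    refined = refine_by_directory(links, parent, depth)
    print(refined["http://ex.com/news/story.html"])  # prints http://ex.com/news/
```

In the toy example, the breadth-first pass reaches the news story first through the sports section. The directory heuristic then moves the story under `/news/`, which is closer to how a reader would place it in the topic hierarchy. Restricting candidates to pages exactly one level up is a design choice that keeps the result acyclic. A real refining stage would also weigh page content, as the abstract states.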
Details
- Title
- Extracting a website's content structure from its link structure
- Creators
- Nan Liu - Chinese University of Hong Kong; Christopher C. Yang - Chinese University of Hong Kong
- Publication Details
- Proceedings of the 14th ACM international conference on Information and knowledge management, pp 345-346
- Conference
- CIKM05: Conference on Information and Knowledge Management (Bremen, Germany, 31 Oct 2005–05 Nov 2005)
- Series
- ACM Conferences
- Publisher
- ACM
- Number of pages
- 2
- Resource Type
- Conference proceeding
- Language
- English
- Academic Unit
- Information Science (Informatics)
- Other Identifier
- 991021861112804721