Journal article
Web site topic‐hierarchy generation based on link structure
Journal of the American Society for Information Science and Technology, v 60(3), pp 495-508
Mar 2009
Abstract
Navigating through hyperlinks within a Web site to look for information from one of its Web pages without the support of a site map can be inefficient and ineffective. Although the content of a Web site is usually organized with an inherent structure like a topic hierarchy, which is a directed tree rooted at a Web site's homepage whose vertices and edges correspond to Web pages and hyperlinks, such a topic hierarchy is not always available to the user. In this work, we studied the problem of automatic generation of Web sites' topic hierarchies. We modeled a Web site's link structure as a weighted directed graph and proposed methods for estimating edge weights based on eight types of features and three learning algorithms, namely decision trees, naïve Bayes classifiers, and logistic regression. Three graph algorithms, namely breadth‐first search, shortest‐path search, and directed minimum‐spanning tree, were adapted to generate the topic hierarchy based on the graph model. We have tested the model and algorithms on real Web sites. It is found that the directed minimum‐spanning tree algorithm with the decision tree as the weight learning algorithm achieves the highest performance with an average accuracy of 91.9%.
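As a rough illustration of the graph model described in the abstract, the sketch below builds a hierarchy with the shortest-path variant, using Dijkstra's algorithm from the homepage. It is not the authors' implementation. The function name, the toy site, and the edge costs are hypothetical, and it assumes that a lower cost means a hyperlink is more likely to be a parent-child edge. In the paper the weights are learned from link features, and the abstract does not say how they are scaled. The abstract also reports that the directed minimum-spanning tree performed best; shortest-path search is shown here only because it is the shortest to write down.

```python
import heapq


def shortest_path_hierarchy(graph, root):
    """Return {page: parent} for the shortest-path tree rooted at `root`.

    `graph` maps each page to {linked_page: cost}. Assumption: a lower cost
    means the hyperlink is more likely a parent-child edge in the hierarchy.
    """
    dist = {root: 0.0}
    parent = {root: None}
    heap = [(0.0, root)]
    while heap:
        d, page = heapq.heappop(heap)
        if d > dist.get(page, float("inf")):
            continue  # stale queue entry, a shorter path was already found
        for child, cost in graph.get(page, {}).items():
            nd = d + cost
            if nd < dist.get(child, float("inf")):
                dist[child] = nd
                parent[child] = page
                heapq.heappush(heap, (nd, child))
    return parent


# Hypothetical toy site: keys are pages, values are outgoing links with costs.
site = {
    "/": {"/products": 0.1, "/about": 0.2, "/products/widget": 2.0},
    "/products": {"/products/widget": 0.1, "/": 1.5},
    "/about": {"/products/widget": 1.0, "/": 1.5},
    "/products/widget": {"/": 1.5},
}

hierarchy = shortest_path_hierarchy(site, "/")
for page, par in hierarchy.items():
    print(page, "<-", par)
```

In this toy site the homepage links directly to /products/widget, but the cheaper route through /products wins, so the widget page is placed under /products rather than directly under the homepage.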
Details
- Title
- Web site topic‐hierarchy generation based on link structure
- Creators
- Christopher C Yang
- Nan Liu
- Publication Details
- Journal of the American Society for Information Science and Technology, v 60(3), pp 495-508
- Publisher
- Wiley Subscription Services, Inc., A Wiley Company; Hoboken
- Number of pages
- 14
- Resource Type
- Journal article
- Language
- English
- Academic Unit
- Information Science
- Web of Science ID
- WOS:000263935100006
- Scopus ID
- 2-s2.0-62549126499
- Other Identifier
- 991014878337904721
InCites Highlights
Data related to this publication, from InCites Benchmarking & Analytics tool:
- Collaboration types
- Domestic collaboration
- International collaboration
- Web of Science research areas
- Computer Science, Information Systems
- Information Science & Library Science