Conference proceeding
Tree Labeled LDA: A Hierarchical Model for Web Summaries
2013 IEEE INTERNATIONAL CONFERENCE ON BIG DATA, pp 134-140
01 Jan 2013
Abstract
We study the application of hierarchical topic models to representing the content of website summaries. We concentrate on the DMOZ collection of Web extracts and propose a novel Tree Labeled LDA (tLLDA) algorithm that infers topic models using the collection's manually compiled ontology. The algorithm takes advantage of the ontology structure and infers topic models by jointly modeling word and ontology node assignments for documents. We evaluate the performance of our topic modeling approach against that of four state-of-the-art algorithms (Labeled LDA, Hierarchically Labeled LDA, Hierarchically Supervised LDA and Supervised LDA) and show improvement in terms of perplexity and accuracy. Our evaluation shows that topic models produced by tLLDA outperform the other algorithms in perplexity on all test sets and in accuracy in all but one test case.
Details
- Title
- Tree Labeled LDA: A Hierarchical Model for Web Summaries
- Creators
- Anton Slutsky - Drexel Univ, Coll Informat Sci & Technol, Philadelphia, PA 19104 USA
- Xiaohua Hu - Drexel University
- Yuan An - Drexel University, Information Science
- Publication Details
- 2013 IEEE INTERNATIONAL CONFERENCE ON BIG DATA, pp 134-140
- Series
- IEEE International Conference on Big Data
- Publisher
- IEEE
- Number of pages
- 7
- Resource Type
- Conference proceeding
- Language
- English
- Academic Unit
- Information Science
- Web of Science ID
- WOS:000330831300202
- Scopus ID
- 2-s2.0-84893222669
- Other Identifier
- 991019170323104721
InCites Highlights
Data related to this publication, from the InCites Benchmarking & Analytics tool:
- Web of Science research areas
- Computer Science, Information Systems
- Computer Science, Theory & Methods
- Engineering, Electrical & Electronic