Tree Labeled LDA: A Hierarchical Model for Web Summaries
Conference proceeding

Anton Slutsky, Xiaohua Hu and Yuan An
2013 IEEE International Conference on Big Data, pp. 134-140
01 Jan 2013

Abstract

We study the application of hierarchical topic models to representing the content of website summaries. We concentrate on the DMOZ collection of Web extracts and propose a novel Tree Labeled LDA (tLLDA) algorithm that infers topic models using the collection's manually compiled ontology. The algorithm exploits the structure of the ontology and infers topic models by jointly modeling word and ontology-node assignments for documents. We evaluate our topic modeling approach against four state-of-the-art algorithms (Labeled LDA, Hierarchically Labeled LDA, Hierarchically Supervised LDA and Supervised LDA) and show improvements in both perplexity and accuracy: the topic models produced by tLLDA outperform the other algorithms in perplexity on all test sets, and in accuracy on all but one test case.
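Perplexity, one of the two evaluation measures named in the abstract, is conventionally computed on held-out documents as exp(-Σ_d log p(w_d) / Σ_d N_d), where lower is better. The sketch below is a minimal illustration of that standard definition only, not the authors' evaluation code. It assumes per-document topic mixtures `theta` and topic-word distributions `phi` have already been estimated; how tLLDA estimates these for held-out documents is not described in this record, and the function name and list-of-lists layout are hypothetical.

```python
import math

def perplexity(docs, theta, phi):
    """Held-out perplexity: exp(-sum_d log p(w_d) / sum_d N_d).

    Hypothetical layout (an assumption, not from the paper):
      docs  : list of documents, each a list of integer word ids
      theta : theta[d][k] = p(topic k | document d)
      phi   : phi[k][w]   = p(word w | topic k)
    """
    total_log_lik = 0.0
    total_words = 0
    num_topics = len(phi)
    for d, words in enumerate(docs):
        for w in words:
            # Marginalize over topics: p(w | d) = sum_k p(k | d) * p(w | k)
            p_w = sum(theta[d][k] * phi[k][w] for k in range(num_topics))
            total_log_lik += math.log(p_w)
        total_words += len(words)
    # Exponentiated negative average per-word log-likelihood
    return math.exp(-total_log_lik / total_words)
```

As a sanity check, a model that spreads probability uniformly over a two-word vocabulary yields a perplexity of 2, i.e. the vocabulary size, while a model that assigns probability 1 to every observed word yields 1.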

Metrics

7 citations in Scopus

InCites Highlights

Data related to this publication, from InCites Benchmarking & Analytics tool:

Web of Science research areas
Computer Science, Information Systems
Computer Science, Theory & Methods
Engineering, Electrical & Electronic