Conference proceeding
Learning Focused Hierarchical Topic Models with Semi-Supervision in Microblogs
ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PART II, v 9078, pp 598-609
01 Jan 2015
Abstract
Topic modeling approaches, such as Latent Dirichlet Allocation (LDA) and Hierarchical LDA (hLDA), have been used extensively to discover topics in various corpora. Unfortunately, these approaches do not perform well when applied to collections of social media posts. Further, these approaches do not allow users to focus topic discovery around subjectively interesting concepts. We propose the new Semi-Supervised Microblog-hLDA (SS-Micro-hLDA) model to discover topic hierarchies in short, noisy microblog documents in a way that allows users to focus topic discovery around interesting areas. We test SS-Micro-hLDA using a large, public collection of messages from Twitter and the Reddit social blogging site and show that our model outperforms hLDA, Constrained-hLDA, Recursive-rCRP and TSSB in terms of Pointwise Mutual Information (PMI) score. Further, we test our model in terms of information entropy of held-out data and show that the new approach produces highly focused topic hierarchies.
Details
- Title
- Learning Focused Hierarchical Topic Models with Semi-Supervision in Microblogs
- Creators
- Anton Slutsky - Drexel University; Xiaohua Hu - Drexel University; Yuan An - Drexel University
- Contributors
- T Cao (Editor); E P Lim (Editor); Z H Zhou (Editor); T B Ho (Editor); D Cheung (Editor); H Motoda (Editor)
- Publication Details
- ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PART II, v 9078, pp 598-609
- Series
- Lecture Notes in Artificial Intelligence
- Publisher
- Springer Nature
- Number of pages
- 12
- Resource Type
- Conference proceeding
- Language
- English
- Academic Unit
- Information Science
- Web of Science ID
- WOS:000361909900047
- Scopus ID
- 2-s2.0-84945580276
- Other Identifier
- 991019167319304721
InCites Highlights
Data related to this publication, from InCites Benchmarking & Analytics tool:
- Web of Science research areas
- Computer Science, Artificial Intelligence
- Computer Science, Information Systems
- Computer Science, Theory & Methods