Conference proceeding
Dragon toolkit: Incorporating auto-learned semantic knowledge into large-scale text retrieval and mining
19TH IEEE INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, VOL II, PROCEEDINGS, v 2, pp 197-201
01 Jan 2007
Abstract
The majority of text retrieval and mining techniques are still based on exact feature (e.g., word) matching and are unable to incorporate text semantics. Many researchers believe that extension with semantic knowledge could improve the results, and various methods (most of them heuristic) have been proposed to account for concept hierarchy, synonymy, and other semantic relationships. However, the results with such semantic extension have been mixed, ranging from slight improvements to decreases in effectiveness, most likely due to the lack of a formal framework. Instead, we propose a novel method to address semantic extension within the framework of language modeling. Our method extracts explicit topic signatures from documents and then statistically maps them into single-word features. The incorporation of semantic knowledge then reduces to the smoothing of unigram language models using semantic knowledge. The Dragon toolkit reflects our method, and its effectiveness is demonstrated on three tasks: text retrieval, text classification, and text clustering.
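The smoothing idea described in the abstract can be illustrated with a minimal sketch. This is not the Dragon toolkit's API and is not taken from the paper: the function names, the interpolation weight `lam`, and the toy signature-to-word probabilities are all hypothetical, chosen only to show how a unigram document model might be interpolated with word probabilities mapped from topic signatures.

```python
from collections import Counter


def mle_unigram(items):
    """Maximum-likelihood distribution over a list of items."""
    counts = Counter(items)
    total = sum(counts.values())
    return {item: c / total for item, c in counts.items()}


def translation_model(signatures, translation_probs):
    """Map topic signatures to words: p_t(w|d) = sum_k p(w|t_k) * p(t_k|d).

    translation_probs is a hypothetical table {signature: {word: p(word|signature)}}.
    """
    sig_dist = mle_unigram(signatures)
    word_probs = {}
    for sig, p_sig in sig_dist.items():
        for word, p_word in translation_probs.get(sig, {}).items():
            word_probs[word] = word_probs.get(word, 0.0) + p_word * p_sig
    return word_probs


def smoothed_unigram(tokens, signatures, translation_probs, lam=0.3):
    """Linear interpolation of the word-based and signature-based models.

    lam is an assumed mixing weight, not a value from the paper.
    """
    base = mle_unigram(tokens)
    trans = translation_model(signatures, translation_probs)
    vocab = set(base) | set(trans)
    return {
        w: (1 - lam) * base.get(w, 0.0) + lam * trans.get(w, 0.0)
        for w in vocab
    }


# Toy data (hypothetical): one document, one extracted topic signature.
tokens = ["car", "engine", "car", "repair"]
signatures = ["automobile"]
translation_probs = {"automobile": {"car": 0.5, "vehicle": 0.3, "engine": 0.2}}

model = smoothed_unigram(tokens, signatures, translation_probs)
```

In this toy example the word "vehicle" never occurs in the document, yet it receives nonzero probability through the signature "automobile", which is the kind of effect semantic smoothing is meant to achieve.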
Details
- Title
- Dragon toolkit: Incorporating auto-learned semantic knowledge into large-scale text retrieval and mining
- Creators
- Xiaohua Zhou - Drexel Univ, Coll Informat Sci & Technol, Philadelphia, PA 19104 USA
- Xiaodan Zhang - Drexel Univ, Coll Informat Sci & Technol, Philadelphia, PA 19104 USA
- Xiaohua Hu - Drexel University
- IEEE Comp Soc
- Publication Details
- 19TH IEEE INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, VOL II, PROCEEDINGS, v 2, pp 197-201
- Series
- Proceedings-International Conference on Tools With Artificial Intelligence
- Publisher
- IEEE
- Number of pages
- 5
- Resource Type
- Conference proceeding
- Language
- English
- Academic Unit
- Information Science
- Web of Science ID
- WOS:000253293000032
- Scopus ID
- 2-s2.0-48649088857
- Other Identifier
- 991019167327004721
InCites Highlights
Data related to this publication, from the InCites Benchmarking & Analytics tool:
- Web of Science research areas
- Computer Science, Artificial Intelligence
- Computer Science, Hardware & Architecture
- Engineering, Electrical & Electronic