Dragon toolkit: Incorporating auto-learned semantic knowledge into large-scale text retrieval and mining
Conference proceeding

Xiaohua Zhou, Xiaodan Zhang, Xiaohua Hu and IEEE Comp Soc
19TH IEEE INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, VOL II, PROCEEDINGS, v 2, pp 197-201
01 Jan 2007

Abstract

Computer Science; Computer Science, Artificial Intelligence; Computer Science, Hardware & Architecture; Engineering; Engineering, Electrical & Electronic; Science & Technology; Technology
Most text retrieval and mining techniques still rely on exact feature (e.g., word) matching and cannot incorporate text semantics. Many researchers believe that extending these techniques with semantic knowledge could improve results, and various methods, most of them heuristic, have been proposed to account for concept hierarchy, synonymy, and other semantic relationships. However, the results of such semantic extensions have been mixed, ranging from slight improvements to decreases in effectiveness, most likely due to the lack of a formal framework. Instead, we propose a novel method that addresses semantic extension within the framework of language modeling. Our method extracts explicit topic signatures from documents and then statistically maps them into single-word features. The incorporation of semantic knowledge then reduces to smoothing unigram language models with that knowledge. The Dragon toolkit reflects our method, and its effectiveness is demonstrated on three tasks: text retrieval, text classification, and text clustering.
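
The smoothing idea in the abstract can be illustrated with a minimal Python sketch. It is not the Dragon toolkit's API, and it is only one plausible reading of the method: a document's maximum-likelihood unigram model is interpolated with a model obtained by mapping each topic signature to single words. The function names, the interpolation weight `lam`, and the toy translation table are all assumptions made for illustration; the abstract does not say how the mapping probabilities are estimated.

```python
from collections import Counter


def unigram_ml(tokens):
    """Maximum-likelihood unigram model p_ml(w|d) from a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def translation_model(signature_counts, translation):
    """Map topic signatures to words: p_t(w|d) = sum_t p(w|t) * p_ml(t|d).

    `translation` is a hypothetical table {signature: {word: p(w|t)}}.
    """
    total = sum(signature_counts.values())
    probs = {}
    for sig, count in signature_counts.items():
        p_sig = count / total
        for word, p_word in translation.get(sig, {}).items():
            probs[word] = probs.get(word, 0.0) + p_word * p_sig
    return probs


def smoothed_model(tokens, signature_counts, translation, lam=0.3):
    """Interpolate the two models; `lam` is an assumed mixing weight."""
    p_ml = unigram_ml(tokens)
    p_tr = translation_model(signature_counts, translation)
    vocab = set(p_ml) | set(p_tr)
    return {
        w: (1 - lam) * p_ml.get(w, 0.0) + lam * p_tr.get(w, 0.0)
        for w in vocab
    }


# Toy data (invented for illustration only).
tokens = ["heart", "attack", "risk", "heart"]
signatures = {"heart attack": 1}
translation = {
    "heart attack": {"myocardial": 0.5, "infarction": 0.3, "heart": 0.2}
}
model = smoothed_model(tokens, signatures, translation, lam=0.3)
```

In this toy example, words such as "myocardial" that never occur in the document still receive nonzero probability, which is the effect semantic smoothing is meant to have on exact-match retrieval.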

Metrics

22 Record Views
40 citations in Scopus

Details

InCites Highlights

Data related to this publication, from InCites Benchmarking & Analytics tool:

Web of Science research areas
Computer Science, Artificial Intelligence
Computer Science, Hardware & Architecture
Engineering, Electrical & Electronic