Conference proceeding
A new decision tree classification method for mining high-speed data streams based on threaded binary search trees
EMERGING TECHNOLOGIES IN KNOWLEDGE DISCOVERY AND DATA MINING, v 4819, pp 256-267
01 Jan 2007
Featured in Collection : UN Sustainable Development Goals @ Drexel
Abstract
One of the most important algorithms for mining data streams is VFDT. It uses the Hoeffding inequality to achieve a probabilistic bound on the accuracy of the constructed tree. Gama et al. extended VFDT in two directions: their system, VFDTc, can handle continuous data and use more powerful classification techniques at tree leaves. In this paper, we revisit this problem and implement a system, VFDTt, on top of VFDT and VFDTc. We make the following three contributions. 1) We present a threaded binary search tree (TBST) approach for efficiently handling continuous attributes. It builds a threaded binary search tree whose processing time for inserting values is O(n log n), while VFDT's processing time is O(n^2). When a new example arrives, VFDTc needs to update O(log n) attribute tree nodes, but VFDTt needs to update only the one necessary node. 2) We improve the method of finding the best split-test point of a given continuous attribute. Compared to the method used in VFDTc, processing time improves from O(n log n) to O(n). 3) Compared to VFDTc, the number of candidate split tests in VFDTt decreases from O(n) to O(log n). Compared to VFDT, the most relevant property of our system is an average reduction of 25.53% in processing time, while keeping the same tree size and accuracy. Overall, the techniques introduced here significantly improve the efficiency of decision tree classification on data streams.
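The threaded binary search tree idea in contributions 1) and 2) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Node`, `ThreadedBST`, `insert`, `values`, and `best_split` are hypothetical, the tree is left unbalanced, and information gain is assumed as the split criterion. Each node keeps class counts for its own attribute value only, so an arriving example touches one node, and successor threads let the split scan visit the values in sorted order in a single linear pass. The sketch evaluates every distinct value as a candidate and does not attempt the O(log n) candidate reduction of contribution 3).

```python
from __future__ import annotations

import math
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


def entropy(counts: Counter) -> float:
    """Shannon entropy (bits) of a class-count distribution."""
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return -sum(c / n * math.log2(c / n) for c in counts.values() if c > 0)


@dataclass
class Node:
    value: float
    counts: Counter = field(default_factory=Counter)  # class counts at exactly this value
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    succ: Optional["Node"] = None  # thread to the in-order successor


class ThreadedBST:
    """Hypothetical attribute tree: plain BST plus in-order successor threads."""

    def __init__(self) -> None:
        self.root: Optional[Node] = None
        self.first: Optional[Node] = None  # node holding the smallest value
        self.total: Counter = Counter()    # class counts over all examples

    def insert(self, value: float, label: str) -> None:
        self.total[label] += 1
        if self.root is None:
            self.root = self.first = Node(value, Counter({label: 1}))
            return
        node, pred = self.root, None  # pred: last node where we turned right
        while True:
            if value == node.value:
                node.counts[label] += 1  # only this one node's counts change
                return
            if value < node.value:
                if node.left is None:
                    new = Node(value, Counter({label: 1}), succ=node)
                    node.left = new
                    if pred is None:
                        self.first = new   # new minimum
                    else:
                        pred.succ = new    # predecessor now threads to new node
                    return
                node = node.left
            else:
                if node.right is None:
                    new = Node(value, Counter({label: 1}), succ=node.succ)
                    node.right = new
                    node.succ = new
                    return
                pred = node
                node = node.right

    def values(self) -> list:
        """Sorted values, obtained by following the threads."""
        out, node = [], self.first
        while node is not None:
            out.append(node.value)
            node = node.succ
        return out

    def best_split(self):
        """Linear scan over the threads for the test `attr <= v` with highest gain."""
        n = sum(self.total.values())
        base = entropy(self.total)
        left: Counter = Counter()
        best_value, best_gain = None, 0.0
        node = self.first
        while node is not None and node.succ is not None:
            left.update(node.counts)       # running counts for attr <= node.value
            right = self.total - left
            n_left = sum(left.values())
            gain = base - n_left / n * entropy(left) - (n - n_left) / n * entropy(right)
            if gain > best_gain:
                best_value, best_gain = node.value, gain
            node = node.succ
        return best_value, best_gain
```

A caller would feed `(value, label)` pairs to `insert` as examples arrive and call `best_split` when the leaf is considered for splitting. Because the running `left` counter is built during the scan, no node needs to store cumulative counts, which is what keeps the per-example update local to one node in this sketch.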
Details
- Title
- A new decision tree classification method for mining high-speed data streams based on threaded binary search trees
- Creators
- Tao Wang - National University of Defense Technology
- Zhoujun Li - Beihang University
- Xiaohua Hu - Drexel University
- Yuejin Yan - National University of Defense Technology
- Huowang Chen - National University of Defense Technology
- Contributors
- T Washio (Editor)
- Publication Details
- EMERGING TECHNOLOGIES IN KNOWLEDGE DISCOVERY AND DATA MINING, v 4819, pp 256-267
- Series
- Lecture Notes in Computer Science
- Publisher
- Springer Nature
- Number of pages
- 12
- Grant note
- 60573057; 60473057; 90604007 / National Science Foundation of China; National Natural Science Foundation of China (NSFC)
- Resource Type
- Conference proceeding
- Language
- English
- Academic Unit
- Information Science (Informatics)
- Web of Science ID
- WOS:000252728700027
- Scopus ID
- 2-s2.0-38549125404
- Other Identifier
- 991019170319904721
InCites Highlights
Data related to this publication, from InCites Benchmarking & Analytics tool:
- Collaboration types
- Domestic collaboration
- International collaboration
- Web of Science research areas
- Computer Science, Artificial Intelligence
- Computer Science, Information Systems