Journal article
On the accuracy of linear regression routines in some data mining packages
Wiley interdisciplinary reviews. Data mining and knowledge discovery, v 9(3), p n/a
01 May 2019
Featured in Collection : UN Sustainable Development Goals @ Drexel
Abstract
While articles assessing the accuracy of traditional statistical packages are fairly commonplace, data mining software has escaped this important scrutiny. We apply the National Institute of Standards and Technology Statistical Reference Datasets tests for the numerical accuracy of statistical packages to 7 data mining packages: IBM Modeler, KNIME, Orange, Python, RapidMiner, Weka, and XLMiner. We find that one package has an unstable algorithm for the calculation of the sample variance and only two have reliable linear regression routines. Of these two packages that offer analysis of variance, one has a bad algorithm. The accuracy of statistical calculations in data mining packages cannot be taken for granted.
This article is categorized under: Technologies > Statistical Fundamentals; Algorithmic Development > Statistics; Application Areas > Data Mining Software Tools
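The abstract's remark about an unstable sample variance algorithm can be illustrated with a minimal Python sketch. This record does not say which package or which algorithm was at fault, so the one-pass "calculator" formula below is only an assumed, classic example of instability, and the four-point dataset is illustrative rather than one of the NIST reference datasets. The `log_relative_error` helper reflects the measure commonly used in this kind of accuracy assessment (roughly, the number of correct significant digits); its use here is likewise an assumption, not a description of the authors' procedure.

```python
import math


def variance_one_pass(xs):
    """Textbook one-pass formula: (sum(x^2) - (sum x)^2 / n) / (n - 1).

    Assumed example of an unstable method: it subtracts two huge,
    nearly equal numbers, so rounding error can swamp the result.
    """
    n = len(xs)
    total = sum(xs)
    total_sq = sum(x * x for x in xs)
    return (total_sq - total * total / n) / (n - 1)


def variance_two_pass(xs):
    """Two-pass formula: compute the mean first, then sum squared deviations."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def log_relative_error(estimate, certified):
    """Approximate number of correct significant digits in `estimate`."""
    if estimate == certified:
        return float("inf")
    return -math.log10(abs(estimate - certified) / abs(certified))


# Illustrative data: small spread around a large offset; exact variance is 30.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]

unstable = variance_one_pass(data)
stable = variance_two_pass(data)

print("one-pass:", unstable)
print("two-pass:", stable)
print("LRE of two-pass:", log_relative_error(stable, 30.0))
```

On this data the two-pass result is exactly 30, while the one-pass result is far from 30 in double precision, even though both formulas are algebraically identical.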
Details
- Title
- On the accuracy of linear regression routines in some data mining packages
- Creators
- B. D. McCullough - Department of Decision Sciences and MIS, LeBow College of Business, Drexel University, Philadelphia, Pennsylvania; Taha Mokfi - University of Central Florida; Mahsa Almaeenejad - University of Central Florida
- Publication Details
- Wiley interdisciplinary reviews. Data mining and knowledge discovery, v 9(3), p n/a
- Publisher
- Wiley
- Number of pages
- 9
- Resource Type
- Journal article
- Language
- English
- Academic Unit
- Decision Sciences (and Management Information Systems)
- Web of Science ID
- WOS:000466434300004
- Scopus ID
- 2-s2.0-85053180554
- Other Identifier
- 991019169646104721
InCites Highlights
Data related to this publication, from InCites Benchmarking & Analytics tool:
- Collaboration types
- Domestic collaboration
- Web of Science research areas
- Computer Science, Artificial Intelligence
- Computer Science, Theory & Methods