On the accuracy of linear regression routines in some data mining packages
Journal article   Peer reviewed

B. D. McCullough, Taha Mokfi and Mahsa Almaeenejad
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, v. 9(3), p. n/a
01 May 2019

Abstract

Subject categories: Computer Science; Computer Science, Artificial Intelligence; Computer Science, Theory & Methods; Science & Technology; Technology
While articles assessing the accuracy of traditional statistical packages are fairly commonplace, data mining software has escaped this important scrutiny. We apply the National Institute of Standards and Technology Statistical Reference Datasets tests for the numerical accuracy of statistical packages to 7 data mining packages: IBM Modeler, KNIME, Orange, Python, RapidMiner, Weka, and XLMiner. We find that one package has an unstable algorithm for the calculation of the sample variance and only two have reliable linear regression routines. Of these two packages that offer analysis of variance, one has a bad algorithm. The accuracy of statistical calculations in data mining packages cannot be taken for granted.

This article is categorized under:
Technologies > Statistical Fundamentals
Algorithmic Development > Statistics
Application Areas > Data Mining Software Tools
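The abstract does not say which variance algorithm the affected package uses. As an illustration only, the sketch below shows one well-known way a sample variance routine can be unstable: the one-pass "textbook" formula, which subtracts two large, nearly equal sums. It is compared with a two-pass formula that subtracts the mean first. The function names and the four data values are hypothetical and are not taken from the paper or from the NIST reference datasets.

```python
def one_pass_variance(xs):
    """Textbook one-pass formula: (sum(x^2) - (sum(x))^2 / n) / (n - 1).

    Prone to catastrophic cancellation when the mean is large
    relative to the spread of the data.
    """
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return (sum_x2 - sum_x * sum_x / n) / (n - 1)


def two_pass_variance(xs):
    """Two-pass formula: compute the mean, then sum squared deviations."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


# Hypothetical data: small spread around a large offset.
# Deviations from the mean are -6, -3, 3, 6, so the exact
# sample variance is (36 + 9 + 9 + 36) / 3 = 30.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]

print("one-pass:", one_pass_variance(data))
print("two-pass:", two_pass_variance(data))
```

With these values in double precision, the two-pass result matches the exact variance, while the one-pass result does not, because the squared terms are too large to retain the low-order digits that carry the variance.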

Metrics

8 Record Views
4 citations in Scopus
26 readers on Mendeley
1 reader on CiteULike

Details

UN Sustainable Development Goals (SDGs)

This publication has contributed to the advancement of the following goals:

#4 Quality Education

InCites Highlights

Data related to this publication, from InCites Benchmarking & Analytics tool:

Collaboration types
Domestic collaboration
Web of Science research areas
Computer Science, Artificial Intelligence
Computer Science, Theory & Methods