On the accuracy of linear regression routines in some data mining packages

B. D. McCullough; Taha Mokfi; Mahsa Almaeenejad

doi:10.1002/widm.1279

Journal article

Peer reviewed

Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 9(3), p. n/a

01 May 2019

DOI: https://doi.org/10.1002/widm.1279

Featured in Collection: UN Sustainable Development Goals @ Drexel

Abstract

Subjects: Computer Science; Computer Science, Artificial Intelligence; Computer Science, Theory & Methods; Science & Technology; Technology

While articles assessing the accuracy of traditional statistical packages are fairly commonplace, data mining software has escaped this important scrutiny. We apply the National Institute of Standards and Technology Statistical Reference Datasets tests for the numerical accuracy of statistical packages to seven data mining packages: IBM Modeler, KNIME, Orange, Python, RapidMiner, Weka, and XLMiner. We find that one package has an unstable algorithm for the calculation of the sample variance and only two have reliable linear regression routines. Of these two packages that offer analysis of variance, one has a bad algorithm. The accuracy of statistical calculations in data mining packages cannot be taken for granted.

This article is categorized under: Technologies > Statistical Fundamentals; Algorithmic Development > Statistics; Application Areas > Data Mining Software Tools
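The abstract's remark about an unstable sample-variance algorithm can be illustrated with a minimal sketch. The abstract does not say which algorithm the affected package uses, so the one-pass "calculator" formula below is only an assumed, commonly cited example of instability. The function names and the four-point dataset are hypothetical and are not taken from the NIST Statistical Reference Datasets.

```python
import math


def variance_one_pass(xs):
    """Sample variance via (sum(x^2) - (sum(x))^2 / n) / (n - 1).

    Assumed example of an unstable method: it subtracts two huge,
    nearly equal numbers, so most significant digits cancel.
    """
    n = len(xs)
    total = 0.0
    total_sq = 0.0
    for x in xs:
        total += x
        total_sq += x * x
    return (total_sq - total * total / n) / (n - 1)


def variance_two_pass(xs):
    """Sample variance by first computing the mean, then summing squared deviations."""
    n = len(xs)
    mean = math.fsum(xs) / n
    return math.fsum((x - mean) ** 2 for x in xs) / (n - 1)


# Hypothetical data: a small spread sitting on a large offset.
# The exact sample variance is 30.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]

print("one-pass:", variance_one_pass(data))
print("two-pass:", variance_two_pass(data))
```

On this data the two-pass result is exact, while the one-pass result is off by more than the variance itself, because the squared values near 1e18 cannot be held exactly in double precision. Benchmark data with a large mean and a small spread is designed to expose this kind of failure.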

Metrics

8 Record Views

4 citations in Web of Science

4 citations in Scopus

Details

Title: On the accuracy of linear regression routines in some data mining packages
Creators: B. D. McCullough - Department of Decision Sciences and MIS, LeBow College of Business, Drexel University, Philadelphia, Pennsylvania
          Taha Mokfi - University of Central Florida
          Mahsa Almaeenejad - University of Central Florida
Publication Details: Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 9(3), p. n/a
Publisher: Wiley
Number of pages: 9
Resource Type: Journal article
Language: English
Academic Unit: Decision Sciences (and Management Information Systems)
Web of Science ID: WOS:000466434300004
Scopus ID: 2-s2.0-85053180554
Other Identifier: 991019169646104721

InCites Highlights

Data related to this publication, from InCites Benchmarking & Analytics tool:

Collaboration types: Domestic collaboration
Web of Science research areas: Computer Science, Artificial Intelligence; Computer Science, Theory & Methods
