Normalized Compression Distance of Multisets with Applications
Journal article, open access, peer reviewed

Andrew R. Cohen and Paul M. B. Vitanyi
IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 8, pp. 1602-1614
01 Aug 2015
PMID: 26352998
URL: https://europepmc.org/articles/pmc4566858
Accepted manuscript (AM), Open Access (license unspecified)

Keywords: Accuracy; Additives; classification; Complexity theory; data mining; Educational institutions; handwritten character recognition; Kolmogorov complexity; Measurement; multisets or multiples; Normalized compression distance; organelle transport; Pattern recognition; Retina; retinal progenitor cells; similarity; synthetic data

Abstract
Pairwise normalized compression distance (NCD) is a parameter-free, feature-free, alignment-free similarity metric based on compression. We propose an NCD of multisets that is also a metric. Previous attempts to obtain such an NCD failed. For classification purposes, the multiset NCD is superior to the pairwise NCD in both accuracy and implementation complexity. We cover the entire trajectory from theoretical underpinning to feasible practice. The method is applied to biological (stem cell, organelle transport) and OCR classification questions that were earlier treated with the pairwise NCD, and with it we achieved significantly better results. The theoretical foundation is Kolmogorov complexity.
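As a rough illustration of the quantities the abstract refers to, the Python sketch below approximates the uncomputable Kolmogorov complexity by a compressed length G, here taken from the standard-library lzma module. The compressor, the plain concatenation of elements, and the function names G, ncd_pair, ncd1 and ncd_multiset are assumptions for illustration only. The pairwise formula is the usual NCD(x, y) = (G(xy) − min(G(x), G(y))) / max(G(x), G(y)); the multiset part assumes the form NCD_1(X) = (G(X) − min over x in X of G(x)) / (max over x in X of G(X without x)), with NCD(X) taken as the maximum of NCD_1 over all sub-multisets of at least two elements. Check these definitions against the paper before relying on them.

```python
import itertools
import lzma
import random


def G(parts):
    """Compressed size in bytes of the concatenated parts.

    Stand-in for Kolmogorov complexity; lzma and plain concatenation
    are illustrative assumptions, not necessarily the paper's choices.
    """
    return len(lzma.compress(b"".join(parts)))


def ncd_pair(x, y):
    """Pairwise NCD: (G(xy) - min(G(x), G(y))) / max(G(x), G(y))."""
    gx, gy = G([x]), G([y])
    return (G([x, y]) - min(gx, gy)) / max(gx, gy)


def ncd1(X):
    """Hypothetical helper for NCD_1 of a multiset X (a list of >= 2 byte strings)."""
    numerator = G(X) - min(G([x]) for x in X)
    # Largest compressed size among the multisets obtained by dropping one element.
    denominator = max(G(X[:i] + X[i + 1:]) for i in range(len(X)))
    return numerator / denominator


def ncd_multiset(X):
    """Maximum of NCD_1 over every sub-multiset of X with at least two elements.

    Exhaustive, hence exponential in len(X): only suitable for small multisets.
    """
    if len(X) < 2:
        raise ValueError("need at least two elements")
    # combinations() works on positions, so repeated elements are kept (multiset semantics).
    return max(
        ncd1(list(sub))
        for k in range(2, len(X) + 1)
        for sub in itertools.combinations(X, k)
    )


if __name__ == "__main__":
    rng = random.Random(0)
    a = b"abcabcabc" * 50  # highly repetitive, compresses well
    noise = bytes(rng.getrandbits(8) for _ in range(450))  # pseudo-random, barely compresses
    print(ncd_pair(a, a), ncd_pair(a, noise))
    print(ncd_multiset([a, a]), ncd_multiset([a, a, noise]))
```

For a two-element multiset, ncd1 reduces exactly to ncd_pair, and adding an element can never lower ncd_multiset because every smaller sub-multiset is still included in the maximum. The exhaustive search is only a didactic choice; it does not reflect how the paper makes the computation feasible in practice.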

Metrics

39 citations in Scopus

InCites Highlights

Data related to this publication from the InCites Benchmarking & Analytics tool:

Collaboration types
Domestic collaboration
International collaboration
Web of Science research areas
Computer Science, Artificial Intelligence
Engineering, Electrical & Electronic