Keywords
Accuracy; Additives; classification; Complexity theory; data mining; Educational institutions; handwritten character recognition; Kolmogorov complexity; Measurement; multisets or multiples; Normalized compression distance; organelle transport; Pattern recognition; Retina; retinal progenitor cells; similarity; synthetic data
Pairwise normalized compression distance (NCD) is a parameter-free, feature-free, alignment-free similarity metric based on compression. We propose an NCD of multisets that is also a metric. Previous attempts to obtain such an NCD failed. For classification purposes it is superior to the pairwise NCD in both accuracy and implementation complexity. We cover the entire trajectory from theoretical underpinning to feasible practice. We apply it to biological (stem cell, organelle transport) and OCR classification problems that were previously treated with the pairwise NCD, and with the new method we achieved significantly better results. The theoretical foundation is Kolmogorov complexity.
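To make the pairwise baseline concrete, here is a minimal sketch of the standard pairwise NCD, NCD(x, y) = (C(xy) - min{C(x), C(y)}) / max{C(x), C(y)}, where C is the compressed length, used in a nearest-neighbor classifier. It assumes zlib at level 9 as the compressor C; the helper names `compressed_size`, `ncd`, and `classify` are hypothetical. This illustrates only the pairwise NCD that the abstract compares against, not the authors' multiset NCD.

```python
import random
import zlib
from typing import Sequence, Tuple


def compressed_size(data: bytes) -> int:
    """Approximate C(x) by the zlib-compressed length (assumed compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Pairwise NCD: (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)  # C(xy): compress the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


def classify(query: bytes, labeled: Sequence[Tuple[bytes, str]]) -> str:
    """Hypothetical 1-nearest-neighbor rule: return the label of the
    training object with the smallest pairwise NCD to the query."""
    _, label = min(labeled, key=lambda item: ncd(query, item[0]))
    return label


if __name__ == "__main__":
    rng = random.Random(0)
    noise = bytes(rng.getrandbits(8) for _ in range(400))
    training = [(b"ACGT" * 100, "repetitive"), (noise, "noise")]
    print(classify(b"ACGT" * 40 + b"TTGA", training))
```

A query that shares structure with a training object compresses well when concatenated with it, so its NCD is small; an unrelated (here, random) object yields an NCD close to 1.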
Normalized Compression Distance of Multisets with Applications
Creators
Andrew R. Cohen - Drexel University
Paul M. B. Vitányi - National Research Center for Mathematics and Computer Science, The Netherlands
Publication Details
IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 8, pp. 1602-1614
Publisher
IEEE
Grant note
R01AG041861 / National Institutes of Health (10.13039/100000002)
R01NS076709 / Drexel University (10.13039/100006497)
National Institute of Neurological Disorders and Stroke (10.13039/100000065)
RGP0060/2012 / Human Frontier Science Program (10.13039/100004412)
R01AG041861 / National Institute on Aging (10.13039/100000049)
Resource Type
Journal article
Language
English
Academic Unit
Electrical and Computer Engineering
Web of Science ID
WOS:000357591900006
Scopus ID
2-s2.0-84947747978
Other Identifier
991019168437104721