Journal article
Using a novel clumpiness measure to unite data with metadata: Finding common sequence patterns in immune receptor germline V genes
Pattern recognition letters, v 74, pp 24-29
15 Apr 2016
Abstract
When finding relationships in biological systems, we often describe hierarchies based on one facet of the data. However, when using this hierarchy to elucidate relationships between metadata, the distribution of metadata labels within the hierarchy may exhibit different levels of aggregation: uniform, random, or clumped. As of now, there exists no measure for finding the level of aggregation, or "clumpiness", between labels distributed among the leaves of a hierarchical container. We propose a clumpiness measure to aid in the quantification of relationships between metadata. We validated our measure with random trees and found that the measure is resistant to changes in the tree size, label size, and the number of types of labels, compared to the closest alternative measures. We used our clumpiness measure to quantify the relationships between light and heavy chains in human and mouse B cell and T cell receptor V genes based on their motifs. We found that the B cell heavy chains were the most aggregated while the T cell chains were the least aggregated and that the IGL chain was clumped the most with the T cell chains out of all of the B cell chains. (C) 2016 Elsevier B.V. All rights reserved.
Details
- Title
- Using a novel clumpiness measure to unite data with metadata: Finding common sequence patterns in immune receptor germline V genes
- Creators
- Gregory W. Schwartz - Drexel University; Ali Shokoufandeh - Drexel University; Santiago Ontanon - Drexel University; Uri Hershberg - Drexel University
- Publication Details
- Pattern recognition letters, v 74, pp 24-29
- Publisher
- Elsevier
- Number of pages
- 6
- Grant note
- 84.200 / U.S. Department of Education Graduate Assistance in Areas of National Need (GAANN) program, CFDA
- P01AI106697 / National Institute Of Allergy And Infectious Diseases of the National Institutes of Health; United States Department of Health & Human Services; National Institutes of Health (NIH) - USA; NIH National Institute of Allergy & Infectious Diseases (NIAID)
- 1551338 / National Science Foundation Information & Intelligent Systems
- Resource Type
- Journal article
- Language
- English
- Academic Unit
- Computer Science; School of Biomedical Engineering, Science, and Health Systems
- Web of Science ID
- WOS:000373190000004
- Scopus ID
- 2-s2.0-84964344026
- Other Identifier
- 991019168205704721
InCites Highlights
Data related to this publication, from InCites Benchmarking & Analytics tool:
- Web of Science research areas
- Computer Science, Artificial Intelligence