Control of small magnetic object in artificial human tissue material

Bosheng Wu

doi:10.17918/D8HT1Q

The increasing availability of microbiome survey data has led to the use of complex machine learning and statistical approaches to measure taxonomic diversity and extract relationships between taxa and their host or environment. Accurately representing microbiome community structure has notable implications in medicine because recent work has demonstrated bidirectional interplay between microbiota and various organ systems. However, many approaches inadequately account for difficulties inherent to microbiome data, such as (1) insufficient sequencing depth resulting in sparse count data, (2) a large feature space relative to sample space, resulting in data prone to overfitting, and (3) library size imbalance, requiring normalization strategies that lead to compositional artifacts. Still, there exist approaches from other domains (e.g., natural language processing) that may be well suited to fitting microbiome data and may provide meaningful features that capture relevant aspects of the data. Two methods in particular are topic models and word embeddings, which characterize word co-occurrence as topics and capture semantic and lexical information of each word based on the word's neighbors, respectively. In this work, we show that a topic model can represent microbiome abundance data as topics, capturing "subcommunity" structure from co-occurrence patterns among taxa, whereas word embeddings can represent a nucleotide subsequence as a dense, numeric vector that encapsulates the nucleotide neighborhood in which the subsequence exists. Specifically, we present two approaches, both of which are applied to 16S rRNA amplicon surveys. First, we utilize a topic model approach. We show that library-size normalization is unnecessary and that, by exploiting topic-to-topic correlations, the topic model can successfully capture complex signals such as dynamic time-series behavior of taxonomic subcommunities.
In addition, we present themetagenomics to demonstrate that topic features are flexible for downstream analysis. We link taxonomic co-occurrence patterns to their predicted functional content by leveraging gene function prediction algorithms and a fully Bayesian multilevel regression model. Second, we use Skip-Gram word2vec and a recent sentence embedding approach to embed nucleotide sequences. Our results show that embedding sequences yields meaningful representations that can be used for exploratory analyses or for downstream machine learning applications that require numeric data. The sequence embeddings can preserve relevant information about the sequencing data such as k-mer context, sequence taxonomy, and sample class. The insights we provide are applicable to various types of count data that extend beyond the microbiome sequencing domain. These include ecological presence/absence surveys, RNAseq gene expression studies, metagenomic or whole genome sequencing studies, proteomic or metabolic research, text-based studies, and econometrics. In addition, our approaches for exploring the sequence embedding space are applicable to any type of text-based research, including genetics and natural language processing, as well as applications utilizing deep learning, where embedding layers are used to encode text for deeper layers of the network. Lastly, our simulation approaches and evaluation of normalization techniques are generalizable, such that aspects of these strategies could be applied to microbiome studies and work consisting of compositional data other than 16S rRNA amplicon surveys.
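
As a rough illustration of the "nucleotide neighborhood" idea behind the Skip-Gram embedding described above, the sketch below splits a read into overlapping k-mers and lists the (center, context) pairs a Skip-Gram model would be trained on. It is a minimal sketch under stated assumptions, not the author's pipeline: the function names `kmers` and `skipgram_pairs`, the example read, and the choices k = 3 and window = 1 are all hypothetical and are not taken from the work itself.

```python
def kmers(sequence, k):
    """Split a nucleotide string into overlapping k-mers (stride of 1)."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def skipgram_pairs(tokens, window):
    """List (center, context) pairs for every token and each neighbor
    that lies within `window` positions on either side of it."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


# Hypothetical toy read; real 16S rRNA amplicon reads are far longer.
read = "ACGTAC"
tokens = kmers(read, k=3)               # k = 3 is an assumed value
pairs = skipgram_pairs(tokens, window=1)  # window = 1 is an assumed value

for center, context in pairs:
    print(center, "->", context)
```

In a full Skip-Gram model, each k-mer would be assigned a dense vector trained to predict its context k-mers from pairs like these, so k-mers that share neighborhoods end up with similar vectors.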
