Journal article
Deep learning multi-omics integration identifies new molecular subtypes of lung cancer
BioData Mining, forthcoming
21 May 2026
PMID: 42169010
Abstract
Cancer is a heterogeneous disease, with numerous subtypes that differ in molecular profiles, risk factors, clinical outcomes, and tumor locations. Lung cancer, the third most diagnosed cancer in the United States, is driven by a complex combination of molecular alterations that influence tumor behavior and patient outcomes. Large-scale consortia such as The Cancer Genome Atlas (TCGA) have generated comprehensive data sets encompassing genomic, transcriptomic, proteomic, and clinical data, which present a unique opportunity to uncover novel subtypes through multi-omics integration. Although clustering methods have been used to identify lung cancer subtypes, many integration strategies remain underexplored, and previous studies have mostly relied on a single data type, potentially limiting biological insight. To address this gap, we propose a novel analytical pipeline that combines gene expression and somatic mutation data in lung cancer to identify distinct molecular subtypes. We use an autoencoder deep neural network for feature extraction and model-based clustering to group samples in the extracted feature space. Integrating gene expression data, which represent dynamic tumor processes, with somatic mutation data, which reflect invariant genomic characteristics, enables a more comprehensive understanding of the molecular profile. We identify molecularly defined subtypes, enhancing our understanding of lung cancer's molecular heterogeneity. All source code, figures, latent features, cluster labels, and instructions for downloading data from the Genomic Data Commons (GDC) are available at https://github.com/MLBC-lab/LungClust. We believe this pipeline serves as a reliable tool that offers valuable biological and clinical insights to advance precision oncology in lung cancer, particularly biomarker discovery, which will aid patient stratification and treatment response prediction.
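The abstract describes the pipeline only at a high level. As a rough, hypothetical illustration of the general idea (not the authors' implementation, which is in the LungClust repository), the sketch below concatenates scaled expression values with binary mutation indicators, compresses them with a small autoencoder, and selects the number of clusters by BIC over Gaussian mixture models. Assumptions: scikit-learn's `MLPRegressor` (trained to reconstruct its own input) stands in for the autoencoder and `GaussianMixture` stands in for model-based clustering, which may differ from the tools used in the study; the data are synthetic; the layer sizes, latent dimension, and 2 to 6 cluster range are illustrative choices; the helper names `make_toy_omics`, `encode`, and `cluster_latent` are hypothetical.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler


def make_toy_omics(n_per_group=50, n_genes=200, n_mut_genes=50, seed=0):
    """Hypothetical synthetic data: 3 groups with shifted expression and mutation rates."""
    rng = np.random.default_rng(seed)
    expr_blocks, mut_blocks = [], []
    for g in range(3):
        shift = np.zeros(n_genes)
        shift[g * 20:(g + 1) * 20] = 3.0  # each group over-expresses its own gene block
        expr_blocks.append(rng.normal(0.0, 1.0, (n_per_group, n_genes)) + shift)
        rates = np.full(n_mut_genes, 0.05)
        rates[g * 10:(g + 1) * 10] = 0.6  # each group is enriched for mutations in its own genes
        mut_blocks.append((rng.random((n_per_group, n_mut_genes)) < rates).astype(float))
    return np.vstack(expr_blocks), np.vstack(mut_blocks)


def encode(model, X, n_encoder_layers):
    """Forward pass through the first n_encoder_layers of a fitted tanh MLPRegressor."""
    h = X
    for W, b in zip(model.coefs_[:n_encoder_layers], model.intercepts_[:n_encoder_layers]):
        h = np.tanh(h @ W + b)
    return h


def cluster_latent(Z, k_range=range(2, 7), seed=0):
    """Fit Gaussian mixtures for each k in k_range and keep the one with the lowest BIC."""
    best = None
    for k in k_range:
        gmm = GaussianMixture(n_components=k, covariance_type="full", random_state=seed).fit(Z)
        bic = gmm.bic(Z)
        if best is None or bic < best[0]:
            best = (bic, k, gmm)
    _, best_k, best_gmm = best
    return best_k, best_gmm.predict(Z)


# Assumption: mutations are encoded as 0/1 per gene; expression is standardized per gene.
expr, mut = make_toy_omics()
X = np.hstack([StandardScaler().fit_transform(expr), mut])

# Stand-in autoencoder: an MLP trained to reconstruct its input through an 8-unit bottleneck.
latent_dim = 8
ae = MLPRegressor(hidden_layer_sizes=(64, latent_dim, 64), activation="tanh",
                  max_iter=500, random_state=0)
ae.fit(X, X)

Z = encode(ae, X, n_encoder_layers=2)  # bottleneck = output of the second hidden layer
best_k, labels = cluster_latent(Z)
print("latent shape:", Z.shape, "chosen k:", best_k)
```

Selecting the number of mixture components by BIC is one common way to let model-based clustering choose how many subtypes to report; other criteria or stability analyses could be used instead.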
Details
- Creators
- Bianca Gonda - Rutgers, The State University of New Jersey
- Derek Wang - Rutgers, The State University of New Jersey
- Sayed Mehedi Azim - Rutgers, The State University of New Jersey
- Gabriele Romano - Drexel University
- Iman Dehzangi - Rutgers, The State University of New Jersey
- Publisher
- Springer Nature
- Grant note
- 2152059 / NSF-NRT
- Language
- English
- Academic Unit
- Pharmacology and Physiology
- Other Identifier
- 991022180789204721