Deep learning multi-omics integration identifies new molecular subtypes of lung cancer
Journal article   Open access   Peer reviewed

Bianca Gonda, Derek Wang, Sayed Mehedi Azim, Gabriele Romano and Iman Dehzangi
BioData Mining, Forthcoming
21 May 2026
PMID: 42169010
URL: https://doi.org/10.1186/s13040-026-00563-z
Published, Version of Record (VoR) Open CC BY V4.0

Abstract

Keywords: Deep learning; Somatic mutation; Clinical annotation; Molecular subtyping; Computational oncology; Integrative clustering; Lung cancer; Gene expression
Cancer is a heterogeneous disease, with numerous subtypes that differ in molecular profile, risk factors, clinical outcome, and tumor location. Lung cancer, the third most commonly diagnosed cancer in the United States, is driven by a complex combination of molecular alterations that influence tumor behavior and patient outcomes. Large-scale consortia such as The Cancer Genome Atlas (TCGA) have generated comprehensive data sets encompassing genomic, transcriptomic, proteomic, and clinical data, which present a unique opportunity to uncover novel subtypes through multi-omics integration. Although clustering methods have been used to identify lung cancer subtypes, many data integrations and avenues remain underexplored. Previous studies mostly relied on a single data type for this task, potentially limiting biological insight. To address this issue, we propose a novel analytical pipeline that combines gene expression and somatic mutation data in lung cancer to identify distinct molecular subtypes. The pipeline uses a deep autoencoder neural network for feature extraction and model-based clustering to group samples. Integrating gene expression data, which represent dynamic tumor processes, with somatic mutation data, which reflect invariant genomic characteristics, enables a more comprehensive understanding of the tumor's molecular profile. We identify molecularly defined subtypes, enhancing our understanding of lung cancer's molecular heterogeneity. All source code, figures, latent features, cluster labels, and instructions for downloading the data from the GDC are available at https://github.com/MLBC-lab/LungClust. We believe this pipeline serves as a reliable tool that offers valuable biological and clinical insights to advance precision oncology approaches in lung cancer, specifically biomarker discovery, which will aid patient stratification and treatment response prediction.
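
The general shape of such a pipeline (concatenate two omics layers, compress them with an autoencoder, then apply model-based clustering to the latent features) can be sketched as below. This is only an illustrative sketch, not the authors' implementation, which is in the repository linked above. Everything here is an assumption: the data are synthetic stand-ins rather than TCGA data, scikit-learn's MLPRegressor is used as a simple autoencoder, a Gaussian mixture model selected by BIC stands in for model-based clustering, and the layer sizes, the helper `encode`, and all variable names are hypothetical.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-ins (NOT TCGA data): 120 samples drawn from two hidden groups.
n_samples, n_genes, n_mut = 120, 40, 20
group = np.repeat([0, 1], n_samples // 2)
expression = rng.normal(loc=group[:, None] * 2.0, scale=1.0,
                        size=(n_samples, n_genes))
mutation = (rng.random((n_samples, n_mut))
            < (0.1 + 0.4 * group[:, None])).astype(float)

# Assumed integration step: scale expression, then concatenate it with
# the binary mutation matrix into one feature matrix per sample.
X = np.hstack([StandardScaler().fit_transform(expression), mutation])

# Autoencoder stand-in: an MLP trained to reconstruct its own input
# through a 4-unit bottleneck (layer sizes are arbitrary choices).
autoencoder = MLPRegressor(hidden_layer_sizes=(16, 4, 16), activation="tanh",
                           max_iter=2000, random_state=0)
autoencoder.fit(X, X)


def encode(model, data, n_encoder_layers=2):
    """Hypothetical helper: run only the encoder half of the fitted MLP."""
    hidden = data
    for weights, bias in zip(model.coefs_[:n_encoder_layers],
                             model.intercepts_[:n_encoder_layers]):
        hidden = np.tanh(hidden @ weights + bias)
    return hidden


latent = encode(autoencoder, X)

# Model-based clustering stand-in: fit Gaussian mixtures with 1 to 5
# components and keep the one with the lowest BIC.
models = [GaussianMixture(n_components=k, covariance_type="diag",
                          random_state=0).fit(latent)
          for k in range(1, 6)]
best = min(models, key=lambda m: m.bic(latent))
labels = best.predict(latent)

print("latent shape:", latent.shape)
print("selected number of clusters:", best.n_components)
```

Selecting the number of mixture components by BIC is one common way to let the data suggest how many subtypes are supported; on real cohorts the preprocessing, network architecture, and clustering model would all need to follow the published pipeline rather than these placeholder choices.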
