A comparison of survival analysis methods for cancer gene expression RNA-Sequencing data

Pichai Raman; Samuel Zimmerman; Komal S. Rathi; Laurence de Torrenté; Mahdi Sarmady; Chao Wu; Jeremy Leipzig; Deanne M. Taylor; Aydin Tozeren; Jessica C. Mar

doi:10.1016/j.cancergen.2019.04.004

Journal article

Peer reviewed

Cancer genetics, v 235-236

Jun 2019

DOI: https://doi.org/10.1016/j.cancergen.2019.04.004

PMID: 31296308

Featured in Collection : UN Sustainable Development Goals @ Drexel

Abstract

Cancer

Gene expression

Kaplan–Meier

Survival analysis

TCGA

• Cox regression was the strongest method for predicting patient survival.
• Other methods like C-index, D-index, and k-means also performed well.
• Methods based on dichotomization had the worst overall performance.

Identifying genetic biomarkers of patient survival remains a major goal of large-scale cancer profiling studies. Using gene expression data to predict the outcome of a patient's tumor makes biomarker discovery a compelling tool for improving patient care. As genomic technologies expand, multiple data types may serve as informative biomarkers, and bioinformatic strategies have evolved around these different applications. For categorical variables such as a gene's mutation status, biomarker identification to predict survival time is straightforward. However, for continuous variables like gene expression, the available methods generate highly variable results, and studies on best practices are lacking. We investigated the performance of eight methods that deal specifically with continuous data. K-means, Cox regression, concordance index, D-index, 25th–75th percentile split, median-split, distribution-based splitting, and KaplanScan were applied to four RNA-sequencing (RNA-seq) datasets from The Cancer Genome Atlas (TCGA). The reliability of the eight methods was assessed by splitting each dataset into two groups and comparing the overlap of the results. Gene sets that had been identified from the literature for a specific tumor type served as positive controls to assess the accuracy of each biomarker using receiver operating characteristic (ROC) curves. Artificial RNA-seq data were generated to test the robustness of these methods under fixed levels of gene expression noise. Our results show that methods based on dichotomizing tend to have consistently poor performance while C-index, D-index, and k-means perform well in most settings. Overall, the Cox regression method had the strongest performance based on tests of accuracy, reliability, and robustness.
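As a rough illustration of two of the approaches named in the abstract, the sketch below computes a concordance index (Harrell's C) on a continuous expression value and on the same value after a median split. It is not the authors' implementation: the function names, the simplified handling of tied survival times (skipped), and the six-patient toy dataset are all assumptions made for this example.

```python
from itertools import combinations
from statistics import median


def concordance_index(times, events, scores):
    """Harrell's C: share of comparable patient pairs in which the
    patient with the shorter survival time has the higher score.
    Tied scores count as half-concordant."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: pairs with tied times are skipped
        # a = patient with the shorter follow-up time, b = the longer one
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the pair is not comparable
        comparable += 1
        if scores[a] > scores[b]:
            concordant += 1.0
        elif scores[a] == scores[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def median_split(expression):
    """Dichotomize expression: 1 if above the median, else 0."""
    cutoff = median(expression)
    return [1 if x > cutoff else 0 for x in expression]


# Hypothetical toy data: higher expression goes with shorter survival.
times = [5, 8, 12, 20, 25, 30]                 # follow-up time (arbitrary units)
events = [1, 1, 0, 1, 1, 0]                    # 1 = death observed, 0 = censored
expression = [9.1, 7.4, 6.0, 5.2, 3.3, 2.8]    # made-up expression values

c_continuous = concordance_index(times, events, expression)
c_split = concordance_index(times, events, median_split(expression))
print(c_continuous, c_split)  # 1.0 0.75
```

On this toy data the continuous score orders every comparable pair correctly, while the median split ties patients within each group and the index drops. This only illustrates how dichotomizing can discard ordering information; it says nothing about the magnitude of the effect in the TCGA datasets studied in the article.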

Metrics

10 Record Views

15 citations in Web of Science

14 citations in Scopus

Details

Title: A comparison of survival analysis methods for cancer gene expression RNA-Sequencing data
Creators: Pichai Raman - Children's Hospital of Philadelphia
Samuel Zimmerman - Albert Einstein College of Medicine
Komal S. Rathi - Children's Hospital of Philadelphia
Laurence de Torrenté - Albert Einstein College of Medicine
Mahdi Sarmady - University of Pennsylvania
Chao Wu - Children's Hospital of Philadelphia
Jeremy Leipzig - Drexel University
Deanne M. Taylor - Children's Hospital of Philadelphia
Aydin Tozeren - Drexel University
Jessica C. Mar - University of Queensland
Publication Details: Cancer genetics, v 235-236
Publisher: Elsevier
Resource Type: Journal article
Language: English
Academic Unit: Computer Science; [Retired Faculty]
Web of Science ID: WOS:000518194500001
Scopus ID: 2-s2.0-85065769367
Other Identifier: 991019169662704721

InCites Highlights

Data related to this publication, from InCites Benchmarking & Analytics tool:

Collaboration types: Domestic collaboration; International collaboration
Web of Science research areas: Genetics & Heredity; Oncology
