Translating the Language of Life
Conference proceeding   Open access

Chaz Allegra, Robi Polikar and Gail Rosen
Companion Proceedings of the 16th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, pp 1-1
11 Oct 2025
url
https://doi.org/10.1145/3768322.3769101
Published, Version of Record (VoR) Open

Abstract

Computing methodologies -- Artificial intelligence -- Natural language processing
Computing methodologies -- Artificial intelligence -- Natural language processing -- Machine translation
Computing methodologies -- Artificial intelligence -- Natural language processing -- Natural language generation
Computing methodologies -- Machine learning -- Machine learning algorithms
Computing methodologies -- Machine learning -- Machine learning approaches -- Neural networks
Proteins are fundamental to life, yet uncovering their structure and function has long required intensive experiments. Recent work, such as ESM [?], shows that language models can learn molecular biology as a language. We present Prometheus, a protein language model (PLM) that translates raw sequences into natural language by combining pretrained encoders (ESM-2, ESM-Cambrian) with a large language model backbone. To enable training, we introduce ProtQA, a large-scale question-answering dataset derived from UniProt that spans functional, structural, and biomedical aspects. Prometheus demonstrates the promise of multimodal LLMs for making protein knowledge accessible and interactive, bridging biological sequences and human understanding.
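
The abstract does not specify how ProtQA records are built, so the following is only a minimal sketch of one way an annotated protein entry could be turned into question-answer pairs covering the functional, structural, and biomedical aspects it names. The field names (`function`, `subunit`, `disease`), the question templates, the `QAPair` type, and the toy entry are all hypothetical and are not taken from ProtQA or from the UniProt schema.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical mapping: annotation field -> (aspect, question template).
# These names and templates are illustrative assumptions, not ProtQA's schema.
TEMPLATES = {
    "function": ("functional", "What is the function of this protein?"),
    "subunit": ("structural", "What is the subunit structure of this protein?"),
    "disease": ("biomedical", "Which diseases is this protein associated with?"),
}


@dataclass(frozen=True)
class QAPair:
    accession: str
    sequence: str
    aspect: str
    question: str
    answer: str


def build_qa_pairs(entry: Dict[str, str]) -> List[QAPair]:
    """Turn one annotated entry into QA pairs, skipping missing or empty fields."""
    pairs = []
    for field, (aspect, question) in TEMPLATES.items():
        answer = (entry.get(field) or "").strip()
        if answer:
            pairs.append(
                QAPair(entry["accession"], entry["sequence"], aspect, question, answer)
            )
    return pairs


# Toy entry invented for illustration; it is not a real UniProt record.
toy_entry = {
    "accession": "TOY0001",
    "sequence": "MKTAYIAKQR",
    "function": "Binds ATP.",
    "subunit": "",
    "disease": "None reported in this toy example.",
}

pairs = build_qa_pairs(toy_entry)
for pair in pairs:
    print(pair.aspect, "|", pair.question, "|", pair.answer)
```

Each pair keeps the raw sequence next to the question and answer, so that in a setup like the one the abstract describes the sequence could be passed to a protein encoder while the question and answer are handled as text. The empty `subunit` field yields no pair, which avoids training on blank answers.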
