Property-Driven Cocrystal Discovery via Gaussian Processes and Active Learning
Journal article · Open access · Peer reviewed

Samuel A. Appiah and Matthew A. McDonald
Crystal Growth & Design, v. 26(7), pp. 2688–2702
16 Mar 2026
URL: https://doi.org/10.1021/acs.cgd.5c01417
Published, Version of Record (VoR). Open Access via Drexel Libraries Read and Publish Program 2026. License: CC BY 4.0.

Abstract

Keywords: Computer Simulation or Modeling; Optimization
Cocrystallization can be used to tune key drug properties, such as aqueous solubility, without altering molecular structure; however, the space of possible coformers is enormous and design rules remain empirical. We present a Bayesian optimization framework, coupling Gaussian process (GP) classification and regression, that can accelerate cocrystal discovery and solubility enhancement. Starting from 6338 literature-derived binary coformer pairs, we engineered vector fingerprints for cocrystal prediction that combine 2D structural information (fragment and MQN fingerprints) with low-cost shape and polarity descriptors. A GP classifier trained on an actively constructed set of ∼1000 coformer pairs, selected by uncertainty sampling, achieves up to 94% accuracy and a Matthews correlation coefficient of 0.79 on a test set of >5000 unseen pairs. Property-driven coformer selection was formulated as a Bayesian optimization problem, using a machine learning model as a surrogate for aqueous solubility and Tanimoto similarity to guide campaigns across several discovery scenarios. In simulations, the framework rapidly identifies highly soluble cocrystals, typically recovering top-5 candidates after fewer than 10 evaluations. Finally, we validate the workflow experimentally with 12 pharmaceutical and pharmaceutical-like compounds, discovering two new cocrystals, resveratrol + praziquantel and purin-6-amine + thiazole-4-carboxylic acid, with markedly enhanced aqueous solubility. These results show that Bayesian optimization offers a practical, data-efficient route to cocrystal design.
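
The abstract describes a loop in which a GP surrogate, built on Tanimoto similarity between fingerprints, proposes the next coformer pair to evaluate. The following is a minimal pure-Python sketch of that generic idea under our own assumptions, not the authors' implementation: fingerprints are represented as sets of "on" bits, the GP kernel is the Tanimoto similarity (a valid covariance for binary vectors), the prior mean is the mean of the observed values, expected improvement is used as the acquisition function, and `oracle` is a hypothetical stand-in for a solubility measurement. All function names are illustrative.

```python
import math
import random

def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity between two sets of 'on' fingerprint bits."""
    union = len(a | b)
    return 1.0 if union == 0 else len(a & b) / union

def cholesky(m):
    """Lower-triangular L with m = L L^T (m must be positive definite)."""
    n = len(m)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(m[i][i] - s)
            else:
                L[i][j] = (m[i][j] - s) / L[j][j]
    return L

def solve_lower(L, b):
    """Forward substitution: solve L x = b."""
    x = [0.0] * len(b)
    for i in range(len(b)):
        x[i] = (b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i]
    return x

def solve_upper_t(L, b):
    """Back substitution: solve L^T x = b."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_posterior(X, y, Xq, noise=1e-3):
    """GP regression with a Tanimoto kernel and constant prior mean (mean of y).
    Returns (posterior mean, posterior std) for each query fingerprint."""
    y_bar = sum(y) / len(y)
    K = [[tanimoto(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, [v - y_bar for v in y]))
    out = []
    for q in Xq:
        k = [tanimoto(q, x) for x in X]
        mean = y_bar + sum(ki * ai for ki, ai in zip(k, alpha))
        v = solve_lower(L, k)
        var = max(tanimoto(q, q) - sum(vi * vi for vi in v), 1e-12)
        out.append((mean, math.sqrt(var)))
    return out

def expected_improvement(mean, std, best, xi=0.01):
    """Expected improvement over the best observed value (maximization)."""
    z = (mean - best - xi) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mean - best - xi) * cdf + std * pdf

def bayes_opt(pool, oracle, n_init=3, n_iter=10, seed=0):
    """Pool-based BO: repeatedly 'measure' the unobserved candidate with the highest EI."""
    rng = random.Random(seed)
    observed = rng.sample(range(len(pool)), n_init)
    ys = [oracle(pool[i]) for i in observed]
    for _ in range(n_iter):
        remaining = [i for i in range(len(pool)) if i not in observed]
        if not remaining:
            break
        post = gp_posterior([pool[i] for i in observed], ys, [pool[i] for i in remaining])
        best = max(ys)
        scores = [expected_improvement(m, s, best) for m, s in post]
        pick = remaining[max(range(len(remaining)), key=scores.__getitem__)]
        observed.append(pick)
        ys.append(oracle(pool[pick]))
    return observed, ys

# Toy usage: random 64-bit fingerprints and a hypothetical "solubility" oracle.
rng = random.Random(42)
pool = [frozenset(rng.sample(range(64), 12)) for _ in range(40)]
target_bits = frozenset(range(8))

def oracle(fp):
    # Hypothetical stand-in for a solubility measurement, NOT a real model.
    return float(len(fp & target_bits))

observed, ys = bayes_opt(pool, oracle)
print("best found:", max(ys), "true best in pool:", max(oracle(fp) for fp in pool))
```

Expected improvement trades off a high predicted value against high uncertainty, so the loop explores unfamiliar regions of fingerprint space while still favoring promising candidates. The paper's actual surrogate, descriptors, and acquisition strategy may differ from these illustrative choices.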
