AI-Driven Drug Repurposing Using Multi-Modal Deep Learning and Graph Neural Networks
Manideep Pendyala, Ashwath Ramsundar
Objective
To develop an AI framework for drug repurposing (identifying new therapeutic uses for existing drugs) by integrating molecular, textual, and biomedical graph data into a unified multi-modal deep learning system. The project aims to improve predictive accuracy, model interpretability, and uncertainty calibration for drug–target interaction prediction.
Background
Drug discovery is slow, expensive, and often redundant. Drug repurposing leverages known compounds to find new disease indications, reducing both cost and development time. Traditional bioinformatics approaches struggle to combine diverse data types such as molecular structure, biomedical literature, and gene–disease associations.
Our approach uses MolBERT, BioBERT, and Graph Neural Networks (GNNs) to merge these heterogeneous modalities, while an Engression module models the full conditional distribution of predicted drug potency (pIC₅₀) rather than a single point estimate.
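pIC₅₀ is the negative base-10 logarithm of the half-maximal inhibitory concentration in molar units. The pIC₅₀ > 7.0 threshold used in the case study therefore corresponds to IC₅₀ below 100 nM. The minimal conversion sketch below assumes IC₅₀ values are given in nanomolar, as ChEMBL's standardized activity values usually are. The helper names are hypothetical.

```python
import math


def ic50_nm_to_pic50(ic50_nm: float) -> float:
    """pIC50 = -log10(IC50 in mol/L); for IC50 in nM this equals 9 - log10(IC50_nM)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return 9.0 - math.log10(ic50_nm)


def pic50_to_ic50_nm(pic50: float) -> float:
    """Inverse conversion: pIC50 back to IC50 in nanomolar."""
    return 10.0 ** (9.0 - pic50)
```

Working on the log scale makes potency differences additive. A one-unit change in pIC₅₀ corresponds to a tenfold change in IC₅₀.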
Methodology
Data Sources: ChEMBL bioactivity data, CTD chemical–disease–gene network, and PubMed abstracts.
Feature Extraction:
MolBERT — molecular embeddings from SMILES strings.
BioBERT — contextual biomedical text embeddings.
Relational GNN — multi-hop network embeddings from CTD graphs.
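The text describes the GNN only as "relational" and "multi-hop". One common design with these properties is the R-GCN-style layer, which keeps a separate weight matrix per edge type (for CTD, e.g. chemical–gene, gene–disease, chemical–disease) and averages incoming messages per relation. Stacking k layers gives each node a k-hop receptive field. The pure-Python sketch below is an illustration under that assumption, not the authors' architecture. The node names, relation labels, weights, and mean normalization are all hypothetical.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # rows = output dims, columns = input dims


def matvec(w: Matrix, x: Vector) -> Vector:
    """Matrix-vector product."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def rgcn_layer(
    h: Dict[str, Vector],
    edges: List[Tuple[str, str, str]],  # (source, relation, target)
    w_self: Matrix,
    w_rel: Dict[str, Matrix],
) -> Dict[str, Vector]:
    """h_i' = ReLU(W_0 h_i + sum_r sum_{j in N_r(i)} (1 / |N_r(i)|) W_r h_j)."""
    # Group source nodes by (target, relation) so each relation is normalised separately.
    incoming: Dict[Tuple[str, str], List[str]] = {}
    for src, rel, dst in edges:
        incoming.setdefault((dst, rel), []).append(src)

    new_h: Dict[str, Vector] = {}
    for node, x in h.items():
        acc = matvec(w_self, x)  # self-loop term
        for (dst, rel), srcs in incoming.items():
            if dst != node:
                continue
            for src in srcs:
                msg = matvec(w_rel[rel], h[src])  # relation-specific transform
                acc = [a + m / len(srcs) for a, m in zip(acc, msg)]
        new_h[node] = [max(0.0, v) for v in acc]  # ReLU
    return new_h


# Toy CTD-like graph (hypothetical). Weights are reused across layers only for brevity.
identity = [[1.0, 0.0], [0.0, 1.0]]
half = [[0.5, 0.0], [0.0, 0.5]]
h0 = {"drugA": [1.0, 0.0], "geneX": [0.0, 1.0], "diseaseY": [0.5, 0.5]}
edges = [
    ("drugA", "interacts", "geneX"), ("geneX", "interacts", "drugA"),
    ("geneX", "associated", "diseaseY"), ("diseaseY", "associated", "geneX"),
]
weights = {"interacts": identity, "associated": half}
h1 = rgcn_layer(h0, edges, identity, weights)
h2 = rgcn_layer(h1, edges, identity, weights)  # second hop: drugA now depends on diseaseY via geneX
```

After one layer, drugA only sees geneX. After two layers, its embedding also carries diseaseY's signal through geneX. This is the multi-hop effect that links a drug to a disease without a direct edge.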
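Fusion Strategy: The modality embeddings are concatenated into a single 1,728-dimensional vector and passed to a heteroscedastic neural network, which predicts both a mean and an input-dependent variance.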
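A minimal sketch of the fusion input and of the Gaussian negative log-likelihood commonly used to train a heteroscedastic regression head is shown below. Only the 1,728 total comes from the text. The per-modality split is not stated and is not assumed here. Whether the authors train this head with Gaussian NLL or with the Engression objective is also not stated, and the network itself is omitted. The function names are hypothetical.

```python
import math
from typing import List, Sequence

FUSED_DIM = 1728  # total fused size stated in the text; per-modality sizes are not given


def fuse(embeddings: Sequence[Sequence[float]], expected_dim: int = FUSED_DIM) -> List[float]:
    """Concatenate per-modality embeddings (e.g. MolBERT, BioBERT, GNN) into one vector."""
    fused = [v for emb in embeddings for v in emb]
    if len(fused) != expected_dim:
        raise ValueError(f"fused dimension {len(fused)} != expected {expected_dim}")
    return fused


def gaussian_nll(y: float, mu: float, log_var: float) -> float:
    """Negative log-likelihood of y under N(mu, exp(log_var)).

    A heteroscedastic head outputs (mu, log_var) per input; predicting the
    log-variance keeps sigma^2 positive without a constrained output.
    """
    return 0.5 * (math.log(2 * math.pi) + log_var + (y - mu) ** 2 / math.exp(log_var))
```

The (y − μ)²/σ² term lets the network down-weight errors on inputs it marks as uncertain. The log σ² term penalizes claiming high uncertainty everywhere, so the predicted variance has to track where the model is actually unreliable.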
Uncertainty Modeling: The Engression module captures full predictive distributions, producing calibrated confidence intervals.
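Engression, as introduced by Shen and Meinshausen, fits a generative model g(x, ε) with injected noise ε. It is trained with the energy score E|Y − g(X, ε)| − ½ E|g(X, ε) − g(X, ε′)|, which is minimized when sampled outputs follow the true conditional distribution. Confidence intervals can then be read off sample quantiles. The stdlib sketch below computes the one-dimensional empirical loss and a quantile interval. The toy linear "generator" stands in for the trained network, and the function names are hypothetical.

```python
import random
from typing import Callable, List, Tuple


def energy_score_loss(y: float, samples: List[float]) -> float:
    """Empirical energy-score (engression) loss for one scalar target.

    mean_j |y - s_j|  -  (1 / (2 m (m - 1))) * sum_{j != k} |s_j - s_k|
    The first term pulls samples toward y; the second rewards spread,
    which stops the generator from collapsing to a point estimate.
    """
    m = len(samples)
    if m < 2:
        raise ValueError("need at least two samples per target")
    fit = sum(abs(y - s) for s in samples) / m
    spread = sum(
        abs(a - b)
        for i, a in enumerate(samples)
        for j, b in enumerate(samples)
        if i != j
    )
    return fit - spread / (2 * m * (m - 1))


def predictive_interval(
    generator: Callable[[float], float],
    level: float = 0.9,
    n: int = 2000,
    seed: int = 0,
) -> Tuple[float, float]:
    """Central `level` interval from sample quantiles of g(x, eps), eps ~ N(0, 1)."""
    rng = random.Random(seed)
    draws = sorted(generator(rng.gauss(0.0, 1.0)) for _ in range(n))

    def quantile(q: float) -> float:
        return draws[round(q * (n - 1))]  # nearest-rank style sample quantile

    return quantile((1 - level) / 2), quantile((1 + level) / 2)


# Hypothetical trained generator for one drug-disease pair: g(x, eps) = 7.2 + 0.4 * eps.
lo, hi = predictive_interval(lambda eps: 7.2 + 0.4 * eps)
```

Because the interval comes from samples rather than a fixed parametric form, the same procedure applies when the predicted potency distribution is skewed or multi-modal.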
Evaluation Metrics: RMSE, ECE (Expected Calibration Error), AUC-ROC, AUPRC, and Precision@10.
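The text does not define ECE for a regression target. A sketch of RMSE, Precision@10, and one common regression analogue of ECE follows. That analogue compares, over a grid of levels p, the observed frequency of PIT values F_i(y_i) ≤ p with p itself, assuming Gaussian predictive distributions. The authors' exact definition may differ, and the function names are hypothetical. For AUC-ROC and AUPRC, `sklearn.metrics.roc_auc_score` and `sklearn.metrics.average_precision_score` are standard choices.

```python
import math
from statistics import NormalDist
from typing import List, Optional, Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-squared error between observed and predicted pIC50."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def precision_at_k(scores: Sequence[float], labels: Sequence[int], k: int = 10) -> float:
    """Fraction of the k highest-scoring candidates whose label is 1 (e.g. known active)."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sum(labels[i] for i in top) / len(top)


def calibration_error(
    y_true: Sequence[float],
    mu: Sequence[float],
    sigma: Sequence[float],
    levels: Optional[List[float]] = None,
) -> float:
    """Mean |observed - nominal| frequency of PIT values F_i(y_i) <= p (assumed definition).

    0 means the predicted distributions are perfectly calibrated on this data.
    """
    levels = levels or [i / 10 for i in range(1, 10)]
    pit = [NormalDist(m, s).cdf(y) for y, m, s in zip(y_true, mu, sigma)]
    gaps = [abs(sum(v <= p for v in pit) / len(pit) - p) for p in levels]
    return sum(gaps) / len(gaps)
```

Multiplied by 100, the calibration error reads as a percentage comparable in form to the reported ECE.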
Results
The multi-modal Engression model achieved the best overall performance among the evaluated models: 24% lower RMSE than the GNN-only model and well-calibrated uncertainty estimates (ECE = 4.3%).
In a case study on MESH:C000598644, the model prioritized top candidate drugs (predicted pIC₅₀ > 7.0), and these rankings showed cross-modal agreement and robust confidence estimates.
Figures
Framework Overview: Multi-modal uncertainty-aware architecture combining MolBERT, BioBERT, and GNN embeddings.
Predicted Potency Distributions: Model uncertainty visualized with ±1σ shaded bands.
Exceedance Probabilities: Probability of exceeding pIC₅₀ = 7.0 for top drug candidates.
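Under a Gaussian predictive distribution with mean μ and standard deviation σ (the σ of the 1σ bands), the exceedance probability is P(pIC₅₀ > 7.0) = 1 − Φ((7.0 − μ)/σ). With sample-based Engression output it is simply the fraction of samples above the threshold. The sketch below computes both and ranks candidates by exceedance probability. The candidate names and numbers are made-up placeholders, not results from this work.

```python
from statistics import NormalDist
from typing import Dict, List, Tuple

THRESHOLD = 7.0  # pIC50 cut-off used in the case study


def exceedance_gaussian(mu: float, sigma: float, threshold: float = THRESHOLD) -> float:
    """P(pIC50 > threshold) for a N(mu, sigma^2) predictive distribution."""
    return 1.0 - NormalDist(mu, sigma).cdf(threshold)


def exceedance_samples(samples: List[float], threshold: float = THRESHOLD) -> float:
    """Monte Carlo estimate: fraction of predictive samples strictly above the threshold."""
    return sum(s > threshold for s in samples) / len(samples)


def rank_by_exceedance(preds: Dict[str, Tuple[float, float]]) -> List[Tuple[str, float]]:
    """Sort candidates by exceedance probability, highest first."""
    probs = {name: exceedance_gaussian(mu, sd) for name, (mu, sd) in preds.items()}
    return sorted(probs.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical candidates: (predicted mean pIC50, predicted sigma).
candidates = {"drug_A": (7.4, 0.3), "drug_B": (7.6, 0.9), "drug_C": (6.8, 0.2)}
for name, p in rank_by_exceedance(candidates):
    print(f"{name}: P(pIC50 > 7.0) = {p:.2f}")
```

In this toy example drug_B has the higher predicted mean, but its larger σ gives it a lower exceedance probability than drug_A. Ranking by exceedance probability rather than by mean potency alone rewards candidates whose potency the model is confident about.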
Impact and Future Work
The framework demonstrates that combining structured biomedical knowledge with molecular and textual representations yields state-of-the-art predictive performance and uncertainty calibration. Future work will expand to multi-task learning, integrate additional omics data, and test scalability for clinical-trial drug prioritization.
This research is ongoing, and model development and evaluation are continuing. A full manuscript is being prepared for submission to a leading venue such as NeurIPS.
Preprint: Pendyala, M., & Ramsundar, A. (2025). AI-Driven Drug Repurposing Using Multi-Modal Deep Learning and Graph Neural Networks. arXiv preprint (in submission).
Code Repository: GitHub – Drug-Repurposing-Multi-Modal-DL-GNN-Engression
Presentation Slides: Slides-AI-Drug-Repurposing (from SciPy India 2025)
This repository contains the complete end-to-end pipeline, including data preprocessing, SMILES standardization, graph construction, embedding extraction (MolBERT + BioBERT + GNN), Engression-based uncertainty modeling, and evaluation scripts. It also includes figures, results, and presentation slides from the SciPy India 2025 talk.