AI Drug Discovery and Molecular Design

This program targets the most expensive uncertainty in drug discovery: which molecules and mechanisms deserve experimental attention.

01 Molecules and targets

02 Representation learning

03 Property and interaction prediction

04 Prioritized experimental candidates

What this program is building

Selected recent and foundational papers, summarized around the task, why it matters, and the main technical result.

2026 · AAAI

Targeted Pathway Inference for Biological Knowledge Bases via Graph Learning and Explanation

1. Biological KB

2. Graph learning

3. Explained pathway

Task: Infer biological pathways from knowledge bases using graph learning and explanation.
Why it matters: Pathway inference helps connect molecular observations to mechanisms that can guide therapeutic hypotheses.
Main result: The work emphasizes explainable graph reasoning so inferred pathways can be inspected by scientists.

2025 · AAAI

Bi-level Contrastive Learning for Knowledge-Enhanced Molecule Representations

1. Molecule graph

2. Biomedical knowledge

3. Molecule embedding

Task: Learn molecule representations that combine molecular structure with external knowledge.
Why it matters: Better representations improve downstream property prediction and reduce wasted experimental search.
Main result: Bi-level contrastive learning aligns molecular and knowledge views into stronger predictive embeddings.
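As a rough illustration of what "aligning two views" means, the sketch below computes a generic InfoNCE-style contrastive loss between a molecule-structure embedding and a knowledge embedding of the same molecule. It is not the paper's bi-level objective: it shows a single level of alignment only, and the function names, the temperature value, and the toy vectors are all assumptions made for this example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(structure_views, knowledge_views, temperature=0.1):
    """Mean InfoNCE loss over a batch.

    Row i of structure_views and row i of knowledge_views are assumed to
    describe the same molecule (the positive pair); every other row in the
    batch serves as a negative. All names here are hypothetical.
    """
    n = len(structure_views)
    total = 0.0
    for i in range(n):
        logits = [cosine(structure_views[i], knowledge_views[j]) / temperature
                  for j in range(n)]
        # Log-sum-exp with the max subtracted for numerical stability.
        m = max(logits)
        log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_sum - logits[i]  # cross-entropy with the positive at index i
    return total / n

# Toy batch: two molecules with 2-d embeddings in each view.
structure = [[1.0, 0.0], [0.0, 1.0]]
knowledge_aligned = [[1.0, 0.0], [0.0, 1.0]]
knowledge_shuffled = [[0.0, 1.0], [1.0, 0.0]]
print(info_nce(structure, knowledge_aligned))
print(info_nce(structure, knowledge_shuffled))
```

The loss is small when each molecule's two views point the same way and large when they are mismatched, which is the signal a contrastive encoder would be trained to minimize.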

2025 · ICLR

GeSubNet: Gene Interaction Inference for Disease Subtype Network Generation

1. Omics profiles

2. Gene interaction inference

3. Subtype network

Task: Infer gene interactions and disease subtype networks from omics profiles.
Why it matters: Disease subtyping can reveal mechanisms and patient groups that matter for precision therapeutics.
Main result: The model generates subtype-specific interaction networks that make disease heterogeneity more actionable.

2025 · Scientific Data

MLOmics: Cancer Multi-Omics Database for Machine Learning

1. Cancer omics

2. ML-ready database

3. Benchmark tasks

Task: Build a machine-learning-ready cancer multi-omics resource.
Why it matters: Shared, well-structured datasets make therapeutic and biomarker modeling more reproducible.
Main result: MLOmics packages multi-omics data into a resource designed for ML benchmarking and discovery.

2025 · JDMLR

MolTextQA: A Question-Answering Dataset and Benchmark for Molecular Structure-Text Understanding

1. Molecule + text

2. Multimodal QA model

3. Scientific answer

Task: Evaluate models that reason over molecule structures and scientific text.
Why it matters: Drug discovery needs models that understand both chemical graphs and the language scientists use to describe them.
Main result: MolTextQA creates a benchmark for testing multimodal molecule-language understanding.

2023 · NeurIPS

CoDrug: Conformal Drug Property Prediction with Density Estimation under Covariate Shift

1. Shifted molecules

2. Conformal prediction

3. Calibrated property range

Task: Quantify uncertainty for drug property prediction when test molecules differ from training data.
Why it matters: Distribution shift is routine in discovery, so calibrated uncertainty is essential for deciding what to test.
Main result: CoDrug combines conformal prediction with density estimation so that predicted property ranges stay calibrated when test molecules come from a shifted distribution.
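To make the general idea concrete, here is a minimal sketch of split conformal prediction with likelihood-ratio weights, a standard way to adapt conformal intervals to covariate shift. It is not CoDrug's implementation. The function name, the residuals, and the weights are hypothetical; in practice the weights would come from density estimates of the training and test molecule distributions, which this sketch simply takes as given.

```python
import math

def weighted_conformal_interval(cal_residuals, cal_weights, test_weight,
                                y_pred, alpha=0.1):
    """Return (low, high) around y_pred with target coverage 1 - alpha.

    cal_residuals: |y - y_hat| on a held-out calibration set
    cal_weights:   assumed density ratio p_test(x) / p_train(x) per calibration point
    test_weight:   the same ratio evaluated at the test molecule
    """
    total = sum(cal_weights) + test_weight
    # Weighted quantile of the residuals. The test point's own mass is
    # treated as sitting at +infinity, so the interval becomes unbounded
    # when the calibration mass alone cannot reach 1 - alpha.
    cumulative = 0.0
    quantile = math.inf
    for residual, weight in sorted(zip(cal_residuals, cal_weights)):
        cumulative += weight / total
        if cumulative >= 1 - alpha:
            quantile = residual
            break
    return y_pred - quantile, y_pred + quantile

# Toy calibration set of four residuals, with and without shift weights.
residuals = [0.1, 0.2, 0.3, 0.4]
print(weighted_conformal_interval(residuals, [1.0, 1.0, 1.0, 1.0], 1.0, 5.0, alpha=0.5))
print(weighted_conformal_interval(residuals, [0.5, 0.5, 0.5, 4.0], 0.5, 5.0, alpha=0.5))
```

Upweighting the calibration points that most resemble the test molecule changes which residual sets the interval width. In the second call the heavily weighted point has the largest residual, so the interval widens.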

Representative publication links

Nature Chemical Biology · 2022

Artificial Intelligence Foundation for Therapeutic Science

A foundational view of AI across therapeutic science.

NeurIPS · 2023

CoDrug: Conformal Drug Property Prediction with Density Estimation under Covariate Shift

Uncertainty estimates for molecules under realistic distribution drift.

Bioinformatics · 2020

DeepPurpose: A Deep Learning Library for Drug-Target Interaction Prediction

Reusable toolkit for DTI modeling.

Interested in this program?

Send a concise note with the program name, your role, the problem you want to work on, and any relevant data, code, clinical setting, or research experience.

Contact Sunlab
