AI Drug Discovery and Molecular Design

This program targets the most expensive uncertainty in drug discovery: which molecules and mechanisms deserve experimental attention.

01 Molecules and targets
02 Representation learning
03 Property and interaction prediction
04 Prioritized experimental candidates

Recent papers

What this program is building

Selected recent and foundational papers, summarized around the task, why it matters, and the main technical result.

2026 AAAI

Targeted Pathway Inference for Biological Knowledge Bases via Graph Learning and Explanation

1. Biological KB
2. Graph learning
3. Explained pathway
Task
Infer biological pathways from knowledge bases using graph learning and explanation.
Why it matters
Pathway inference helps connect molecular observations to mechanisms that can guide therapeutic hypotheses.
Main result
The work emphasizes explainable graph reasoning so inferred pathways can be inspected by scientists.

2025 AAAI

Bi-level Contrastive Learning for Knowledge-Enhanced Molecule Representations

1. Molecule graph
2. Biomedical knowledge
3. Molecule embedding
Task
Learn molecule representations that combine molecular structure with external knowledge.
Why it matters
Better representations improve downstream property prediction and reduce wasted experimental search.
Main result
Bi-level contrastive learning aligns molecular and knowledge views into stronger predictive embeddings.
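
As a rough illustration of what aligning two views with a contrastive objective means, the sketch below computes a symmetric InfoNCE loss between a molecule-structure embedding and a knowledge embedding of the same molecule. It is a generic, single-level example and not the paper's bi-level method; the function names, the cosine similarity, and the temperature value are assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors, candidates, temperature=0.1):
    """Mean cross-entropy of selecting candidates[i] as the match for anchors[i].

    Every other candidate in the batch acts as a negative example.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, c) / temperature for c in candidates]
        # Log-sum-exp with the max subtracted for numerical stability.
        top = max(logits)
        log_denominator = top + math.log(sum(math.exp(x - top) for x in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)


def symmetric_alignment_loss(mol_emb, kg_emb, temperature=0.1):
    """Average the molecule-to-knowledge and knowledge-to-molecule losses.

    mol_emb[i] and kg_emb[i] are assumed to describe the same molecule.
    """
    forward = info_nce(mol_emb, kg_emb, temperature)
    backward = info_nce(kg_emb, mol_emb, temperature)
    return 0.5 * (forward + backward)


# Toy embeddings (hypothetical values): two molecules, two dimensions.
aligned_loss = symmetric_alignment_loss([[1.0, 0.0], [0.0, 1.0]],
                                        [[1.0, 0.0], [0.0, 1.0]])
mismatched_loss = symmetric_alignment_loss([[1.0, 0.0], [0.0, 1.0]],
                                           [[0.0, 1.0], [1.0, 0.0]])
```

The loss is close to zero when each molecule's two views agree and large when they are paired with the wrong partner, which is the signal an encoder would be trained to minimize.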

2025 ICLR

GeSubNet: Gene Interaction Inference for Disease Subtype Network Generation

1. Omics profiles
2. Gene interaction inference
3. Subtype network
Task
Infer gene interactions and disease subtype networks from biomedical data.
Why it matters
Disease subtyping can reveal mechanisms and patient groups that matter for precision therapeutics.
Main result
The model generates subtype-specific interaction networks that make disease heterogeneity more actionable.

2025 Scientific Data

MLOmics: Cancer Multi-Omics Database for Machine Learning

1. Cancer omics
2. ML-ready database
3. Benchmark tasks
Task
Build a machine-learning-ready cancer multi-omics resource.
Why it matters
Shared, well-structured datasets make therapeutic and biomarker modeling more reproducible.
Main result
MLOmics packages multi-omics data into a resource designed for ML benchmarking and discovery.

2025 JDMLR

MolTextQA: A Question-Answering Dataset and Benchmark for Molecular Structure-Text Understanding

1. Molecule + text
2. Multimodal QA model
3. Scientific answer
Task
Evaluate models that reason over molecule structures and scientific text.
Why it matters
Drug discovery needs models that understand both chemical graphs and the language scientists use to describe them.
Main result
MolTextQA creates a benchmark for testing multimodal molecule-language understanding.

2023 NeurIPS

CoDrug: Conformal Drug Property Prediction with Density Estimation under Covariate Shift

1. Shifted molecules
2. Conformal prediction
3. Calibrated property range
Task
Quantify uncertainty for drug property prediction when test molecules differ from training data.
Why it matters
Distribution shift is routine in discovery, so calibrated uncertainty is essential for deciding what to test.
Main result
CoDrug combines conformal prediction with density estimation so that uncertainty estimates stay calibrated when test molecules differ from the training distribution.
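
To make the idea of conformal prediction under covariate shift concrete, here is a minimal sketch of weighted split conformal prediction for a regression target. It is not the CoDrug implementation: the function names are hypothetical, and the weights are placeholders for a density-ratio estimate (test density divided by training density) that would come from a separate density model.

```python
import math


def weighted_quantile(scores, weights, test_weight, alpha):
    """Return the (1 - alpha) quantile of the weighted calibration scores.

    The test point contributes its own weight as a point mass at +infinity,
    so the result is infinite when calibration data carry too little weight.
    """
    total = sum(weights) + test_weight
    cumulative = 0.0
    for score, weight in sorted(zip(scores, weights)):
        cumulative += weight / total
        if cumulative >= 1.0 - alpha:
            return score
    return math.inf


def conformal_interval(prediction, cal_residuals, cal_weights, test_weight, alpha=0.1):
    """Symmetric prediction interval around a point prediction.

    cal_residuals: |y - prediction| on held-out calibration molecules.
    cal_weights / test_weight: assumed density-ratio estimates.
    """
    q = weighted_quantile(cal_residuals, cal_weights, test_weight, alpha)
    return prediction - q, prediction + q


# Hypothetical calibration residuals for four molecules.
residuals = [0.1, 0.2, 0.3, 0.4]

# No shift: every calibration molecule counts equally.
q_uniform = weighted_quantile(residuals, [1.0, 1.0, 1.0, 1.0], 1.0, alpha=0.5)

# Shift toward the molecule with the largest residual widens the interval.
q_shifted = weighted_quantile(residuals, [1.0, 1.0, 1.0, 4.0], 1.0, alpha=0.5)

# A test molecule far outside the calibration data yields an unbounded interval.
far_interval = conformal_interval(1.0, residuals, [1.0, 1.0, 1.0, 1.0], 100.0, alpha=0.1)
```

Up-weighting calibration molecules that resemble the test molecule is what lets the interval widen in poorly covered regions of chemical space instead of reporting the same width everywhere.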

Interested in this program?

Send a concise note with the program name, your role, the problem you want to work on, and any relevant data, code, clinical setting, or research experience.
