

Poster in Workshop: DataWorld: Unifying data curation frameworks across domains

SNAC-DB: The Hitchhiker’s Guide to Building Better Predictive Models of Antibody & NANOBODY® VHH–Antigen Complexes

Abhinav Gupta · Bryan Munoz Rivero · Jorge Roel-Touris · Ruijiang Li · Norbert Furtmann · Yves Nanfack · Maria Wendt · Yu Qiu

Keywords: [ Protein complex prediction ] [ Antibody–antigen complexes ] [ Machine learning ] [ Benchmarking dataset ] [ Data curation ] [ Protein Data Bank ] [ Nanobody–antigen complexes ]


Abstract:

Predicting Antibody & NANOBODY® VHH–antigen complexes remains a blind spot for state-of-the-art models, jeopardizing their potential real-world impact in drug discovery pipelines. We introduce SNAC-DB, an ML-ready database and pipeline, enriched with structural biologists’ expertise, that is designed to accelerate gains in accuracy and generalization by providing up to 28% more structural diversity than existing collections like SAbDab, all without waiting for future experimental structures. SNAC-DB expands structural diversity by capturing often-overlooked complexes and by accurately identifying complete multi-chain epitopes through improved logic and the use of biological assemblies. Built for and by ML scientists, SNAC-DB reduces common pitfalls by providing preprocessed data in ML-friendly formats. Multi-threshold structure-based clustering offers principled sample weighting, ensuring every structure can be leveraged during training. Using a new rigorous benchmark consisting of public PDB entries from after May 30, 2024, plus confidential therapeutic structures, we evaluate five leading models (AlphaFold2.3-multimer, Boltz-1x, Chai-1, DiffDock-PP, and GeoDock) and demonstrate systematic overestimation: success rates rarely exceed 20%, built-in confidence metrics and ranking underperform, and all models struggle with novel targets and conformations captured in our dataset.
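
The abstract does not specify how cluster assignments are turned into sample weights. As an illustration only, the sketch below shows one common way this could be done: each structure receives an inverse-cluster-size weight at every clustering threshold, and the weights are averaged across thresholds and normalized. The function name `multi_threshold_weights`, the input layout, and the example thresholds and cluster labels are all hypothetical and are not taken from SNAC-DB.

```python
from collections import Counter


def multi_threshold_weights(cluster_ids_by_threshold):
    """Average inverse-cluster-size weights over several clustering thresholds.

    cluster_ids_by_threshold maps a threshold (any hashable label) to a list
    holding one cluster id per sample, in a fixed sample order.
    Hypothetical helper: an assumed weighting scheme, not SNAC-DB's own.
    """
    if not cluster_ids_by_threshold:
        raise ValueError("at least one clustering threshold is required")

    assignments = list(cluster_ids_by_threshold.values())
    n_samples = len(assignments[0])
    if n_samples == 0:
        raise ValueError("at least one sample is required")

    weights = [0.0] * n_samples
    for ids in assignments:
        if len(ids) != n_samples:
            raise ValueError("every threshold must label the same samples")
        sizes = Counter(ids)  # cluster id -> number of members
        for i, cluster_id in enumerate(ids):
            # Members of large (redundant) clusters get a small share.
            weights[i] += 1.0 / sizes[cluster_id]

    # Normalize so the weights sum to 1 and can be used as sampling probabilities.
    total = sum(weights)
    return [w / total for w in weights]


# Made-up example: four structures clustered at a loose and a strict threshold.
example = {
    0.5: [0, 0, 0, 1],  # loose threshold: three structures share a cluster
    0.9: [0, 0, 1, 2],  # strict threshold: only the first two remain together
}
weights = multi_threshold_weights(example)
```

Under this assumed scheme no structure is discarded: redundant structures are down-weighted rather than removed, and a structure that stays isolated at every threshold receives the largest weight.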
