ICML Poster KinDEL: DNA-Encoded Library Dataset for Kinase Inhibitors

Benson Chen · Tomasz Danel · Gabriel Dreiman · Patrick McEnaney · Nikhil Jain · Kirill Novikov · Spurti Akki · Joshua L. Turnbull · Virja Pandya · Boris Belotserkovskii · Jared Weaver · Ankita Biswas · Dat Nguyen · Kent Gorday · Mohammad M Sultan · Nathaniel Stanley · Daniel Whalen · Divya Kanichar · Christoph Klein · Emily Fox · R. Watts

West Exhibition Hall B2-B3 #W-112

Tue 15 Jul 4:30 p.m. PDT — 7 p.m. PDT

Abstract:

DNA-Encoded Libraries (DELs) represent a transformative technology in drug discovery, facilitating the high-throughput exploration of vast chemical spaces. Despite their potential, the scarcity of publicly available DEL datasets presents a bottleneck for the advancement of machine learning methodologies in this domain. To address this gap, we introduce KinDEL, one of the largest publicly accessible DEL datasets and the first one that includes binding poses from molecular docking experiments. Focused on two kinases, Mitogen-Activated Protein Kinase 14 (MAPK14) and Discoidin Domain Receptor Tyrosine Kinase 1 (DDR1), KinDEL includes 81 million compounds, offering a rich resource for computational exploration. Additionally, we provide comprehensive biophysical assay validation data, encompassing both on-DNA and off-DNA measurements, which we use to evaluate a suite of machine learning techniques, including novel structure-based probabilistic models. We hope that our benchmark, encompassing both 2D and 3D structures, will help advance the development of machine learning models for data-driven hit identification using DELs.

Lay Summary:

DNA-Encoded Libraries (DELs) are extensive collections of chemical compounds, each tagged with a unique DNA barcode. These libraries allow scientists to quickly test millions of compounds to see if they bind to specific targets involved in diseases. Currently, a significant challenge in the field is the scarcity of available DEL datasets. Without these vital resources, researchers face challenges in developing and comparing machine learning techniques effectively, which slows down progress in identifying potential new treatments.

To tackle this issue, we introduce KinDEL, a robust dataset containing 81 million compounds, specifically designed to propel the development of machine learning models for DEL research. KinDEL is a vast library that includes compounds tested against two kinase targets and offers a new benchmark with biophysical data for selected compounds, both with and without DNA tags.

The release of the KinDEL dataset equips the scientific community with the necessary tools to develop advanced machine learning models for DEL analysis, ultimately accelerating the discovery of new drug candidates. This initiative represents an important step forward in making DEL datasets more accessible for drug discovery research.
