

Poster in Workshop: CODEML: Championing Open-source DEvelopment in Machine Learning

DeepChem-Variant: A Modular Open Source Framework for Genomic Variant Calling

Ankita Bisoi · Shreyas Vinaya Sathyanarayana · Jose Siguenza · Bharath Ramsundar

Fri 18 Jul 2:15 p.m. PDT — 3 p.m. PDT

Abstract:

Variant calling is a fundamental task in genomic research: it detects genetic variations such as single nucleotide polymorphisms (SNPs) and insertions or deletions (indels). This paper presents an extension to DeepChem, a widely used open-source drug discovery framework, that integrates DeepVariant. We introduce DeepChem-Variant, a variant calling pipeline that uses DeepVariant's convolutional neural network (CNN) architecture to improve the accuracy and reliability of variant detection. The pipeline realigns sequencing reads, detects candidate variants, and generates pileup images, then classifies each candidate with either DeepVariant's original modified Inception V3 model or our new MobileNetV2 implementation. We validate the approach with three case studies. Our work also contributes optimized utility functions for genomic data formats, including enhanced DataLoaders for BAM, SAM, and CRAM files and an optimized FASTALoader. Together, these components provide a modular and extensible variant calling framework within DeepChem and enable tighter integration of DeepChem's drug discovery infrastructure with bioinformatics pipelines in future research.
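The following sketch illustrates the two middle stages named in the abstract: candidate variant detection and pileup generation. It is not DeepChem-Variant's API. The function names `find_candidate_snps` and `pileup_rows`, the read representation, and the `min_count`/`min_fraction` thresholds are hypothetical and chosen only for illustration. The sketch assumes gapless alignments, so it skips realignment and indels entirely. It also renders the pileup as text rows instead of the multi-channel image tensor that a CNN classifier consumes.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical read representation: (0-based start on the reference, read bases).
# Assumes gapless alignments (no indels, no clipping) to keep the sketch short.
Read = Tuple[int, str]


def find_candidate_snps(
    reference: str,
    reads: List[Read],
    min_count: int = 2,
    min_fraction: float = 0.12,
) -> Dict[int, Tuple[str, int, int]]:
    """Return {position: (alt_base, alt_count, depth)} for candidate SNP sites.

    A site is a candidate when its most frequent non-reference base appears at
    least `min_count` times and makes up at least `min_fraction` of the reads
    covering the site. Both thresholds are illustrative assumptions.
    """
    counts: Dict[int, Counter] = {}
    for start, bases in reads:
        for offset, base in enumerate(bases):
            pos = start + offset
            if 0 <= pos < len(reference):
                counts.setdefault(pos, Counter())[base] += 1

    candidates: Dict[int, Tuple[str, int, int]] = {}
    for pos in sorted(counts):
        depth = sum(counts[pos].values())
        alts = [(b, c) for b, c in counts[pos].items() if b != reference[pos]]
        if not alts:
            continue  # every read agrees with the reference here
        alt_base, alt_count = max(alts, key=lambda bc: bc[1])
        if alt_count >= min_count and alt_count / depth >= min_fraction:
            candidates[pos] = (alt_base, alt_count, depth)
    return candidates


def pileup_rows(
    reference: str, reads: List[Read], center: int, half_width: int = 3
) -> List[str]:
    """Render a text pileup window around `center`.

    The first row is the reference. Each later row is one read: '.' marks a
    match, the read's base marks a mismatch, and ' ' marks no coverage.
    """
    lo = max(0, center - half_width)
    hi = min(len(reference), center + half_width + 1)
    rows = [reference[lo:hi]]
    for start, bases in reads:
        row = []
        for pos in range(lo, hi):
            i = pos - start
            if 0 <= i < len(bases):
                row.append("." if bases[i] == reference[pos] else bases[i])
            else:
                row.append(" ")
        rows.append("".join(row))
    return rows


if __name__ == "__main__":
    reference = "ACGTACGTAC"
    reads = [(0, "ACGTTCGT"), (2, "GTTCGTAC"), (1, "CGTACG"), (3, "TTCG")]
    print(find_candidate_snps(reference, reads))  # {4: ('T', 3, 4)}
    for row in pileup_rows(reference, reads, center=4, half_width=2):
        print(repr(row))
```

In this toy input, three of the four reads carry a T where the reference has an A at position 4, so that site is the only candidate. In a real pipeline, a classifier such as the Inception V3 or MobileNetV2 models mentioned above would then label the candidate's pileup image, for example as homozygous reference, heterozygous, or homozygous alternate.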
