CODEML: Championing Open-source DEvelopment in Machine Learning

Workshop

Geoff Pleiss · Jonathan Wenger · Jūlija Pečerska · Alina Selega · Frank Schneider

West Meeting Room 211-214

Fri 18 Jul, 9 a.m. PDT

Open-source software (OSS) development is a cornerstone of modern machine learning research. However, issues such as the sustainability of long-term projects, software reliability, and proper academic acknowledgment of maintenance and contributions are often overlooked. This workshop aims to identify and discuss strategies for successful and sustainable open-source development in ML while also proposing solutions to these challenges. Additionally, the workshop will provide a platform to recognize the efforts of open-source contributors in the field. We will bring together machine learning researchers, engineers, industrial practitioners, and software development experts. The workshop will feature invited talks, panel discussions with experts, and workshop paper submissions from open-source contributors in machine learning.

Timezone: America/Los_Angeles

Schedule

Fri 9:00 a.m. - 9:15 a.m.	👋 Opening Remarks	Alina Selega · Frank Schneider · Geoff Pleiss · Jonathan Wenger · Jūlija Pečerska
Fri 9:15 a.m. - 9:45 a.m.	Open-Source Attention Optimizations (Invited Talk: Tri Dao)	Tri Dao
Fri 9:45 a.m. - 10:15 a.m.	☕ Coffee break
Fri 10:15 a.m. - 10:45 a.m.	DIY Deep Learning a Decade Later: a Retrospective on Brewing Community with Caffe (Invited Talk: Evan Shelhamer)	Evan Shelhamer
Fri 10:45 a.m. - 11:15 a.m.	Beyond the Research Paper: Why the Way Breakthroughs Happen is Ripe for Revolution (Invited Talk: Sara Hooker)	Sara Hooker
Fri 11:15 a.m. - 11:30 a.m.	KernelBot: A Competition Platform for Writing Heterogeneous GPU Code (Contributed Talk)	Matej Sirovatka
Fri 11:30 a.m. - 11:45 a.m.	Meta-World+: An Improved, Standardized, RL Benchmark (Contributed Talk)	Reginald McLean
Fri 11:45 a.m. - 12:00 p.m.	Reproducible sampling from intractable distributions with Pigeons.jl (Contributed Talk)	Miguel Biron-Lattes
Fri 12:00 p.m. - 1:00 p.m.	🍽️ Lunch
Fri 1:00 p.m. - 1:30 p.m.	JAX and OSS at Google / DeepMind (Invited Talk: Matthew Johnson)	Matthew Johnson
Fri 1:30 p.m. - 2:00 p.m.	Differentiating and Integrating Open Source Development with Research in Julia's SciML (Invited Talk: Chris Rackauckas)	Christopher Rackauckas
Fri 2:00 p.m. - 2:15 p.m.	If Open Source is to Win, It Must Go Public (Contributed Talk)	Joshua Tan
Fri 2:15 p.m. - 3:00 p.m.	Poster session
	→ Developing and Maintaining an Open-Source Repository of AI Evaluations: Challenges and Insights (Poster)	Alexandra Abbas · Celia Waggoner · Justin Olive
	→ If open source is to win, it must go public (Poster)	Joshua Tan · Nicholas Vincent · Katherine Elkins · Magnus Sahlgren
	→ ZKLoRA: Efficient Zero-Knowledge Proofs for LoRA Verification (Poster)	Bidhan Roy · Peter Potash · Marcos Villagra
	→ Library X: User-Friendly Differential Privacy Library in PyTorch (Poster)	Sai Aparna Aketi · Will Bullock · Iden Kalemaj · Enayat Ullah · Huanyu Zhang
	→ Maintaining MTEB: Towards Long Term Usability and Reproducibility of Embedding Benchmarks (Poster)	Isaac Chung · Imene Kerboua · Márton Kardos · Roman Solomatin · Kenneth Enevoldsen
	→ LIB_NAME: API-first feature extraction for image-based profiling workflows (Poster)	Ala Muñoz · Tim Treis · Alexandr A. Kalinin · Shatavisha Dasgupta · Fabian Theis · Anne Carpenter · Shantanu Singh
	→ Write Code that People Want to Use (Poster)	Stella Biderman · Jennifer Mickel · Baber Abbasi
	→ Deploying User-Friendly Software: Six Recommendations to Make Single-Cell Foundation Models More Usable For Scientific Discovery (Poster)	Izumi Ando · Hassaan Maan · Kieran Campbell
	→ An LLM-Powered Tool for Enhancing Scientific Open-Source Repositories (Poster)	Nikolay Nikitin · Andrey Getmanov · Zakhar Popov · Ulyanova Alekseevna · Yaroslav Aksenkin · Ilya Sokolov · Alexander Boukhanovsky
	→ Common Task Framework For a Critical Evaluation of Scientific Machine Learning Algorithms (Poster)	Philippe M. Wyder · Judah A Goldfeder · Alexey Yermakov · Yue Zhao · Stefano Riva · Jan Williams · David Zoro · Amy Rude · Matteo Tomasetto · Joe Germany · Joseph Bakarji · Georg Maierhofer · Miles Cranmer · J. Nathan Kutz
	→ N$^2$: A Unified Python Package and Test Bench for Nearest Neighbor-Based Matrix Completion (Poster)	Caleb Chin · Aashish Khubchandani · Harshvardhan Maskara · Kyuseong Choi · Jacob Feitelberg · Albert Gong · Manit Paul · Tathagata Sadhukhan · Anish Agarwal · Raaz Dwivedi
	→ Swizz: One-Liner Figures, LaTeX Tables, and Flexible Layouts for Scientific Papers (Poster)	Lars Quaedvlieg · Andrea Miele · Caglar Gulcehre
	→ DeepChem-Variant: A Modular Open Source Framework for Genomic Variant Calling (Poster)	Ankita Bisoi · Shreyas Vinaya Sathyanarayana · Jose Siguenza · Bharath Ramsundar
	→ Open-Source Foosball Benchmark for Deep Reinforcement Learning (Poster)	Matthew So · Kwansoo Lee · Judah A Goldfeder · Hod Lipson
	→ RepoST: Scalable Repository-Level Coding Environment Construction with Sandbox Testing (Poster)	Yiqing Xie · Alex Xie · Divyanshu Sheth · Pengfei Liu · Daniel Fried · Carolyn Rose
	→ An Open-Source Software Toolkit & Benchmark Suite for the Evaluation and Adaptation of Multimodal Action Models (Poster)	Pranav Guruprasad · Yangyue Wang · Harsh Sikka
	→ Meta-World+: An Improved, Standardized, RL Benchmark (Poster)	Reginald McLean · Evangelos Chatzaroulas · Luc McCutcheon · Frank Röder · Tianhe (Kevin) Yu · Zhanpeng He · K.R. Zentner · Ryan Julian · Jordan Terry · Isaac Woungang · Nariman Farsad · Pablo Samuel Castro
	→ skglm: Improving scikit-learn for regularized Generalized Linear Models (Poster)	Mathurin Massias · Badr MOUFAD · Quentin Bertrand
	→ KernelBot: A Competition Platform for Writing Heterogeneous GPU Code (Poster)	Alex Zhang · Matej Sirovatka · Erik Schultheis · Benjamin Horowitz · Mark Saroufim
	→ Orthogonium: A Unified, Efficient Library of Orthogonal and 1‑Lipschitz Building Blocks (Poster)	Thibaut Boissin · Franck Mamalet · Valentin Lafargue · Mathieu Serrurier
	→ Gundam: A PyTorch Native Platform for Training Generative AI Models (Poster)	Tianyu Liu · Wanchao Liang
	→ Provenance Design and Evolution in a Production ML Library (Poster)	Adam C Pocock · Joseph Wonsil · Romina Mahinpei · Jack Sullivan · Margo Seltzer
	→ ABC Gym: a simulation environment for low-bandwidth training (Poster)	Seth Howes · Matt Beton · Mohamed Baioumy · Alex Cheema
	→ AIF-GEN: Open-Source Platform and Synthetic Dataset Suite for Reinforcement Learning on Large Language Models (Poster)	Jacob Chmura · Shahrad Mohammadzadeh · Ivan Anokhin · Jacob-Junqi Tian · Mandana Samiei · Taz Scott-Talib · Irina Rish · Doina Precup · Reihaneh Rabbany · Nishanth V Anand
	→ LUQ: Language Models Uncertainty Quantification Toolkit (Poster)	Alexander V Nikitin · Martin Trapp · Pekka Marttinen
	→ DISCO: A Browser-Based Privacy-Preserving Framework for Distributed Collaborative Learning (Poster)	Julien Vignoud · Martin Jaggi · Mary-Anne Hartley · Tahseen Rabbani · Valérian Rousset
	→ Reproducible sampling from intractable distributions with Pigeons.jl (Poster)	Miguel Biron-Lattes · Nikola Surjanovic · Paul Tiede · Saifuddin Syed · Trevor Campbell · Alexandre Bouchard-Côté
	→ FedRAG: A Framework for Fine-Tuning Retrieval-Augmented Generation Systems (Poster)	Val Andrei Fajardo · D. Emerson · Amandeep Singh · Marcelo Lotif · Veronica Chatrath · Izuki Matsuba · Chi Cheung · Ravi Theja Desetty
	→ DINOHash: Learning Adversarially Robust Perceptual Hashes from Self-Supervised Features (Poster)	Shree Singhi · Aayush Gupta · Lukas Struppek
	→ TorchAO: PyTorch-Native Training-to-Serving Model Optimization (Poster)	Andrew Or · Apurva Jain · Daniel Vega-Myhre · Jesse Cai · Charles Hernandez · Zhenrui Zhang · Driss Guessous · Vasiliy Kuznetsov · Christian Puhrsch · Mark Saroufim · Supriya Rao
	→ $\texttt{markovml}$: A Python Package for Verifying Markov Processes with Embedded Machine Learning Models (Poster)	Muhammad Maaz · Timothy Chan
	→ Control Flow Operators in PyTorch (Poster)	Yidi Wu · Thomas Ortner · Edward Yang · Horace He · Yanan Cao
	→ A2Perf: Benchmarking Autonomous Agents End-to-End in Realistic Domains (Poster)	Ikechukwu Uchendu · Jason Jabbour · Korneel Van den Berghe · Joel Runevic · Matthew Stewart · Jeffrey Ma · Srivatsan Krishnan · Izzeddin Gur · Austin Huang · Colton Bishop · Paige Bailey · Wenjie Jiang · Ebrahim M. Songhori · Sergio Guadarrama · Jie Tan · Jordan Terry · Aleksandra Faust · Vijay Janapa Reddi
	→ laplax - Laplace Approximations with JAX (Poster)	Tobias Weber · Bálint Mucsányi · Lenard Rommel · Thomas Christie · Lars Kasüschke · Marvin Pförtner · Philipp Hennig
	→ PyLO: Towards Accessible Learned Optimizers in Pytorch (Poster)	Paul Janson · Benjamin Thérien · Quentin Anthony · Xiaolong Huang · Abhinav Moudgil · Eugene Belilovsky
	→ SAGDA: Open-Source Synthetic Agriculture Data for Africa (Poster)	Abdelghani Belgaid · Oumnia Ennaji
	→ BoFire: Bayesian Optimization Framework Intended for Real Experiments (Poster)	Johannes P. Dürholt · Thomas Asche · Johanna Kleinekorte · Gabriel Mancino-Ball · Benjamin Schiller · Simon Sung · Julian Keupp · Aaron P. Osburg · Toby Boyne · Ruth Misener · Rosona Eldred · Chrysoula Kappatou · Robert Lee · Dominik Linzner · Wagner Steuer Costa · David Walz · Niklas Wulkow · Behrang Shafei
	→ Spatial Reasoning over Continuous Variables with PySpaRe (Poster)	Bartlomiej Pogodzinski · Christopher Wewer · Bernt Schiele · Jan Eric Lenssen
	→ OpenDG: A Modular Framework for Machine Learning on Dynamic Graphs (Poster)	Jacob Chmura · Shenyang (Andy) Huang · Ali Parviz · Farimah Poursafaei · Michael Bronstein · Guillaume Rabusseau · Matthias Fey · Reihaneh Rabbany
	→ M(M)ORE : Massive Multimodal Open RAG & Extraction (Poster)	Alexandre Sallinen · Stefan Krsteski · Paul Teiletche · Allard Marc-Antoine · Baptiste Lecoeur · Michael Zhang · Fabrice Nemo · David Kalajdzic · Matthias Meyer · Mary-Anne Hartley
	→ Liger-Kernel: Efficient Triton Kernels for LLM Training (Poster)	Pin-Lun Hsu · Yun Dai · Vignesh Kothapalli · Qingquan Song · Shao Tang · Siyu Zhu · Steven Shimizu · Shivam Sahni · Haowen Ning · Yanning Chen · Zhipeng Wang
	→ Bencher: Simple and Reproducible Benchmarking for Black-Box Optimization (Poster)	Leonard Papenmeier · Luigi Nardi
	→ olmOCR: Unlocking Trillions of Tokens in PDFs with Vision Language Models (Poster)	Jake Poznanski · Aman Rangapur · Jon Borchardt · Jason Dunkelberger · Christopher Wilhelm · Kyle Lo · Luca Soldaini
	→ Vulnerability of Text-Matching in ML/AI Conference Reviewer Assignments to Collusions (Poster)	Jhih-Yi Hsieh · Aditi Raghunathan · Nihar Shah
Fri 3:00 p.m. - 3:30 p.m.	☕ Coffee break
Fri 3:30 p.m. - 4:00 p.m.	Lessons from the Trenches on Reproducible Evaluation of Language Models (Invited Talk: Stella Biderman)	Stella Biderman
Fri 4:00 p.m. - 4:55 p.m.	Panel discussion	Evan Shelhamer · Matthew Johnson · Stella Biderman · Tri Dao
Fri 4:55 p.m. - 5:00 p.m.	👋 Closing remarks	Alina Selega · Frank Schneider · Geoff Pleiss · Jonathan Wenger · Jūlija Pečerska