Workshop
CODEML: Championing Open-source DEvelopment in Machine Learning
Geoff Pleiss · Jonathan Wenger · Jūlija Pečerska · Alina Selega · Frank Schneider
West Meeting Room 211-214
Fri 18 Jul, 9 a.m. PDT
Open-source software (OSS) development is a cornerstone of modern machine learning research. However, issues such as the sustainability of long-term projects, software reliability, and proper academic acknowledgment of maintenance and contributions are often overlooked. This workshop aims to identify and discuss strategies for successful and sustainable open-source development in ML while also proposing solutions to these challenges. Additionally, the workshop will provide a platform to recognize the efforts of open-source contributors in the field. We will bring together machine learning researchers, engineers, industrial practitioners, and software development experts. The workshop will feature invited talks, panel discussions with experts, and workshop paper submissions from open-source contributors in machine learning.
Schedule
Fri 9:00 a.m. - 9:15 a.m.
|
👋 Opening Remarks
|
Alina Selega · Frank Schneider · Geoff Pleiss · Jonathan Wenger · Jūlija Pečerska 🔗 |
Fri 9:15 a.m. - 9:45 a.m.
|
Open-Source Attention Optimizations ( Invited Talk: Tri Dao ) > link | Tri Dao 🔗 |
Fri 9:45 a.m. - 10:15 a.m.
|
☕ Coffee break
|
🔗 |
Fri 10:15 a.m. - 10:45 a.m.
|
DIY Deep Learning a Decade Later: a Retrospective on Brewing Community with Caffe ( Invited Talk: Evan Shelhamer ) > link | Evan Shelhamer 🔗 |
Fri 10:45 a.m. - 11:15 a.m.
|
Beyond the Research Paper: Why the Way Breakthroughs Happen is Ripe for Revolution ( Invited Talk: Sara Hooker ) > link | Sara Hooker 🔗 |
Fri 11:15 a.m. - 11:30 a.m.
|
KernelBot: A Competition Platform for Writing Heterogeneous GPU Code ( Contributed Talk ) | Matej Sirovatka 🔗 |
Fri 11:30 a.m. - 11:45 a.m.
|
Meta-World+: An Improved, Standardized, RL Benchmark ( Contributed Talk ) > link | Reginald McLean 🔗 |
Fri 11:45 a.m. - 12:00 p.m.
|
Reproducible sampling from intractable distributions with Pigeons.jl ( Contributed Talk ) | Miguel Biron-Lattes 🔗 |
Fri 12:00 p.m. - 1:00 p.m.
|
🍽️ Lunch
|
🔗 |
Fri 1:00 p.m. - 1:30 p.m.
|
JAX and OSS at Google / DeepMind ( Invited Talk: Matthew Johnson ) | Matthew Johnson 🔗 |
Fri 1:30 p.m. - 2:00 p.m.
|
Differentiating and Integrating Open Source Development with Research in Julia's SciML ( Invited Talk: Chris Rackauckas ) > link | Christopher Rackauckas 🔗 |
Fri 2:00 p.m. - 2:15 p.m.
|
If Open Source is to Win, It Must Go Public ( Contributed Talk ) | Joshua Tan 🔗 |
Fri 2:15 p.m. - 3:00 p.m.
|
Poster session
|
🔗 |
|
→ Developing and Maintaining an Open-Source Repository of AI Evaluations: Challenges and Insights ( Poster ) > link | Alexandra Abbas · Celia Waggoner · Justin Olive 🔗 |
|
→ If open source is to win, it must go public ( Poster ) > link | Joshua Tan · Nicholas Vincent · Katherine Elkins · Magnus Sahlgren 🔗 |
|
→ ZKLoRA: Efficient Zero-Knowledge Proofs for LoRA Verification ( Poster ) > link | Bidhan Roy · Peter Potash · Marcos Villagra 🔗 |
|
→ Library X: User-Friendly Differential Privacy Library in PyTorch ( Poster ) > link | Sai Aparna Aketi · Will Bullock · Iden Kalemaj · Enayat Ullah · Huanyu Zhang 🔗 |
|
→ Maintaining MTEB: Towards Long Term Usability and Reproducibility of Embedding Benchmarks ( Poster ) > link | Isaac Chung · Imene Kerboua · Márton Kardos · Roman Solomatin · Kenneth Enevoldsen 🔗 |
|
→ LIB_NAME: API-first feature extraction for image-based profiling workflows ( Poster ) > link | Ala Muñoz · Tim Treis · Alexandr A. Kalinin · Shatavisha Dasgupta · Fabian Theis · Anne Carpenter · Shantanu Singh 🔗 |
|
→ Write Code that People Want to Use ( Poster ) > link | Stella Biderman · Jennifer Mickel · Baber Abbasi 🔗 |
|
→ Deploying User-Friendly Software: Six Recommendations to Make Single-Cell Foundation Models More Usable For Scientific Discovery ( Poster ) > link | Izumi Ando · Hassaan Maan · Kieran Campbell 🔗 |
|
→ An LLM-Powered Tool for Enhancing Scientific Open-Source Repositories ( Poster ) > link | Nikolay Nikitin · Andrey Getmanov · Zakhar Popov · Ulyanova Alekseevna · Yaroslav Aksenkin · Ilya Sokolov · Alexander Boukhanovsky 🔗 |
|
→ Common Task Framework For a Critical Evaluation of Scientific Machine Learning Algorithms ( Poster ) > link | Philippe M. Wyder · Judah A Goldfeder · Alexey Yermakov · Yue Zhao · Stefano Riva · Jan Williams · David Zoro · Amy Rude · Matteo Tomasetto · Joe Germany · Joseph Bakarji · Georg Maierhofer · Miles Cranmer · J. Nathan Kutz |
|
→ N$^2$: A Unified Python Package and Test Bench for Nearest Neighbor-Based Matrix Completion ( Poster ) > link | Caleb Chin · Aashish Khubchandani · Harshvardhan Maskara · Kyuseong Choi · Jacob Feitelberg · Albert Gong · Manit Paul · Tathagata Sadhukhan · Anish Agarwal · Raaz Dwivedi 🔗 |
|
→ Swizz: One-Liner Figures, LaTeX Tables, and Flexible Layouts for Scientific Papers ( Poster ) > link | Lars Quaedvlieg · Andrea Miele · Caglar Gulcehre 🔗 |
|
→ DeepChem-Variant: A Modular Open Source Framework for Genomic Variant Calling ( Poster ) > link | Ankita Bisoi · Shreyas Vinaya Sathyanarayana · Jose Siguenza · Bharath Ramsundar 🔗 |
|
→ Open-Source Foosball Benchmark for Deep Reinforcement Learning ( Poster ) > link | Matthew So · Kwansoo Lee · Judah A Goldfeder · Hod Lipson 🔗 |
|
→ RepoST: Scalable Repository-Level Coding Environment Construction with Sandbox Testing ( Poster ) > link | Yiqing Xie · Alex Xie · Divyanshu Sheth · Pengfei Liu · Daniel Fried · Carolyn Rose 🔗 |
|
→ An Open-Source Software Toolkit & Benchmark Suite for the Evaluation and Adaptation of Multimodal Action Models ( Poster ) > link | Pranav Guruprasad · Yangyue Wang · Harsh Sikka 🔗 |
|
→ Meta-World+: An Improved, Standardized, RL Benchmark ( Poster ) > link | Reginald McLean · Evangelos Chatzaroulas · Luc McCutcheon · Frank Röder · Tianhe (Kevin) Yu · Zhanpeng He · K.R. Zentner · Ryan Julian · Jordan Terry · Isaac Woungang · Nariman Farsad · Pablo Samuel Castro |
|
→ skglm: Improving scikit-learn for regularized Generalized Linear Models ( Poster ) > link | Mathurin Massias · Badr MOUFAD · Quentin Bertrand 🔗 |
|
→ KernelBot: A Competition Platform for Writing Heterogeneous GPU Code ( Poster ) > link | Alex Zhang · Matej Sirovatka · Erik Schultheis · Benjamin Horowitz · Mark Saroufim 🔗 |
|
→ Orthogonium: A Unified, Efficient Library of Orthogonal and 1‑Lipschitz Building Blocks ( Poster ) > link | Thibaut Boissin · Franck Mamalet · Valentin Lafargue · Mathieu Serrurier 🔗 |
|
→ Gundam: A PyTorch Native Platform for Training Generative AI Models ( Poster ) > link | Tianyu Liu · Wanchao Liang 🔗 |
|
→ Provenance Design and Evolution in a Production ML Library ( Poster ) > link | Adam C Pocock · Joseph Wonsil · Romina Mahinpei · Jack Sullivan · Margo Seltzer 🔗 |
|
→ ABC Gym: a simulation environment for low-bandwidth training ( Poster ) > link | Seth Howes · Matt Beton · Mohamed Baioumy · Alex Cheema 🔗 |
|
→ AIF-GEN: Open-Source Platform and Synthetic Dataset Suite for Reinforcement Learning on Large Language Models ( Poster ) > link | Jacob Chmura · Shahrad Mohammadzadeh · Ivan Anokhin · Jacob-Junqi Tian · Mandana Samiei · Taz Scott-Talib · Irina Rish · Doina Precup · Reihaneh Rabbany · Nishanth V Anand 🔗 |
|
→ LUQ: Language Models Uncertainty Quantification Toolkit ( Poster ) > link | Alexander V Nikitin · Martin Trapp · Pekka Marttinen 🔗 |
|
→ DISCO: A Browser-Based Privacy-Preserving Framework for Distributed Collaborative Learning ( Poster ) > link | Julien Vignoud · Martin Jaggi · Mary-Anne Hartley · Tahseen Rabbani · Valérian Rousset 🔗 |
|
→ Reproducible sampling from intractable distributions with Pigeons.jl ( Poster ) > link | Miguel Biron-Lattes · Nikola Surjanovic · Paul Tiede · Saifuddin Syed · Trevor Campbell · Alexandre Bouchard-Côté 🔗 |
|
→ FedRAG: A Framework for Fine-Tuning Retrieval-Augmented Generation Systems ( Poster ) > link | Val Andrei Fajardo · D. Emerson · Amandeep Singh · Marcelo Lotif · Veronica Chatrath · Izuki Matsuba · Chi Cheung · Ravi Theja Desetty 🔗 |
|
→ DINOHash: Learning Adversarially Robust Perceptual Hashes from Self-Supervised Features ( Poster ) > link | Shree Singhi · Aayush Gupta · Lukas Struppek 🔗 |
|
→ TorchAO: PyTorch-Native Training-to-Serving Model Optimization ( Poster ) > link | Andrew Or · Apurva Jain · Daniel Vega-Myhre · Jesse Cai · Charles Hernandez · Zhenrui Zhang · Driss Guessous · Vasiliy Kuznetsov · Christian Puhrsch · Mark Saroufim · Supriya Rao |
|
→ $\texttt{markovml}$: A Python Package for Verifying Markov Processes with Embedded Machine Learning Models ( Poster ) > link | Muhammad Maaz · Timothy Chan 🔗 |
|
→ Control Flow Operators in PyTorch ( Poster ) > link | Yidi Wu · Thomas Ortner · Edward Yang · Horace He · Yanan Cao 🔗 |
|
→ A2Perf: Benchmarking Autonomous Agents End-to-End in Realistic Domains ( Poster ) > link | Ikechukwu Uchendu · Jason Jabbour · Korneel Van den Berghe · Joel Runevic · Matthew Stewart · Jeffrey Ma · Srivatsan Krishnan · Izzeddin Gur · Austin Huang · Colton Bishop · Paige Bailey · Wenjie Jiang · Ebrahim M. Songhori · Sergio Guadarrama · Jie Tan · Jordan Terry · Aleksandra Faust · Vijay Janapa Reddi |
|
→ laplax - Laplace Approximations with JAX ( Poster ) > link | Tobias Weber · Bálint Mucsányi · Lenard Rommel · Thomas Christie · Lars Kasüschke · Marvin Pförtner · Philipp Hennig 🔗 |
|
→ PyLO: Towards Accessible Learned Optimizers in Pytorch ( Poster ) > link | Paul Janson · Benjamin Thérien · Quentin Anthony · Xiaolong Huang · Abhinav Moudgil · Eugene Belilovsky 🔗 |
|
→ SAGDA: Open-Source Synthetic Agriculture Data for Africa ( Poster ) > link | Abdelghani Belgaid · Oumnia Ennaji 🔗 |
|
→ BoFire: Bayesian Optimization Framework Intended for Real Experiments ( Poster ) > link | Johannes P. Dürholt · Thomas Asche · Johanna Kleinekorte · Gabriel Mancino-Ball · Benjamin Schiller · Simon Sung · Julian Keupp · Aaron P. Osburg · Toby Boyne · Ruth Misener · Rosona Eldred · Chrysoula Kappatou · Robert Lee · Dominik Linzner · Wagner Steuer Costa · David Walz · Niklas Wulkow · Behrang Shafei |
|
→ Spatial Reasoning over Continuous Variables with PySpaRe ( Poster ) > link | Bartlomiej Pogodzinski · Christopher Wewer · Bernt Schiele · Jan Eric Lenssen 🔗 |
|
→ OpenDG: A Modular Framework for Machine Learning on Dynamic Graphs ( Poster ) > link | Jacob Chmura · Shenyang (Andy) Huang · Ali Parviz · Farimah Poursafaei · Michael Bronstein · Guillaume Rabusseau · Matthias Fey · Reihaneh Rabbany 🔗 |
|
→ M(M)ORE : Massive Multimodal Open RAG & Extraction ( Poster ) > link | Alexandre Sallinen · Stefan Krsteski · Paul Teiletche · Allard Marc-Antoine · Baptiste Lecoeur · Michael Zhang · Fabrice Nemo · David Kalajdzic · Matthias Meyer · Mary-Anne Hartley 🔗 |
|
→ Liger-Kernel: Efficient Triton Kernels for LLM Training ( Poster ) > link | Pin-Lun Hsu · Yun Dai · Vignesh Kothapalli · Qingquan Song · Shao Tang · Siyu Zhu · Steven Shimizu · Shivam Sahni · Haowen Ning · Yanning Chen · Zhipeng Wang |
|
→ Bencher: Simple and Reproducible Benchmarking for Black-Box Optimization ( Poster ) > link | Leonard Papenmeier · Luigi Nardi 🔗 |
|
→ olmOCR: Unlocking Trillions of Tokens in PDFs with Vision Language Models ( Poster ) > link | Jake Poznanski · Aman Rangapur · Jon Borchardt · Jason Dunkelberger · Christopher Wilhelm · Kyle Lo · Luca Soldaini 🔗 |
|
→ Vulnerability of Text-Matching in ML/AI Conference Reviewer Assignments to Collusions ( Poster ) > link | Jhih-Yi Hsieh · Aditi Raghunathan · Nihar Shah 🔗 |
Fri 3:00 p.m. - 3:30 p.m.
|
☕ Coffee break
|
🔗 |
Fri 3:30 p.m. - 4:00 p.m.
|
Lessons from the Trenches on Reproducible Evaluation of Language Models ( Invited Talk: Stella Biderman ) | Stella Biderman 🔗 |
Fri 4:00 p.m. - 4:55 p.m.
|
Panel discussion
|
Evan Shelhamer · Matthew Johnson · Stella Biderman · Tri Dao 🔗 |
Fri 4:55 p.m. - 5:00 p.m.
|
👋 Closing remarks
|
Alina Selega · Frank Schneider · Geoff Pleiss · Jonathan Wenger · Jūlija Pečerska 🔗 |