

Poster

Position: Machine Learning Models Have a Supply Chain Problem

Sarah Meiklejohn · Hayden Blauzvern · Mihai Maruseac · Spencer Schrock · Laurent Simon · Ilia Shumailov

East Exhibition Hall A-B #E-505
[ Project Page ]
Tue 15 Jul 4:30 p.m. PDT — 7 p.m. PDT

Abstract:

Powerful machine learning (ML) models are now readily available online, which creates exciting possibilities for users who lack the deep technical expertise or substantial computing resources needed to develop them. On the other hand, this type of open ecosystem comes with many risks. In this paper, we argue that the current ecosystem for open ML models contains significant supply-chain risks, some of which have already been exploited in real attacks. These include an attacker replacing a model with something malicious (e.g., malware), or a model being trained using a vulnerable version of a framework or on restricted or poisoned data. We then explore how Sigstore, a solution designed to bring transparency to open-source software supply chains, can be used to bring transparency to open ML models, by enabling model publishers to sign their models and prove properties about the datasets they use.
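
To make the signing workflow concrete, below is a minimal sketch using the sigstore-python CLI (installable via `pip install sigstore`). The model file name, the signer email, and the OIDC issuer are illustrative placeholders, not details from the paper; this shows one possible Sigstore-based flow, not the authors' exact tooling.

import hashlib
import pathlib
import subprocess

MODEL = pathlib.Path("model.safetensors")  # hypothetical model file

# Content digest: what a signature ultimately binds the publisher to.
# (A real implementation would stream large files rather than read them whole.)
digest = hashlib.sha256(MODEL.read_bytes()).hexdigest()
print(f"model digest: sha256:{digest}")

# Sign the artifact. Sigstore runs an OIDC login to establish the
# publisher's identity and writes a verification bundle next to the file.
subprocess.run(["sigstore", "sign", str(MODEL)], check=True)

# A downloader later verifies both the content and who signed it.
# The identity and issuer below are placeholders for the real publisher.
subprocess.run(
    [
        "sigstore", "verify", "identity", str(MODEL),
        "--cert-identity", "publisher@example.com",
        "--cert-oidc-issuer", "https://accounts.google.com",
    ],
    check=True,
)

The point of the identity-bound signature, as opposed to a bare checksum, is that a downloader can check not just that the bytes are intact but who published them, which is what closes the model-replacement attack described above.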

Lay Summary:

Powerful machine learning models are now easy to find and download online. This is exciting because people who aren't tech experts or don't have powerful computers can still use them. On the other hand, sharing these models so openly comes with downsides. Specifically, our paper highlights the big security risks that come from downloading and using models we don't know much about, similar to downloading and running computer programs from unsafe sources. For example, someone could replace a good model with a harmful one (like a computer virus), or a model might have been built in an unsafe way or with data that is bad or that its creators weren't supposed to use. We suggest using something called Sigstore to make this process safer. Sigstore adds transparency to models by helping people digitally "sign" the models they create in a way that proves the models are legitimate and shows where the data used to create them came from.
