Poster
QUTE: Quantifying Uncertainty in TinyML models with Early-exit-assisted ensembles for model-monitoring
Nikhil Pratap Ghanathe · Steve Wilton
East Exhibition Hall A-B #E-1703
Uncertainty quantification (UQ) provides a resource-efficient solution for on-device monitoring of tinyML models deployed remotely without access to true labels. However, existing UQ methods impose significant memory and compute demands, making them impractical for ultra-low-power, KB-sized tinyML devices. Prior work has attempted to reduce overhead by using early-exit ensembles to quantify uncertainty in a single forward pass, but these approaches still carry prohibitive costs. To address this, we propose QUTE, a novel resource-efficient early-exit-assisted ensemble architecture optimized for tinyML models. QUTE introduces additional output blocks at the final exit of the base network, distilling early-exit knowledge into these blocks to form a diverse yet lightweight ensemble. We show that QUTE delivers superior uncertainty quality on tiny models and, on larger models, comparable quality with 59% smaller model sizes than the closest prior work. When deployed on a microcontroller, QUTE reduces latency by 31% on average. In addition, we show that QUTE excels at detecting accuracy-drop events, outperforming all prior work.
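To make the architecture concrete, the following is a minimal PyTorch sketch of an early-exit-assisted ensemble in the spirit of QUTE: early-exit heads are attached to intermediate stages and used only during training, while several lightweight output blocks at the final exit form the ensemble, with a distillation term transferring each early exit's knowledge into a final-exit block. The class names, layer sizes, exit-to-block pairing, and loss weighting are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code) of an early-exit-assisted ensemble.
# Early-exit heads are used only at training time; the lightweight output
# blocks at the final exit form the ensemble kept at inference.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyBackbone(nn.Module):
    def __init__(self, num_classes=10, num_blocks=4):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.stage2 = nn.Sequential(nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())
        self.stage3 = nn.Sequential(nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.pool = nn.AdaptiveAvgPool2d(1)
        # Early-exit heads on intermediate features (training-time only).
        self.early_exits = nn.ModuleList([
            nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, num_classes)),
            nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, num_classes)),
        ])
        # Lightweight ensemble blocks at the final exit (kept at inference).
        self.final_blocks = nn.ModuleList(
            [nn.Linear(64, num_classes) for _ in range(num_blocks)]
        )

    def forward(self, x):
        f1 = self.stage1(x)
        f2 = self.stage2(f1)
        f3 = self.pool(self.stage3(f2)).flatten(1)
        early_logits = [self.early_exits[0](f1), self.early_exits[1](f2)]
        final_logits = [blk(f3) for blk in self.final_blocks]  # the ensemble
        return early_logits, final_logits

def training_loss(early_logits, final_logits, labels, alpha=0.5, T=2.0):
    """Cross-entropy on every head plus a soft KL term that distills each
    early exit's knowledge into one final-exit block (a simplified pairing;
    the paper's exact scheme and weighting may differ)."""
    ce = sum(F.cross_entropy(l, labels) for l in early_logits + final_logits)
    kl = 0.0
    for e, f in zip(early_logits, final_logits):
        kl = kl + F.kl_div(F.log_softmax(f / T, dim=1),
                           F.softmax(e.detach() / T, dim=1),
                           reduction="batchmean") * (T * T)
    return ce + alpha * kl
```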
Tiny AI models, known as TinyML, are being used in everything from wearable health monitors to environmental sensors. These models run directly on low-power devices — often no bigger than a coin — and help make fast decisions without needing to send data to the cloud. But there’s a problem: how do we know when these tiny models are making a mistake, especially when they’re deployed in remote or hard-to-reach places with no cloud access or human supervision?

That’s where QUTE comes in. QUTE is a new technique we’ve developed to help tiny AI models recognize when they’re uncertain — in other words, when their predictions might be wrong. This ability to estimate confidence (called uncertainty quantification) is crucial for safety, reliability, and smart decision-making in the field. Unlike other methods that are too large or slow for tiny devices, QUTE is designed to work efficiently on ultra-low-power hardware.

It works by adding small "checkpoints" inside the model and training lightweight prediction blocks that each learn something different. Together, they act like a mini-team of experts that can double-check each other’s confidence and flag when the model might be unsure — all without taking up much space or time.

In real-world tests, QUTE made decisions faster (31% lower delay), used less memory (59% smaller), and was better at spotting problems compared to previous approaches. This makes QUTE a strong choice for making AI on tiny devices both faster and more trustworthy.
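As a rough illustration of the "mini-team of experts" idea, the sketch below averages the final-exit blocks' predictions in a single forward pass and flags inputs whose predictive entropy exceeds a threshold. The entropy score and threshold are assumptions chosen for illustration and may differ from the uncertainty measure QUTE actually uses; the model is the TinyBackbone sketch above.

```python
# Illustrative single-pass uncertainty check over the final-exit ensemble.
# Predictive entropy of the averaged softmax is one common uncertainty score;
# it is an assumption here, not necessarily QUTE's exact metric.
import torch
import torch.nn.functional as F

@torch.no_grad()
def predict_with_uncertainty(model, x, entropy_threshold=1.0):
    _, final_logits = model(x)                      # early exits unused at inference
    probs = torch.stack([F.softmax(l, dim=1) for l in final_logits])  # (K, B, C)
    mean_probs = probs.mean(dim=0)                  # ensemble average
    entropy = -(mean_probs * mean_probs.clamp_min(1e-12).log()).sum(dim=1)
    prediction = mean_probs.argmax(dim=1)
    uncertain = entropy > entropy_threshold         # flag for on-device monitoring
    return prediction, entropy, uncertain
```

On a deployed device, the flagged inputs could be logged or counted over time; a rising fraction of uncertain predictions is the kind of signal that can indicate an accuracy-drop event without access to true labels.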