Poster in Workshop: ES-FoMo III: 3rd Workshop on Efficient Systems for Foundation Models
Adaptive Backbone Selection for Efficient and Real-Time Vision Inference
Syed Amir Hamza · Alexander Jesser
Modern vision assistants often rely on large, static backbones regardless of input complexity, which wastes energy and adds latency, especially on edge devices. We introduce Adaptive Backbone Selection (ABS), a dynamic inference framework that selects the most appropriate CNN backbone for each image in real time. ABS combines a lightweight complexity analyzer (based on edge and texture richness) with a policy network, trained via reinforcement learning, that learns to balance accuracy and latency through a custom reward function. To mitigate switching overhead, a memory-efficient Backbone Manager with LRU caching keeps recently used models loaded for reuse. Evaluated on ImageNet, ABS reaches a new operating point on the accuracy-efficiency frontier, achieving higher accuracy than strong baselines such as DenseNet121 at a fraction of the computational cost. Our work presents a practical and deployable system for building more sustainable and responsive real-time AI.
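The pipeline described in the abstract (score the image, pick a backbone, reuse cached models) can be illustrated with a minimal sketch. This is not the authors' implementation. The edge measure, the threshold values, the backbone names `small`, `medium`, and `large`, and the cache capacity are all assumptions made for illustration. In particular, a fixed threshold rule stands in for the reinforcement-learned policy network, and texture richness is not modeled.

```python
from collections import OrderedDict
from typing import Callable, Dict, List

# Grayscale image as rows of pixel intensities in [0, 1] (assumed format).
Image = List[List[float]]


def edge_complexity(image: Image) -> float:
    """Mean absolute difference between neighboring pixels.

    A crude proxy for edge richness; the abstract does not specify
    the analyzer's actual features.
    """
    height, width = len(image), len(image[0])
    total, count = 0.0, 0
    for y in range(height):
        for x in range(width):
            if x + 1 < width:  # horizontal neighbor
                total += abs(image[y][x + 1] - image[y][x])
                count += 1
            if y + 1 < height:  # vertical neighbor
                total += abs(image[y + 1][x] - image[y][x])
                count += 1
    return total / count if count else 0.0


def choose_backbone(score: float) -> str:
    """Threshold rule standing in for the learned policy (hypothetical cutoffs)."""
    if score < 0.05:
        return "small"
    if score < 0.25:
        return "medium"
    return "large"


class BackboneManager:
    """Keeps at most `capacity` loaded backbones, evicting the least recently used."""

    def __init__(self, loaders: Dict[str, Callable[[], object]], capacity: int = 2):
        self._loaders = loaders
        self._capacity = capacity
        self._cache: "OrderedDict[str, object]" = OrderedDict()
        self.loads = 0  # number of cache misses, i.e. actual model loads

    def get(self, name: str) -> object:
        if name in self._cache:
            self._cache.move_to_end(name)  # mark as most recently used
            return self._cache[name]
        model = self._loaders[name]()
        self.loads += 1
        self._cache[name] = model
        if len(self._cache) > self._capacity:
            self._cache.popitem(last=False)  # drop the least recently used entry
        return model

    def resident(self) -> List[str]:
        """Names of cached backbones, least recently used first."""
        return list(self._cache)
```

In use, each incoming image would be scored with `edge_complexity`, mapped to a backbone name by `choose_backbone`, and the model fetched through `BackboneManager.get`, so repeated requests for the same backbone avoid a reload. The loaders here are arbitrary callables; in a real system they would construct and load the corresponding network weights.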