Poster
FeatSharp: Your Vision Model Features, Sharper
Mike Ranzinger · Greg Heinrich · Pavlo Molchanov · Bryan Catanzaro · Andrew Tao
West Exhibition Hall B2-B3 #W-304
Modern computer vision models, while powerful, often lack the ability to process high resolution images, or are only able to produce representations in low resolution. This makes them challenging to use for tasks that require high-res, such as detecting small objects in an image (e.g. find the bird flying in the sky), or labeling every pixel within the scene as some category (e.g. "bird", "sky", "tree", etc.). We present a method for enabling low-res-only vision models to produce hi-res representations by carefully upsampling them, combined with additional passes through the model (called tiling) to get details for small objects that are otherwise not large enough to be encoded properly. In doing so, we demonstrate that we can improve on various dense task benchmarks for numerous base vision models.