Poster in Affinity Workshop: New In ML
DREAM: Dual Refinement Embedding Attention Model
Ashutosh Pandey · Ayush Anand · Prarabdha Srivastava · Anil Singh Parihar
Recent advancements in computer vision have relied heavily on pre-trained vision encoders and sophisticated attention mechanisms. However, different vision foundation models often attend to semantically distinct aspects of an image. These differences arise from variations in training strategies and the nature of the data used, resulting in distinct inductive biases across models. In this work, we propose DREAM (Dual Refinement Embedding Attention Model), a simple yet effective architecture for fusing embeddings from two semantically diverse vision encoders. Crucially, our method avoids costly pretraining and operates directly on frozen encoder outputs, making it computationally efficient. We evaluated DREAM on several standard vision benchmarks, including CIFAR-10, CIFAR-100, UCF101, and Kinetics-400. Across all datasets, DREAM consistently outperforms the other methods, demonstrating enhanced accuracy and robustness. These findings highlight the effectiveness of combining semantically complementary representations from diverse vision encoders, offering a resource-efficient pathway to improved visual understanding without the need for large-scale data or computationally intensive training.
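The abstract does not spell out DREAM's fusion mechanism, so the following is only a hypothetical sketch of the general idea: two frozen encoders produce token embeddings, each stream is refined by cross-attending to the other, and the refined representations are pooled and concatenated. The specific choices here (scaled dot-product cross-attention, residual connections, mean pooling, concatenation, and the function names `cross_attend` and `dream_fuse`) are illustrative assumptions, not the authors' actual architecture.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def cross_attend(queries, keys_values):
    # One stream's tokens (queries) attend to the other stream's tokens
    # (keys_values). Both are lists of equal-dimension float vectors.
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys_values]
        weights = softmax(scores)
        attended = [sum(w * v[j] for w, v in zip(weights, keys_values))
                    for j in range(d)]
        # Residual connection: keep the original (frozen) token and add
        # the information gathered from the other encoder.
        out.append([qj + aj for qj, aj in zip(q, attended)])
    return out

def dream_fuse(emb_a, emb_b):
    # Dual refinement (hypothetical): each stream attends to the other,
    # then the refined streams are mean-pooled and concatenated.
    refined_a = cross_attend(emb_a, emb_b)
    refined_b = cross_attend(emb_b, emb_a)
    def pool(tokens):
        d = len(tokens[0])
        return [sum(t[j] for t in tokens) / len(tokens) for j in range(d)]
    return pool(refined_a) + pool(refined_b)

# Mock frozen encoder outputs: 3 tokens each, dimension 4.
emb_a = [[0.1, 0.2, 0.3, 0.4], [0.0, 1.0, 0.0, 1.0], [0.5, 0.5, 0.5, 0.5]]
emb_b = [[1.0, 0.0, 0.0, 0.0], [0.2, 0.2, 0.6, 0.0], [0.3, 0.3, 0.3, 0.1]]
fused = dream_fuse(emb_a, emb_b)
print(len(fused))  # concatenated fused vector: 4 + 4 = 8 dimensions
```

Because the encoders stay frozen, only the lightweight fusion module above would need training, which is consistent with the abstract's claim of avoiding costly pretraining.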