Poster
Compositional Generalization via Forced Rendering of Disentangled Latents
Qiyao Liang · Daoyuan Qian · Liu Ziyin · Ila R. Fiete
East Exhibition Hall A-B #E-2903
Humans effortlessly mix a few simple pieces, like words or shapes, to create endless new ideas, but computers usually must see every combination before they can produce it. To investigate this gap, we trained an AI to draw a single shaded dot at any requested position on a blank grid, but hid the central region during training. Even when given the exact "x" and "y" instructions, the AI simply stitched together bits of remembered examples rather than learning the underlying rule for placing the dot. We then tried two small tweaks: one that makes the AI paint each instruction directly onto the final grid, and another that first teaches it simple horizontal and vertical lines. With either tweak, the AI genuinely learned to combine the two directions and immediately filled in the missing center, using far fewer examples. This shows that grounding each piece of information right where the AI acts can help future systems flexibly recombine known elements, whether for new word combinations, object layouts, or routes, without needing to relearn every possibility.
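Since the summary describes a concrete toy task, here is a minimal, hypothetical sketch of what that setup might look like in code: a shaded dot (2D Gaussian bump) rendered at any (x, y) position on a grid, a held-out central block of positions, and one possible reading of "painting each instruction directly onto the grid" as vertical and horizontal stripes. The grid size, dot width, hold-out fraction, and function names are illustrative assumptions, not the authors' actual parameters or implementation.

```python
import numpy as np

# Illustrative sketch only (not the authors' code): a toy version of the task
# described above. All numbers below (grid size, bump width, size of the
# hidden central region) are assumptions chosen for illustration.

N = 32           # grid resolution (assumed)
SIGMA = 1.5      # width of the shaded dot, in pixels (assumed)
HOLD_OUT = 0.25  # fraction of each axis hidden around the center (assumed)

def render_bump(x, y, n=N, sigma=SIGMA):
    """Render one shaded dot (2D Gaussian bump) centered at (x, y) on an n x n grid."""
    xs, ys = np.meshgrid(np.arange(n), np.arange(n), indexing="xy")
    return np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))

def render_stripes(x, y, n=N, sigma=SIGMA):
    """One reading of 'painting each instruction onto the grid': draw the x
    instruction as a vertical stripe and the y instruction as a horizontal
    stripe, so each coordinate is grounded directly in output space."""
    xs = np.arange(n)
    col = np.exp(-((xs - x) ** 2) / (2 * sigma ** 2))   # 1D profile for x
    row = np.exp(-((xs - y) ** 2) / (2 * sigma ** 2))   # 1D profile for y
    return np.maximum(col[None, :], row[:, None])       # union of the two stripes

def in_hidden_center(x, y, n=N, frac=HOLD_OUT):
    """True if (x, y) lies in the central square withheld from training."""
    lo, hi = n * (0.5 - frac / 2), n * (0.5 + frac / 2)
    return lo <= x <= hi and lo <= y <= hi

# Training set: dots at every location except the hidden central region,
# so generalization to the center tests true compositional recombination.
train_images = np.stack([render_bump(x, y)
                         for x in range(N) for y in range(N)
                         if not in_hidden_center(x, y)])
print(train_images.shape)  # (num_training_locations, N, N)
```

In this toy framing, a model that only memorizes training images will fail on the hidden center, whereas conditioning it on inputs already rendered in the output space (as in `render_stripes`) gives each coordinate an explicit, spatially grounded representation to compose.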