

Poster

Communicating Activations Between Language Model Agents

Vignav Ramesh · Kenneth Li

East Exhibition Hall A-B #E-2604
Wed 16 Jul 11 a.m. PDT — 1:30 p.m. PDT

Abstract: Communication between multiple language model (LM) agents has been shown to scale up the reasoning ability of LMs. While natural language has been the dominant medium for inter-LM communication, it is not obvious that it should be the standard: natural language communication incurs high inference costs that scale quickly with the number of both agents and messages, and the decoding process abstracts away rich information that could otherwise be accessed from the internal activations. In this work, we propose a simple technique whereby LMs communicate via *activations*; concretely, we pause an LM $B$'s computation at an intermediate layer, combine its current activation with another LM $A$'s intermediate activation via some function $f$, then pass $f$'s output into the next layer of $B$ and continue the forward pass until decoding is complete. This approach scales up LMs on new tasks with *zero* additional parameters and data, and saves a *substantial amount of compute* over natural language communication. We test our method with various functional forms of $f$ on two experimental setups (multi-player coordination games and reasoning benchmarks) and find that it achieves up to a $27$% improvement over natural language communication across datasets with less than $1/4$ of the compute, illustrating the superiority and robustness of activations as an alternative "language" for communication between LMs.
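
The pause-combine-resume mechanism described above can be illustrated with a short PyTorch sketch. The toy transformer blocks, the merge layer index `k`, and the mean-based combining function `f_mean` below are illustrative assumptions for exposition, not the paper's exact models or configuration.

```python
# Minimal sketch of activation communication between two LM agents.
# Assumptions (not from the paper): toy transformer blocks stand in for real LMs,
# the combining function f is a simple mean, and the merge happens at layer k.
import torch
import torch.nn as nn


class ToyLM(nn.Module):
    """Stand-in for a decoder-only LM: a stack of identical blocks."""

    def __init__(self, d_model=64, n_layers=6):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
            for _ in range(n_layers)
        )

    def forward_layers(self, h, start=0, end=None):
        # Run the forward pass over layers [start, end).
        for layer in self.layers[start:end]:
            h = layer(h)
        return h


def f_mean(h_b, h_a):
    # Hypothetical combining function f: average the two activations.
    return 0.5 * (h_b + h_a)


def communicate_via_activations(lm_a, lm_b, x_a, x_b, k, f=f_mean):
    """Pause LM B at layer k, merge in LM A's layer-k activation, then resume B."""
    with torch.no_grad():
        h_a = lm_a.forward_layers(x_a, end=k)     # A's intermediate activation
        h_b = lm_b.forward_layers(x_b, end=k)     # B's activation at the same depth
        h_b = f(h_b, h_a)                         # combine via f
        return lm_b.forward_layers(h_b, start=k)  # continue B's forward pass


if __name__ == "__main__":
    torch.manual_seed(0)
    lm_a, lm_b = ToyLM(), ToyLM()
    x_a = torch.randn(1, 8, 64)  # embedded prompt for agent A
    x_b = torch.randn(1, 8, 64)  # embedded prompt for agent B
    out = communicate_via_activations(lm_a, lm_b, x_a, x_b, k=3)
    print(out.shape)  # torch.Size([1, 8, 64])
```

In this sketch only B's later layers see the merged state, so no extra parameters or training data are introduced; swapping in a different `f` (e.g., a weighted sum or replacement) only changes the one-line combining step.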

Lay Summary:

Large language models can better reason through hard problems when multiple instances of the model (called "agents") think through diverse approaches and communicate with each other, i.e., send each other messages in plain text. We wondered whether LLM agents could communicate more efficiently and effectively by tapping into each other's internal "thoughts" – the "activation vectors" a model produces as it computationally processes a prompt. Our method pauses one model mid-computation, merges its activation with another model's, and then continues processing. Crucially, this requires no extra training data or new model parameters. We evaluated our technique on multi-player coordination tasks and reasoning benchmarks and saw up to a 27% boost in performance compared to text-based communication, while using less than a fourth of the compute. These findings suggest that by sharing rich internal signals instead of words, LLM agents can collaborate far faster and more efficiently.
