Poster
TRACE Back from the Future: A Probabilistic Reasoning Approach to Controllable Language Generation
Gwen Yidou-Weng · Benjie Wang · Guy Van den Broeck
East Exhibition Hall A-B #E-1309
As large language models (LMs) advance, there is an increasing need to control their outputs to align with human values (e.g., detoxification) or desired attributes (e.g., personalization, topic). However, autoregressive models focus on next-token predictions and struggle with global properties that require looking ahead. Existing solutions either post-train LMs for each new attribute—expensive and inflexible—or approximate the Expected Attribute Probability (EAP) of future sequences by sampling or training, which is slow and unreliable for rare attributes. We introduce TRACE (Tractable Probabilistic Reasoning for Adaptable Controllable gEneration), a novel framework that efficiently computes EAP and adapts to new attributes through tractable probabilistic reasoning and lightweight control. TRACE distills a Hidden Markov Model (HMM) from an LM and pairs it with a small classifier to estimate attribute probabilities, enabling exact EAP computation over the HMM’s predicted futures. This EAP is then used to reweight the LM’s next-token probabilities for globally compliant continuations. Empirically, TRACE achieves state-of-the-art detoxification results with only 20% decoding overhead, yields 76 low-resource personalized LMs within seconds, and seamlessly extends to composite attributes.
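To make the mechanism concrete, below is a minimal NumPy sketch of the decoding step the abstract describes: use an HMM's predicted futures to score each candidate next token's Expected Attribute Probability (EAP), then reweight the LM's next-token distribution by that score. Everything in it is an assumption for illustration: the HMM parameters are random toy values, `attr_given_state` is a hypothetical classifier head, and the attribute is simplified to depend only on the hidden state where generation ends. The paper's actual system distills the HMM from the LM and computes EAP exactly over future sequences, so this is a sketch of the idea, not the released implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Toy stand-ins (assumed for illustration, not the paper's trained models) ---
H, V = 8, 50                                # hidden states, vocabulary size
A = rng.dirichlet(np.ones(H), size=H)       # HMM transitions p(z_next | z); rows sum to 1
B = rng.dirichlet(np.ones(V), size=H)       # HMM emissions p(x | z)
attr_given_state = rng.uniform(size=H)      # hypothetical classifier p(attribute | z)

def eap_per_state(steps_left: int) -> np.ndarray:
    """Expected Attribute Probability of the HMM-predicted future,
    as a function of the current hidden state, via a backward recursion."""
    v = attr_given_state.copy()             # base case: generation ends here
    for _ in range(steps_left):
        v = A @ v                           # average attribute odds over next-state futures
    return v

def reweight_next_token(lm_probs: np.ndarray, belief: np.ndarray,
                        steps_left: int) -> np.ndarray:
    """Reweight the LM's next-token distribution by each token's EAP:
    p_guided(x) proportional to p_LM(x) * EAP(futures after emitting x)."""
    # Unnormalized joint over (next hidden state z', next token x):
    # M[z', x] = sum_z belief[z] * A[z, z'] * B[z', x]
    M = (belief @ A)[:, None] * B           # shape (H, V)
    eap = eap_per_state(steps_left)         # shape (H,)
    eap_per_token = (eap @ M) / np.maximum(M.sum(axis=0), 1e-12)
    scored = lm_probs * eap_per_token
    return scored / scored.sum()

# Usage: guide one decoding step of a stand-in LM distribution.
belief = rng.dirichlet(np.ones(H))          # current belief over hidden states
lm_probs = rng.dirichlet(np.ones(V))        # stand-in for the LM's next-token probabilities
guided = reweight_next_token(lm_probs, belief, steps_left=10)
```

Because the lookahead reduces to matrix-vector recursions over hidden states rather than sampling long continuations, its cost does not grow with the number of possible futures, which is consistent with the modest decoding overhead the abstract reports.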
AI language models are powerful, but getting them to follow rules can be tricky. How do you make sure an AI stays polite, or create a chatbot that sounds like Taylor Swift? Current methods for controlling AI are often like having to rewire an entire skyscraper just to change the lightbulb in one office: slow, expensive, and impractical for each new task.

We developed TRACE, a new technique that acts like a fast, simple "crystal ball" for the AI. At every word it writes, TRACE uses a simplified map of language to peek into thousands of potential future sentences. It checks the odds that a sentence will break a rule (like "be non-toxic") and uses that foresight to guide the AI's word choices in the present.

This approach works: TRACE sets a new standard for preventing toxic language with very little slowdown. And because it is so adaptable, you can teach it a new personality in seconds, finally making possible that Taylor Swift bot that actually sounds like her. It can even combine complex rules, like asking for a political speech that is also strictly non-toxic.