

Oral presentation in Workshop: Methods and Opportunities at Small Scale (MOSS)

Stats or Facts: Decomposing Generalization in Language Models with Small-Scale Models

Tina Behnia · Puneesh Deora · Christos Thrampoulidis

Sat 19 Jul 12:15 p.m. PDT — 12:30 p.m. PDT

Abstract:

Large language models learn both statistical patterns that make text fluent and factual associations between specific tokens that encode knowledge. The complexity of natural language, which interweaves linguistic patterns with factual content, makes a systematic study of this capability difficult. To address this, we introduce a Small-Scale Data Model (SSDM) designed to disentangle these components. The SSDM consists of a statistical stream of generic tokens, endowed with designated positional information, which composes with a separate factual stream of source-target token pairs representing knowledge. Partitioning the generating distribution of the statistical stream into sub-distributions, which we term templates, allows us to: (i) independently vary the format of the templates (i.e., contextual structure) and the frequency with which facts appear within each template during training (i.e., contextual diversity); (ii) measure both in-distribution and out-of-distribution generalization; and (iii) distinguish between the statistical, structural, and factual aspects of language model generalization. We demonstrate the flexibility of the SSDM by reporting example findings concerning: (a) the potentially catastrophic impact of low contextual diversity on factual recall, statistical generalization, or both, contingent on the contextual structure; (b) stage-wise learning dynamics; and (c) hallucination.
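
The sketch below illustrates how a data model of this kind could be instantiated, based only on the abstract's description: templates built from generic tokens compose with source-target fact pairs, and a diversity parameter controls how many templates each fact appears in during training. The vocabulary sizes, slot tokens (`<SRC>`, `<TGT>`), template shapes, and composition rule are illustrative assumptions, not the authors' construction.

```python
# Hypothetical, minimal SSDM-style data generator (assumptions labeled above).
import random

random.seed(0)

NUM_FACTS = 10
# Factual stream: source-target token pairs representing knowledge.
FACTS = {f"s{i}": f"t{i}" for i in range(NUM_FACTS)}

# Statistical stream: templates of generic tokens with fixed positions,
# plus slots where a fact's source and target tokens are inserted
# (the "contextual structure").
TEMPLATES = [
    ["g1", "g4", "<SRC>", "g7", "<TGT>"],
    ["<SRC>", "g2", "g2", "<TGT>", "g9"],
    ["g3", "<SRC>", "g5", "g5", "g5", "<TGT>"],
]

def sample_sequence(template_id: int, source: str) -> list[str]:
    """Compose one sequence by filling a template's slots with a fact pair."""
    target = FACTS[source]
    return [source if tok == "<SRC>" else target if tok == "<TGT>" else tok
            for tok in TEMPLATES[template_id]]

def make_dataset(n: int, diversity: float) -> list[list[str]]:
    """Contextual diversity controls how many templates each fact appears in:
    diversity=1.0 lets a fact appear in any template; lower values restrict
    each fact to a smaller subset of templates during training."""
    k = max(1, round(diversity * len(TEMPLATES)))
    allowed = {s: random.sample(range(len(TEMPLATES)), k) for s in FACTS}
    data = []
    for _ in range(n):
        src = random.choice(list(FACTS))
        data.append(sample_sequence(random.choice(allowed[src]), src))
    return data

if __name__ == "__main__":
    for seq in make_dataset(5, diversity=0.34):
        print(" ".join(seq))
```

Under this toy setup, in-distribution generalization would correspond to held-out fact/template combinations that were allowed during training, while out-of-distribution generalization would probe facts placed in templates they never co-occurred with.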
