Skip to yearly menu bar Skip to main content


Poster with Prerecorded Video
in
Workshop: Tokenization Workshop (TokShop)

Tokenisation is NP-Complete

Philip Whittington · Gregor Bachmann · Tiago Pimentel

Keywords: [ Compression ] [ Computational Complexity ] [ Tokenisation ]

[ ] [ Project Page ]
Fri 18 Jul 1:50 p.m. PDT — 3 p.m. PDT

Abstract: In this work, we prove the NP-completeness of two variants of tokenisation, defined as the problem of compressing a dataset to at most $\delta$ symbols by either finding a vocabulary directly (_direct_ tokenisation), or selecting a sequence of merge operations (_bottom-up_ tokenisation).

Chat is not available.