

Poster in Workshop: ES-FoMo III: 3rd Workshop on Efficient Systems for Foundation Models

Proof-of-Concept for Private Local-to-Cloud LLM Chat via Trusted Execution Environments

Avanika Narayan · Dan Biderman · Christopher Re


Abstract:

Cloud-based LLM assistants pass every prompt through cloud servers in plaintext, leaving personal information open to inspection by cloud providers and by any malicious actors with access to their servers. Current privacy techniques either degrade quality or are several orders of magnitude slower. In contrast, Trusted Execution Environments (TEEs) offer a practical, hardware-based path forward. We explore recent TEE-based virtual machines with confidential NVIDIA H100 GPUs and AMD SEV-SNP CPUs. Naive PyTorch use inside this TEE incurs a 1.87× slowdown due to CPU-GPU encryption. Moreover, there is no open-source communication protocol between a local client and such a remote TEE. In response, we propose TEEChat, a research prototype that (1) binds a local client to a remote TEE hosting an LLM via attestation and key exchange, (2) secures communication with full end-to-end encryption, and (3) minimizes overhead with targeted kernel and I/O optimizations. For models over 30B parameters, TEEChat adds just 1% latency, showing that LLM inference inside TEEs is already practical.
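To make the protocol steps concrete, below is a minimal client-side sketch of the handshake the abstract describes: verify the TEE's attestation report, run a key exchange bound to it, and encrypt prompts end-to-end. This is an illustrative assumption of how such a binding could be built with standard primitives (X25519, HKDF, AES-GCM via the Python `cryptography` library), not TEEChat's actual implementation; the `fetch_attestation_report` and `verify_report` callables and the wire format are hypothetical placeholders.

```python
# Hypothetical sketch of a local-to-TEE handshake, assuming the TEE's
# attestation report embeds the enclave's ECDH public key so the key
# exchange is cryptographically bound to the attested code.
import os

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric.x25519 import (
    X25519PrivateKey,
    X25519PublicKey,
)
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.hazmat.primitives.kdf.hkdf import HKDF


def establish_session(fetch_attestation_report, verify_report) -> AESGCM:
    """Bind the client to a remote TEE: verify its attestation report,
    then derive a shared AES-256-GCM session key via X25519 ECDH."""
    client_priv = X25519PrivateKey.generate()
    client_pub = client_priv.public_key()

    # The TEE returns a hardware-signed report (e.g., SEV-SNP or H100
    # confidential-computing attestation); the verifier checks the
    # signature and measurements, then extracts the enclave's public key.
    report = fetch_attestation_report(client_pub)   # assumed transport
    tee_pub_bytes = verify_report(report)           # assumed verifier
    tee_pub = X25519PublicKey.from_public_bytes(tee_pub_bytes)

    shared_secret = client_priv.exchange(tee_pub)
    session_key = HKDF(
        algorithm=hashes.SHA256(),
        length=32,
        salt=None,
        info=b"teechat-session",                    # illustrative label
    ).derive(shared_secret)
    return AESGCM(session_key)


def encrypt_prompt(aead: AESGCM, prompt: str) -> bytes:
    """Encrypt one chat prompt end-to-end; only the attested TEE,
    holding the same session key, can decrypt it."""
    nonce = os.urandom(12)                          # unique per message
    return nonce + aead.encrypt(nonce, prompt.encode(), b"teechat-v0")
```

Under this sketch, plaintext exists only on the client and inside the enclave; the cloud provider relays ciphertext, and the attestation step is what prevents a non-TEE server from impersonating the endpoint.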
