Poster
in
Workshop: ES-FoMo III: 3rd Workshop on Efficient Systems for Foundation Models
Proof-of-Concept for Private Local-to-Cloud LLM Chat via Trusted Execution Environments
Avanika Narayan · Dan Biderman · Christopher Re
Cloud-based LLM assistants pass every prompt through cloud servers in plaintext, leaving personal information open to inspection by cloud providers and any malicious actors with access to their servers. Current privacy techniques either degrade quality or are several orders of magnitude slower. In contrast, Trusted Execution Environments (TEEs) offer a practical path forward, taking a hardware-based approach. We explore recent TEE-based virtual machines that pair confidential NVIDIA H100 GPUs with AMD SEV-SNP CPUs. Naive PyTorch use inside this TEE incurs a 1.87× slowdown due to CPU-GPU encryption. Moreover, there is no open-source communication protocol between a local client and such a remote TEE. In response, we propose TEEChat, a research prototype that (1) binds a local client to a remote TEE hosting an LLM via attestation and key exchange, (2) secures communication with full end-to-end encryption, and (3) minimizes overhead with targeted kernel and I/O optimizations. For models over 30B parameters, TEEChat adds just 1% latency, showing that LLM inference inside TEEs is already practical.
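The attest-then-key-exchange flow in step (1) can be sketched in a few lines. The code below is a toy illustration, not TEEChat's actual protocol: the `verify_attestation` check stands in for validating a hardware-signed SEV-SNP/H100 attestation report against an expected enclave measurement, and the session key is derived with HKDF (RFC 5869) over stdlib `hmac`/`hashlib`; a real deployment would verify the vendor certificate chain and run an authenticated Diffie–Hellman exchange.

```python
import hashlib
import hmac
import os

def hkdf_sha256(secret: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF extract-then-expand (RFC 5869) using HMAC-SHA256."""
    prk = hmac.new(salt, secret, hashlib.sha256).digest()   # extract
    okm, block, counter = b"", b"", 1
    while len(okm) < length:                                # expand
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]

def verify_attestation(report: dict, measurement: bytes, vendor_key: bytes) -> bool:
    """Toy stand-in: check the report's signature and that the quoted
    enclave measurement matches what the client expects to be running."""
    expected_sig = hmac.new(vendor_key, report["quote"], hashlib.sha256).digest()
    return (hmac.compare_digest(expected_sig, report["signature"])
            and report["quote"].startswith(measurement))

# --- Usage: fabricate a toy report, verify it, derive a session key ---
vendor_key = os.urandom(32)                       # stands in for the hardware vendor's signing key
measurement = b"expected-enclave-hash"            # hash of the code the TEE should be running
quote = measurement + b"|" + os.urandom(8)        # measurement plus a freshness nonce
report = {"quote": quote,
          "signature": hmac.new(vendor_key, quote, hashlib.sha256).digest()}

assert verify_attestation(report, measurement, vendor_key)

# Only after attestation succeeds does the client derive an end-to-end session key.
session_key = hkdf_sha256(os.urandom(32), salt=b"teechat-demo", info=b"session")
print(len(session_key))  # 32-byte key for the encrypted chat channel
```

The design point this illustrates is ordering: the client refuses to derive (or use) a session key until the remote TEE has proven, via the attestation report, that it is genuine confidential hardware running the expected LLM-serving code.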