Poster
EncryptedLLM: Privacy-Preserving Large Language Model Inference via GPU-Accelerated Fully Homomorphic Encryption
Leo de Castro · Daniel Escudero · Adya Agrawal · Antigoni Polychroniadou · Manuela Veloso
East Exhibition Hall A-B #E-1008
Large language models (LLMs) are typically deployed in cloud environments. To use these models, the user's data must be sent to an external cloud machine. For sensitive queries (e.g., topics related to healthcare or finance), this represents a major privacy concern. This work improves the efficiency of techniques for privately evaluating models over sensitive queries, allowing users to safely send their queries to a cloud machine and receive the model outputs without the cloud learning anything about their data. The main underlying tool is an advanced cryptographic primitive called fully homomorphic encryption (FHE), and a key technical contribution of this work is a new GPU-accelerated implementation of FHE. We also develop methods to evaluate LLMs under FHE while preserving the quality of the model outputs.
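To illustrate the client–cloud FHE workflow described above, the following minimal Python sketch uses the open-source TenSEAL library (CKKS scheme) rather than the paper's GPU-accelerated FHE implementation; the toy linear layer, parameter choices, and variable names are illustrative assumptions, not the authors' system.

# Minimal sketch of privacy-preserving inference with FHE (assumes TenSEAL/CKKS,
# not the paper's GPU-accelerated implementation).
import tenseal as ts

# --- Client: key generation and encryption ---
context = ts.context(
    ts.SCHEME_TYPE.CKKS,
    poly_modulus_degree=8192,
    coeff_mod_bit_sizes=[60, 40, 40, 60],
)
context.global_scale = 2 ** 40
context.generate_galois_keys()

query = [0.25, -1.5, 3.0, 0.75]             # sensitive input features (hypothetical)
enc_query = ts.ckks_vector(context, query)  # ciphertext sent to the cloud

# --- Server: homomorphic evaluation of a toy linear layer ---
# The server operates only on ciphertexts; weights and bias are plaintext model parameters.
weights = [0.5, 0.1, -0.2, 0.8]
bias = 0.3
enc_output = enc_query.dot(weights) + bias  # computed entirely under encryption

# --- Client: decryption of the result ---
print(enc_output.decrypt())  # approximately sum(q * w for q, w in zip(query, weights)) + bias

In the system described in the poster, the toy linear layer above is replaced by full LLM evaluation, with the FHE operations accelerated on GPU.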