Poster
in
Workshop: Multi-Agent Systems in the Era of Foundation Models: Opportunities, Challenges and Futures

Evaluating Vulnerabilities in Multi-Agent Systems through Text-Based Attacks

Dur e Najaf Amjad


Abstract:

Multi-agent systems (MAS) are increasingly deployed in complex applications but remain vulnerable to adversarial inputs. We evaluate the resilience of a hierarchical Customer Service MAS built with CrewAI, in which a central manager delegates queries to multiple agents. We conduct four sets of experiments: (i) we measure the raw response of the MAS to adversarial queries and the effectiveness of the central manager in filtering them; (ii) we integrate an external resource (a PDF document) into one agent and assess its impact on adversarial robustness; (iii) we attempt a jailbreak by corrupting an agent's memory; and (iv) we designate the LLM that produces the fewest unethical responses as a guardrail to filter adversarial queries before they reach the MAS. Our results identify Claude 3.5 Sonnet as the most ethical LLM, permitting at most 3 of 50 adversarial queries to pass in every trial. Additionally, while the jailbreak remained unsuccessful, augmenting weaker models such as gpt-3.5-turbo-0125 with an external document substantially reduces their rate of harmful outputs. These experiments validate the effectiveness and generalizability of our hierarchical guardrail strategies.
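The guardrail strategy in experiment (iv) can be sketched as a simple pre-filter: every incoming query is first classified by the guardrail model, and only cleared queries reach the MAS. The sketch below is a minimal illustration, not the authors' implementation; `is_adversarial` is a hypothetical stand-in for a call to the guardrail LLM (e.g. Claude 3.5 Sonnet), replaced here by a trivial keyword check so the example stays self-contained.

```python
# Hypothetical keyword list standing in for the guardrail LLM's judgment.
BLOCKLIST = ("bypass", "exploit", "steal")

def is_adversarial(query: str) -> bool:
    """Placeholder for the guardrail LLM's safety classification."""
    return any(word in query.lower() for word in BLOCKLIST)

def guarded_dispatch(query: str, mas_handler) -> str:
    """Forward the query to the MAS only if the guardrail clears it."""
    if is_adversarial(query):
        return "Query rejected by guardrail."
    return mas_handler(query)

# Stand-in for the hierarchical Customer Service MAS (manager + agents).
def mas_handler(query: str) -> str:
    return f"MAS response to: {query}"

print(guarded_dispatch("How do I reset my password?", mas_handler))
print(guarded_dispatch("How do I steal account credentials?", mas_handler))
```

In the paper's setup, the classification step would be an actual call to the most ethical model identified in the experiments, placed in front of the manager agent so adversarial queries never enter the delegation hierarchy.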