Poster
in
Workshop: Multi-Agent Systems in the Era of Foundation Models: Opportunities, Challenges and Futures

Evaluating Vulnerabilities in Multi-Agent Systems through Text-Based Attacks

Dur e Najaf Amjad


Abstract:

Multi-agent systems (MAS) are increasingly deployed in complex applications but remain vulnerable to adversarial inputs. We evaluate the resilience of a hierarchical Customer Service MAS built with CrewAI, in which a central manager delegates queries to multiple agents. We conduct four sets of experiments: (i) we measure the raw response of the MAS to adversarial queries and the effectiveness of the central manager in filtering them; (ii) we integrate an external resource (a PDF document) into one agent and assess its impact on adversarial robustness; (iii) we attempt a jailbreak by corrupting an agent's memory; and (iv) we designate the LLM that produces the fewest unethical responses as a guardrail to filter adversarial queries before they reach the MAS. Our results identify Claude 3.5 Sonnet as the most ethical LLM, permitting at most 3 of 50 adversarial queries to pass in every trial. Additionally, while the jailbreak remained unsuccessful, augmenting weaker models such as gpt-3.5-turbo-0125 with an external document substantially reduces their rate of harmful outputs. These experiments validate the effectiveness and generalizability of our hierarchical guardrail strategies.
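The guardrail strategy in experiment (iv) can be sketched as a simple pre-filter: every incoming query is first classified by the guardrail model, and only cleared queries reach the MAS. The sketch below is a minimal illustration, not the authors' implementation; `is_adversarial` is a hypothetical stand-in for a call to the guardrail LLM (e.g. Claude 3.5 Sonnet), replaced here by a trivial keyword check so the example stays self-contained.

```python
# Hypothetical keyword list standing in for the guardrail LLM's judgment.
BLOCKLIST = ("bypass", "exploit", "steal")

def is_adversarial(query: str) -> bool:
    """Placeholder for the guardrail LLM's safety classification."""
    return any(word in query.lower() for word in BLOCKLIST)

def guarded_dispatch(query: str, mas_handler) -> str:
    """Forward the query to the MAS only if the guardrail clears it."""
    if is_adversarial(query):
        return "Query rejected by guardrail."
    return mas_handler(query)

# Stand-in for the hierarchical Customer Service MAS (manager + agents).
def mas_handler(query: str) -> str:
    return f"MAS response to: {query}"

print(guarded_dispatch("How do I reset my password?", mas_handler))
print(guarded_dispatch("How do I steal account credentials?", mas_handler))
```

In the paper's setup, the classification step would be an actual call to the most ethical model identified in the experiments, placed in front of the manager agent so adversarial queries never enter the delegation hierarchy.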