In multi-agent systems, agents pass messages, data, and task delegations to each other. Every inter-agent message is a potential attack vector. A compromised agent can use its communication channel to inject instructions into other agents, escalate privileges, or propagate failures.
Agents in a multi-agent system often trust each other implicitly. Agent A sends a summary to Agent B, and Agent B processes it without scanning. If Agent A is compromised (for example, through prompt injection via a retrieved document), its output may carry injected instructions that Agent B then follows.
Treat messages from other agents like messages from untrusted users. Scan them for injection patterns before processing:
// Agent B receives a message from Agent A
const message = receiveFromAgent('agent-a');
const scan = aegis.scan(message.content);
if (scan.threats.length > 0) {
  log.warn('Threat in inter-agent message', {
    from: 'agent-a',
    threats: scan.threats,
  });
  rejectMessage(message);
}
Agents should authenticate each other. Without authentication, any process that can send messages on the network can impersonate an agent.
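One way to authenticate messages is to sign each one with a shared secret and verify the signature on receipt. The sketch below uses Node's built-in `crypto` module with HMAC-SHA256. The names `agentKeys`, `signMessage`, and `verifyMessage`, and the message fields `from`, `content`, and `signature`, are hypothetical and chosen for illustration; a real deployment might instead rely on mTLS or asymmetric signatures.

```javascript
const crypto = require('node:crypto');

// Hypothetical per-agent shared secrets; in practice, load these from a secret store.
const agentKeys = new Map([
  ['agent-a', 'replace-with-a-long-random-secret'],
]);

// Compute an HMAC-SHA256 tag over the sender ID and the message body.
function signMessage(senderId, content, key) {
  return crypto
    .createHmac('sha256', key)
    .update(`${senderId}\n${content}`)
    .digest('hex');
}

// Return true only if the signature matches the key registered for the claimed sender.
function verifyMessage(message) {
  const key = agentKeys.get(message.from);
  if (!key || typeof message.signature !== 'string') return false;
  const expected = Buffer.from(signMessage(message.from, message.content, key), 'hex');
  const actual = Buffer.from(message.signature, 'hex');
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return actual.length === expected.length && crypto.timingSafeEqual(actual, expected);
}
```

This sketch does not protect against replayed messages; adding a timestamp or nonce to the signed payload would be one way to address that.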
Each agent should have its own policy that limits what it can do, regardless of what other agents request:
# Agent B policy: only performs data analysis
rules:
  - tool: "data.analyze"
    action: allow
  - tool: "data.read"
    action: allow
  - tool: "*"
    action: block
    reason: "Agent B is limited to data analysis"
Even if Agent A (compromised) tells Agent B to send an email, Agent B's policy blocks it because email tools are not in its allowlist.
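To illustrate how a policy like the one above might be enforced, the sketch below checks a requested tool against the rules in order and applies the first match. First-match evaluation, the default-deny fallback, and the names `agentBPolicy` and `evaluateTool` are assumptions made for this example, not the documented behavior of any particular policy engine.

```javascript
// The policy above, expressed as a plain object (hypothetical in-memory form).
const agentBPolicy = {
  rules: [
    { tool: 'data.analyze', action: 'allow' },
    { tool: 'data.read', action: 'allow' },
    { tool: '*', action: 'block', reason: 'Agent B is limited to data analysis' },
  ],
};

// Return the decision of the first rule that matches the tool name.
// Assumption: rules are evaluated top to bottom and '*' matches any tool.
function evaluateTool(policy, toolName) {
  for (const rule of policy.rules) {
    if (rule.tool === '*' || rule.tool === toolName) {
      return { allowed: rule.action === 'allow', reason: rule.reason ?? null };
    }
  }
  // Default deny when no rule matches.
  return { allowed: false, reason: 'No matching rule' };
}
```

With these rules, `evaluateTool(agentBPolicy, 'email.send')` falls through to the wildcard rule and is blocked, no matter which agent made the request.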
Define strict schemas for inter-agent messages. Reject messages that do not conform to the schema:
const schema = {
  type: 'object',
  properties: {
    task: { type: 'string', enum: ['analyze', 'summarize'] },
    data: { type: 'object' },
    traceId: { type: 'string' },
  },
  required: ['task', 'data', 'traceId'],
  additionalProperties: false,
};
Free-form text fields are injection vectors. Structured, typed messages are harder to exploit.
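In practice a JSON Schema validator library would typically enforce a schema like the one above. To show what the check amounts to, here is a hand-written sketch that applies the same constraints without any dependency. The function name `validateAgentMessage` and the error strings are hypothetical.

```javascript
const ALLOWED_TASKS = ['analyze', 'summarize'];
const ALLOWED_KEYS = ['task', 'data', 'traceId'];

// True for non-null, non-array objects.
function isPlainObject(value) {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Return a list of validation errors; an empty list means the message conforms.
function validateAgentMessage(message) {
  if (!isPlainObject(message)) return ['message must be an object'];
  const errors = [];
  // Mirror additionalProperties: false by rejecting unknown keys.
  for (const key of Object.keys(message)) {
    if (!ALLOWED_KEYS.includes(key)) errors.push(`unexpected property: ${key}`);
  }
  if (!ALLOWED_TASKS.includes(message.task)) errors.push('task must be "analyze" or "summarize"');
  if (!isPlainObject(message.data)) errors.push('data must be an object');
  if (typeof message.traceId !== 'string') errors.push('traceId must be a string');
  return errors;
}
```

A receiving agent would reject any message for which the returned list is non-empty, before passing anything to the model.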
Link all agent actions in a multi-agent workflow using a shared trace ID. When an incident occurs, the trace shows the full path across agents, making it possible to identify which agent was the entry point for the attack.
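A minimal sketch of this idea follows, assuming one trace ID is minted when the workflow starts and is then passed along with every message. The in-memory `auditLog` and the helpers `startTrace`, `recordAction`, and `tracePath` are hypothetical; a real system would send these records to a log store or tracing backend.

```javascript
const { randomUUID } = require('node:crypto');

// In-memory audit log for illustration only.
const auditLog = [];

// Start a workflow by minting one trace ID that every agent reuses.
function startTrace() {
  return randomUUID();
}

// Record an agent action under the workflow's trace ID.
function recordAction(traceId, agent, action) {
  auditLog.push({ traceId, agent, action, at: new Date().toISOString() });
}

// Reconstruct the path of a workflow across agents, in the order recorded.
function tracePath(traceId) {
  return auditLog
    .filter((entry) => entry.traceId === traceId)
    .map((entry) => `${entry.agent}:${entry.action}`);
}
```

During an incident, calling `tracePath` with the affected trace ID lists each agent step in order, so the first suspicious step points to the likely entry point.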
Limit how many messages one agent can send to another.
A compromised agent that cannot send unlimited messages to other agents has limited ability to propagate the attack.
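One simple way to apply such a limit is a fixed-window counter per sender and receiver pair, sketched below. The class name `AgentRateLimiter`, its parameters, and the values used in the usage note are illustrative assumptions; a production system might prefer a token bucket or a limiter shared across processes.

```javascript
// Fixed-window rate limiter keyed by sender/receiver pair.
class AgentRateLimiter {
  constructor(maxMessages, windowMs) {
    this.maxMessages = maxMessages;
    this.windowMs = windowMs;
    this.windows = new Map(); // "from->to" => { start, count }
  }

  // Return true if the message may be sent, false if the sender is over its limit.
  // `now` is injectable so the behavior can be tested without waiting.
  allow(from, to, now = Date.now()) {
    const key = `${from}->${to}`;
    const current = this.windows.get(key);
    // Open a new window on first use or once the previous window has expired.
    if (!current || now - current.start >= this.windowMs) {
      this.windows.set(key, { start: now, count: 1 });
      return true;
    }
    if (current.count >= this.maxMessages) return false;
    current.count += 1;
    return true;
  }
}
```

For example, `new AgentRateLimiter(10, 60000)` would allow each agent pair at most 10 messages per minute; the receiving side would drop or queue any message for which `allow` returns `false`.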