Gate every agent action, scan every input, and keep a tamper-evident audit trail. Open-source, self-hostable, and hardened by frontier adversarial research across the AI/ML ecosystem.
Offensive security methodology applied to AI safety evaluation. Find the gaps. Document them. Ship the fix.
NVIDIA · Microsoft · Meta · Google
HuggingFace · OpenAI · PyTorch · vLLM · Ray
Adversarial analysis across NVIDIA, Microsoft, Meta, Google, HuggingFace, OpenAI, and dozens of other production ML stacks.
Previously undocumented vulnerability classes in model serialization formats and sandboxed code execution. Details pending coordinated disclosure.
Critical and high-severity findings in PyTorch, DeepSpeed, Ray, Ollama, vLLM, and LangChain, coordinated with maintainers.
Merged PR #798 to UK AISI's ControlArena: agents evading their own safety monitors via prompt injection.
The pipeline we point at the AI/ML ecosystem, pointed at your systems. Scoped engagements, CVE-quality findings, full reproduction steps.
Adversarial analysis of your ML stack — deserialization, injection, auth bypass, model-format exploits, supply chain. The methodology we use to break production systems at NVIDIA, Microsoft, Meta, and Google, pointed at yours.
We test the evaluators. Monitor bypass, compound-judge failures, signal dilution, sandbox escapes. Your safety infrastructure is an attack surface — we prove it before an adversarial agent does.
Systematic adversarial campaigns against your agents and tool integrations. Privilege escalation, exfiltration, goal hijacking, memory poisoning. CVE-quality findings with reproduction steps.
Free, MIT-licensed packages forged in our own adversarial research. Download the full stack, or install only the tools you need.
npx @authensor/create-authensor my-agent
@authensor/aegis: Content safety scanner. 210+ detection rules. Prompt injection, memory poisoning, PII, credential leaks. Zero dependencies, sub-ms latency.
@authensor/sentinel: Real-time behavioral monitor. EWMA/CUSUM anomaly detection. Per-agent baselines, deny rate tracking, chain depth alerts.
@authensor/engine: Declarative policy evaluation. Session forbidden sequences, budget enforcement, constraint checking. Synchronous, pure, zero dependencies.
@authensor/mcp-server: Transparent policy proxy for any MCP server. Implements SEP authorization protocol. Drop-in protection for Claude Desktop and any MCP client.
@authensor/redteam: Adversarial red team harness. 15 attack seeds mapped to MITRE ATT&CK. Automated safety regression testing.
@authensor/safeclaw: Local agent gating for Claude Code. Browser dashboard, approval workflows, audit ledger. One command install.
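To make the EWMA anomaly detection and per-agent baselines mentioned for the behavioral monitor more concrete, here is a minimal sketch of the general technique. It is not the @authensor/sentinel API: the class name, the `observeDenyRate` helper, and every parameter default are hypothetical, and the CUSUM half is omitted. Each agent gets its own exponentially weighted mean and variance of its deny rate, and a new observation is flagged when it lands more than `k` standard deviations from that baseline.

```typescript
// Hypothetical EWMA detector for illustration; not taken from @authensor/sentinel.
class EwmaDetector {
  private mean = 0;
  private variance = 0;
  private count = 0;
  private readonly alpha: number;   // smoothing factor in (0, 1]
  private readonly k: number;       // alert threshold in standard deviations
  private readonly warmup: number;  // observations to collect before alerting
  private readonly minStd: number;  // floor so a flat baseline does not alert on noise

  constructor(alpha = 0.2, k = 3, warmup = 5, minStd = 0.01) {
    this.alpha = alpha;
    this.k = k;
    this.warmup = warmup;
    this.minStd = minStd;
  }

  // Score x against the current baseline, then fold x into the baseline.
  observe(x: number): boolean {
    const std = Math.max(Math.sqrt(this.variance), this.minStd);
    const anomalous =
      this.count >= this.warmup && Math.abs(x - this.mean) > this.k * std;

    if (this.count === 0) {
      this.mean = x; // first sample seeds the baseline
    } else {
      const diff = x - this.mean;
      const incr = this.alpha * diff;
      this.mean += incr;
      this.variance = (1 - this.alpha) * (this.variance + diff * incr);
    }
    this.count += 1;
    return anomalous;
  }
}

// One detector per agent, so a noisy agent does not mask a quiet one.
const baselines = new Map<string, EwmaDetector>();

// rate = fraction of an agent's actions denied in the latest window (assumed input).
function observeDenyRate(agentId: string, rate: number): boolean {
  let detector = baselines.get(agentId);
  if (detector === undefined) {
    detector = new EwmaDetector();
    baselines.set(agentId, detector);
  }
  return detector.observe(rate);
}
```

In this sketch an anomalous observation still updates the baseline; a production monitor might instead hold the baseline fixed while an alert is active.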
Wrap any agent action with guard(), and policy evaluation, content scanning, and audit logging happen automatically.
# Download the full safety stack
npx @authensor/create-authensor my-agent
cd my-agent && npm install

# Or install individual tools:
npm install @authensor/aegis        # Content scanner
npm install @authensor/sentinel     # Behavioral monitor
npm install @authensor/engine       # Policy engine
npm install @authensor/mcp-server   # MCP Gateway
npm install @authensor/redteam      # Red team harness
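As a rough illustration of what wrapping an action with guard() could involve, the sketch below chains a content scan, a policy decision, and a hash-chained audit entry around a single call. It is written from scratch under stated assumptions and does not use or reproduce the Authensor packages: the `guard` signature, the `Policy`, `Scanner`, and `AuditLog` types, and the toy rules are all hypothetical. The only external call is Node's built-in `createHash`.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shapes for illustration only; not the Authensor API.
type Action = { tool: string; input: string };
type Decision = "allow" | "deny";
type Policy = (action: Action) => Decision;
type Scanner = (text: string) => string[]; // names of matched rules

interface AuditEntry {
  action: Action;
  decision: Decision;
  findings: string[];
  prevHash: string;
  hash: string;
}

const GENESIS = "0".repeat(64);

function entryHash(prev: string, a: Action, d: Decision, f: string[]): string {
  return createHash("sha256").update(JSON.stringify([prev, a, d, f])).digest("hex");
}

// Each entry commits to the previous one, so edits are detectable on verify().
class AuditLog {
  readonly entries: AuditEntry[] = [];
  private lastHash = GENESIS;

  append(action: Action, decision: Decision, findings: string[]): void {
    const hash = entryHash(this.lastHash, action, decision, findings);
    this.entries.push({ action, decision, findings, prevHash: this.lastHash, hash });
    this.lastHash = hash;
  }

  verify(): boolean {
    let prev = GENESIS;
    for (const e of this.entries) {
      if (e.prevHash !== prev) return false;
      if (e.hash !== entryHash(prev, e.action, e.decision, e.findings)) return false;
      prev = e.hash;
    }
    return true;
  }
}

// Scan the input, evaluate policy, log the outcome, then run or refuse.
function guard<T>(
  policy: Policy,
  scan: Scanner,
  log: AuditLog,
  run: (action: Action) => T,
): (action: Action) => T {
  return (action) => {
    const findings = scan(action.input);
    const decision: Decision = findings.length === 0 ? policy(action) : "deny";
    log.append(action, decision, findings);
    if (decision === "deny") {
      throw new Error(`Action on "${action.tool}" denied`);
    }
    return run(action);
  };
}

// Toy rules: deny shell access and one obvious injection phrase.
const log = new AuditLog();
const policy: Policy = (a) => (a.tool === "shell" ? "deny" : "allow");
const scan: Scanner = (text) =>
  /ignore previous instructions/i.test(text) ? ["prompt-injection"] : [];
const guardedRun = guard(policy, scan, log, (a) => `ran ${a.tool}`);
```

Denied actions are logged before the error is thrown, so the audit trail records refusals as well as executions.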
We apply offensive security methodology to AI safety evaluation. Penetration testing for guardrails. Red teaming for agents. Adversarial probing for classifiers.
The safety stack is our toolkit, open-sourced. The red teaming is what we do with it.
195+ repos audited. 350+ verified vulnerabilities. 166 responsible disclosures. 2 novel vulnerability classes discovered. ControlArena contributor (UK AISI).
Download the free safety stack. Or hire the team that built it to red team your systems.