168+ repos. 350+ verified vulnerabilities. 126 responsible disclosures. Offensive security methodology applied to the entire AI/ML ecosystem -- from PyTorch core to production inference servers to safety evaluation frameworks.
Automated adversarial analysis pipeline. Covers deserialization, injection, auth bypass, model format exploits, native code, and supply chain. Same methodology behind 126 formal disclosures to NVIDIA, Microsoft, Meta, Google, and 50+ other orgs.
The tools we built for our own research, released under MIT. Content scanner, behavioral monitor, policy engine, MCP gateway, red team harness. Six packages. All free. Used daily in our own engagements.
Define the attack surface. Which safety mechanisms? Which threat model? What constitutes a bypass?
Systematic adversarial testing. Automated seed attacks, manual exploitation, edge case fuzzing.
Reproduction steps for every finding. Severity classification. Root cause analysis.
Concrete recommendations. Where possible, we ship the fix as open-source tooling.
Major safety frameworks with confirmed vulnerabilities
Vulnerabilities identified across 12 frameworks
Adversarial agent trials across 5 frontier models
Preprints published
Began adversarial testing of AI safety evaluation frameworks. First confirmed vulnerabilities.
VULN-0001 filed. Confirmed vulnerabilities in 7 of 10 major AI safety evaluation frameworks. Coordinated disclosure.
ControlArena PR #798 accepted by UK AISI. Compound judge research published. Aegis content scanner released.
Full safety stack open-sourced under MIT. Policy engine, Sentinel monitor, MCP Gateway, Chainbreaker red team harness.
Systematic audit of 168+ AI/ML repositories. 350+ verified vulnerabilities. 126 responsible disclosures. Two novel vulnerability classes discovered. Coordinated disclosure in progress.
Penetration tester turned AI safety researcher. Founded 15 Research Lab to apply offensive security methodology to AI safety evaluation. Built Authensor to operationalize those findings as open-source tools.
Background in adversarial testing. Focus on safety evaluation gaps, compound judge failures, and guardrail bypass techniques. ControlArena contributor. VULN-0001 author.
Red team your AI safety systems. Or download the free stack and use it yourself.