← Back to Learn
content-safetymonitoringreference

Benchmarking AI Safety Scanner Performance

Authensor

A content safety scanner that is accurate but slow, or fast but inaccurate, is not suitable for production. Benchmarking measures both detection quality and operational performance, giving you the data to make informed decisions about scanner configuration and deployment.

Detection Benchmarks

Dataset

Build a benchmark dataset of labeled examples:

  • True positives: Known malicious inputs (prompt injections, data exfiltration attempts, harmful content) that should be flagged
  • True negatives: Known benign inputs that should pass
  • Edge cases: Inputs that are borderline or that test specific detection rules

A useful benchmark dataset has at least 500 examples with balanced representation of categories.

Metrics

Precision: Of inputs flagged as malicious, what proportion is actually malicious? Low precision means high false positive rate.

Recall: Of actually malicious inputs, what proportion is flagged? Low recall means threats are slipping through.

F1 Score: The harmonic mean of precision and recall. A single number that balances both concerns.

Dataset: 1000 examples (200 malicious, 800 benign)
Flagged: 210 (190 true positive, 20 false positive)
Missed: 10 (malicious but not flagged)

Precision: 190/210 = 0.905
Recall: 190/200 = 0.950
F1: 2 * (0.905 * 0.950) / (0.905 + 0.950) = 0.927

Performance Benchmarks

Throughput

Measure scans per second at various concurrency levels. Run the scanner against a stream of inputs at increasing rates until throughput plateaus.

Latency

Measure P50, P95, and P99 scan latency. Track how latency varies with input size. A scanner that handles short messages in 1ms but takes 500ms for long documents needs different deployment strategies for each case.

Resource Usage

Measure CPU and memory consumption during sustained scanning. This determines the infrastructure cost of running the scanner at production scale.

Benchmarking Aegis

Aegis is designed for zero-dependency, in-process scanning. Benchmark it within the control plane process to measure realistic performance including overhead from integration:

const inputs = loadBenchmarkDataset();
const start = performance.now();
for (const input of inputs) {
  aegis.scan(input);
}
const elapsed = performance.now() - start;
const throughput = inputs.length / (elapsed / 1000);

Comparative Benchmarking

When evaluating multiple scanners or scanner configurations, benchmark them against the same dataset and on the same hardware. Normalize results for fair comparison.

Regular Rebenchmarking

Benchmark after every scanner update, rule change, or infrastructure change. Performance characteristics change over time, and stale benchmarks lead to incorrect capacity planning.

Benchmark data replaces guesswork. Make scanner decisions based on measured performance, not assumptions.

Keep learning

Explore more guides on AI agent safety, prompt injection, and building secure systems.

View All Guides