Tags: deployment, guardrails, best practices, tutorial

Deploying AI Safety in Kubernetes

Authensor

Kubernetes is the natural home for AI safety infrastructure that needs to scale with your agent fleet. Authensor's control plane, policy engine, and monitoring components map cleanly to Kubernetes primitives. This guide covers the deployment architecture.

Architecture Overview

Deploy Authensor's control plane as a Deployment with a horizontal pod autoscaler. The control plane is stateless (all state lives in PostgreSQL), so it scales horizontally without coordination. Set resource requests based on your expected request volume. A single pod handles roughly 2,000 policy evaluations per second.
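A minimal sketch of such a Deployment, assuming a hypothetical image name, port, and namespace (none of these are official Authensor values):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: authensor-control-plane
  namespace: authensor                          # hypothetical namespace
spec:
  replicas: 2
  selector:
    matchLabels:
      app: authensor-control-plane
  template:
    metadata:
      labels:
        app: authensor-control-plane
    spec:
      containers:
        - name: control-plane
          image: authensor/control-plane:latest # assumed image name
          ports:
            - containerPort: 8080               # assumed API port
          resources:
            requests:
              cpu: "500m"     # size per pod for ~2,000 evaluations/s
              memory: "512Mi"
            limits:
              memory: "1Gi"
```

Because the control plane is stateless, the horizontal pod autoscaler can safely add or remove replicas of this Deployment without any coordination step.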

PostgreSQL runs as a StatefulSet or, preferably, as a managed database service outside the cluster. Audit receipt chains require durable storage with strong consistency guarantees, which managed database services typically provide more reliably than in-cluster PostgreSQL.

Kubernetes Manifests

Create a dedicated namespace for your safety infrastructure. This provides isolation and makes RBAC configuration straightforward.
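A dedicated namespace is a one-line manifest; the name and label below are illustrative assumptions:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: authensor          # hypothetical name
  labels:
    purpose: ai-safety     # useful as a selector in RBAC and network policies
```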

The control plane Deployment needs environment variables for database connection, API keys, and feature flags like AUTHENSOR_AEGIS_ENABLED and AUTHENSOR_SENTINEL_ENABLED. Store these in a Secret resource.
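A sketch of that Secret, with placeholder values (the connection string and API key here are obviously not real):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: authensor-config
  namespace: authensor     # hypothetical namespace
type: Opaque
stringData:
  DATABASE_URL: postgres://user:password@db-host:5432/authensor  # placeholder
  AUTHENSOR_API_KEY: replace-me                                  # placeholder
  AUTHENSOR_AEGIS_ENABLED: "true"
  AUTHENSOR_SENTINEL_ENABLED: "true"
```

In the Deployment's container spec, load all keys at once with `envFrom: [{secretRef: {name: authensor-config}}]` rather than listing each variable individually.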

Expose the control plane through a ClusterIP Service for internal traffic or a LoadBalancer Service if agents run outside the cluster. Use network policies to restrict which pods can reach the safety API.
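A sketch of the internal Service plus a NetworkPolicy that admits traffic only from namespaces labeled as agent workloads; the label convention is an assumption:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: authensor-control-plane
  namespace: authensor
spec:
  type: ClusterIP           # swap for LoadBalancer if agents are external
  selector:
    app: authensor-control-plane
  ports:
    - port: 80
      targetPort: 8080      # assumed container port
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-agents-only
  namespace: authensor
spec:
  podSelector:
    matchLabels:
      app: authensor-control-plane
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              role: agents  # hypothetical label on agent namespaces
      ports:
        - protocol: TCP
          port: 8080
```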

Health Checks and Reliability

Configure liveness and readiness probes against the control plane's health endpoint. Give the readiness probe a shorter interval than the liveness probe: readiness failures remove the pod from Service endpoints, which you want to happen quickly, while liveness failures restart the pod and should trigger only when it is genuinely stuck.
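A probe fragment for the container spec; the `/healthz` path and port are assumptions, so check them against the control plane's actual health endpoint:

```yaml
livenessProbe:
  httpGet:
    path: /healthz          # assumed health endpoint path
    port: 8080
  periodSeconds: 15
  failureThreshold: 3       # restart only after sustained failure
readinessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 5          # shorter interval: drop unhealthy pods from the Service fast
  failureThreshold: 2
```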

Use a PodDisruptionBudget to ensure at least two replicas remain available during node drains and upgrades. Safety infrastructure should never have downtime.
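The PodDisruptionBudget itself is small; this assumes the Deployment runs at least three replicas, since `minAvailable: 2` would otherwise block voluntary disruptions entirely:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: authensor-control-plane
  namespace: authensor
spec:
  minAvailable: 2           # keep two pods serving through drains and upgrades
  selector:
    matchLabels:
      app: authensor-control-plane
```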

Scaling Considerations

The policy engine is CPU-bound. Aegis content scanning is the heaviest operation. If you use ML-based detection, consider running Aegis as a separate deployment with GPU-enabled nodes while keeping the policy engine on standard compute.
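A pod-template fragment for a hypothetical dedicated Aegis deployment pinned to GPU nodes; the node label and image name are assumptions, though `nvidia.com/gpu` is the standard resource name exposed by the NVIDIA device plugin:

```yaml
# Fragment of spec.template.spec in a separate Aegis Deployment
spec:
  nodeSelector:
    accelerator: nvidia-gpu        # assumed label on your GPU node pool
  containers:
    - name: aegis
      image: authensor/aegis:latest  # assumed image name
      resources:
        limits:
          nvidia.com/gpu: 1        # one GPU per scanning pod
```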

Set autoscaling thresholds based on request latency rather than CPU utilization. A p95 latency target of 50 milliseconds for policy evaluation is a reasonable starting point.
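Latency-based autoscaling needs a custom-metrics adapter (for example, the Prometheus Adapter) exposing a per-pod latency metric; the metric name below is hypothetical:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: authensor-control-plane
  namespace: authensor
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: authensor-control-plane
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metric:
          name: policy_eval_latency_p95_ms   # hypothetical metric via custom-metrics adapter
        target:
          type: AverageValue
          averageValue: "50"                 # the 50 ms p95 target from above
```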

Monitor pod restart counts and OOM kills. Safety infrastructure failures should trigger immediate alerts through your existing Kubernetes monitoring stack.
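If you run the Prometheus Operator with kube-state-metrics, alerts like these can be sketched as a PrometheusRule; the thresholds are illustrative starting points, not recommendations from Authensor:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: authensor-alerts
  namespace: authensor
spec:
  groups:
    - name: safety-infra
      rules:
        - alert: AuthensorPodRestarting
          expr: increase(kube_pod_container_status_restarts_total{namespace="authensor"}[15m]) > 3
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Authensor pod restarting repeatedly"
        - alert: AuthensorOOMKilled
          expr: kube_pod_container_status_last_terminated_reason{namespace="authensor", reason="OOMKilled"} == 1
          labels:
            severity: critical
          annotations:
            summary: "Authensor container was OOM-killed"
```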
