
Supply Chain Security for AI Model Weights

Authensor

AI model weights are the core artifact that determines an agent's behavior. If an attacker can modify model weights at any point in the supply chain (during training, storage, distribution, or deployment), they can alter the agent's behavior in ways that bypass all other safety controls. Supply chain security for model weights protects this critical artifact from tampering.

The Threat

A tampered model might behave normally on standard benchmarks but contain a backdoor that activates on specific trigger inputs. It might subtly degrade safety classifier accuracy, allowing more harmful content through. Or it might exfiltrate data through steganographic channels in its outputs.

These attacks are particularly dangerous because they are invisible to operators who test the model on standard evaluations. The model passes all checks until the trigger condition is met.

Provenance Tracking

Track the origin and history of every model artifact:

  • Who trained the model and when?
  • What training data was used?
  • What training infrastructure was involved?
  • Who has had access to the weights since training?
  • What modifications (fine-tuning, quantization, pruning) have been applied?

Store provenance metadata alongside the model weights. Use a tamper-evident format so that provenance cannot be altered without detection.
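One way to make a provenance record tamper-evident is a hash chain: each entry embeds the digest of the previous entry, so rewriting history breaks every later link. A minimal sketch in Python, using stdlib only (the entry structure and function names here are illustrative, not a prescribed format):

```python
import hashlib
import json

def add_provenance_entry(log, event):
    """Append an event to a hash-chained provenance log.

    Each entry stores the SHA-256 of the previous entry, so any
    retroactive edit invalidates the chain and is detectable.
    """
    prev_hash = log[-1]["entry_hash"] if log else "0" * 64
    payload = {"event": event, "prev_hash": prev_hash}
    entry = dict(payload)
    entry["entry_hash"] = hashlib.sha256(
        json.dumps(payload, sort_keys=True).encode()
    ).hexdigest()
    log.append(entry)
    return log

def verify_chain(log):
    """Recompute every link; return False if any entry was altered."""
    prev = "0" * 64
    for entry in log:
        expected = hashlib.sha256(
            json.dumps({"event": entry["event"], "prev_hash": prev},
                       sort_keys=True).encode()
        ).hexdigest()
        if entry["prev_hash"] != prev or entry["entry_hash"] != expected:
            return False
        prev = entry["entry_hash"]
    return True

# Example: record training and a later quantization pass.
log = []
add_provenance_entry(log, {"action": "trained", "by": "training-pipeline"})
add_provenance_entry(log, {"action": "quantized", "by": "release-eng"})
```

Editing any earlier event (for example, changing who performed the training) changes its recomputed digest, so `verify_chain` fails for every subsequent entry.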

Integrity Verification

Compute cryptographic hashes of model weights at every stage of the pipeline. Verify hashes when weights are loaded for inference. Any mismatch indicates tampering or corruption.

model_manifest:
  name: "safety-classifier-v3"
  hash: "sha256:a1b2c3d4..."
  signed_by: "training-pipeline@company.com"
  signature: "..."
  training_date: "2025-12-15"
  training_data_hash: "sha256:e5f6a7b8..."
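The manifest pins an expected digest; verification is then a matter of recomputing it when the weights are loaded. A minimal sketch, assuming the weights live in a single file (helper names are illustrative; the file is hashed in chunks because weight files are often many gigabytes):

```python
import hashlib

def hash_weights(path, chunk_size=1 << 20):
    """Stream a weights file through SHA-256 one chunk at a time."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return "sha256:" + h.hexdigest()

def verify_weights(path, expected):
    """Compare against the manifest's pinned hash.

    Any mismatch means tampering or corruption; fail loudly.
    """
    actual = hash_weights(path)
    if actual != expected:
        raise RuntimeError(f"hash mismatch: expected {expected}, got {actual}")
```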

Secure Distribution

Distribute model weights through authenticated channels. Use signed URLs, checksum verification, and TLS for transit security. When pulling models from public registries, verify the publisher's identity and the model's signature before deployment.
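The checksum step can be sketched as a gate between download and deployment: refuse to use the artifact unless its digest matches the pinned value. A minimal example, assuming the expected digest was obtained out of band (signature verification itself would use a tool such as GPG or Sigstore and is not shown here):

```python
import hashlib
import hmac  # compare_digest gives a constant-time comparison

def verify_download(data: bytes, pinned_sha256: str) -> bytes:
    """Gate a downloaded artifact on a pinned SHA-256 checksum."""
    digest = hashlib.sha256(data).hexdigest()
    if not hmac.compare_digest(digest, pinned_sha256):
        raise ValueError("checksum mismatch: refusing to deploy artifact")
    return data
```

The constant-time comparison is a minor hardening habit here; the essential property is that the pinned digest comes from a trusted channel separate from the download itself.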

Access Controls

Restrict who can write to model storage. Use separate credentials for training pipelines (write access) and inference runtimes (read-only access). Log all access to model storage.
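In the same style as the manifest above, a hypothetical storage policy fragment showing the split between write and read credentials (the field names and principals are illustrative, not a real provider's schema):

```yaml
model_storage_policy:
  bucket: "models-prod"
  grants:
    - principal: "training-pipeline@company.com"
      permissions: [read, write]     # only the pipeline can publish weights
    - principal: "inference-runtime@company.com"
      permissions: [read]            # runtimes can never overwrite weights
  audit_logging: enabled             # every access is recorded
```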

Runtime Verification

At inference startup, verify model weight hashes against the expected values from the manifest. If verification fails, refuse to start the agent. Authensor's health check patterns can include model integrity verification as a startup check.
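The fail-closed startup check can be sketched as follows, assuming a JSON-serialized manifest with the `hash` field shown earlier (the function name and exit behavior are illustrative):

```python
import hashlib
import json
import sys

def startup_check(manifest_path, weights_path):
    """Verify weights against the manifest's pinned hash before serving."""
    with open(manifest_path) as f:
        manifest = json.load(f)
    h = hashlib.sha256()
    with open(weights_path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    actual = "sha256:" + h.hexdigest()
    if actual != manifest["hash"]:
        # Fail closed: a mismatch means tampering or corruption,
        # so the agent must not start.
        print(f"model integrity check failed: {actual} != {manifest['hash']}",
              file=sys.stderr)
        sys.exit(1)
    return manifest
```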

Model weights are code. Treat their supply chain with the same rigor you apply to software supply chain security.
