Pattern 1: Prompt-Level Guardrails
The most basic form of guardrails involves using system prompts to tell the model what it can and cannot do. For example: “You are an assistant. You are NOT allowed to share pricing information or delete files.”
The Risk: Prompt-level guardrails are suggestions, not controls. They are highly vulnerable to prompt injection and model drift. They should never be the only layer of security for a production agent.
Pattern 2: Semantic Input/Output Filtering
This pattern uses a second, smaller model (a “moderator” or “classifier”) to inspect every input and output for policy violations. It can detect PII, toxic content, or instructions that look like injection attacks.
The Benefit: It provides a dynamic layer of protection that doesn’t rely on simple keyword matching.
Pattern 3: Runtime Tool Interception
This is the most robust pattern, and it’s the core of the Shield Control architecture. Every time an agent tries to use a tool (like calling an API or reading a file), the request is intercepted at the infrastructure layer.
The security platform validates the tool call against a central policy engine. If the action is unauthorized—such as “transfer all customer data to a personal email”—it is blocked before it ever executes.
Pattern 4: Human-in-the-Loop (HITL)
For high-stakes actions, the guardrail is a human. The agent can plan the action but must pause and wait for explicit authorization from an operator before proceeding. This is critical for financial transactions, data deletions, or external communications.