LLM Gateway Policies
Igris policies are ordered rule lists evaluated on every LLM request. The first matching rule wins.
Rules can allow, deny, alert, rate-limit, guard token counts, and filter content.
LLM policy rules share the same PolicyRule shape as MCP governance rules. The target.kind field
is the discriminator — llm_model and llm_endpoint are the LLM-specific variants.
PolicyRule shape
interface PolicyRule {
  target: PolicyRuleTarget;
  action: "allow" | "deny" | "alert";
  conditions?: Record<string, unknown>;
  limit?: PolicyRuleLimit;
  tokenGuard?: {
    maxInputTokens?: number;      // reject if estimated input tokens exceed this
    maxOutputTokens?: number;     // cap the max_tokens field in the request
    maxRequestMaxTokens?: number; // reject if the request's max_tokens exceeds this
  };
  contentGuard?: {
    piiPatterns?: string[];      // regex patterns — match = PII detected
    keywordBlocklist?: string[]; // literal or glob strings
    denyOnMatch: boolean;        // true = deny, false = alert only
  };
  logContent?: boolean; // persist full prompt + completion to llm_call_bodies
}
Target variants
type PolicyRuleTarget =
  // Match a specific LLM model name or glob (e.g. "gpt-4*", "claude-*")
  | { kind: "llm_model"; model: string }
  // Match a specific LLM endpoint type
  | {
      kind: "llm_endpoint";
      endpoint:
        | "chat.completions"
        | "embeddings"
        | "images.generate"
        | "audio.transcriptions"
        | "audio.speech"
        | "passthrough";
    }
  // Match an MCP tool call (for MCP governance rules in the same policy)
  | { kind: "mcp_tool"; tool: string };
Model glob matching
The model field supports glob patterns with * as a wildcard:
{ "kind": "llm_model", "model": "gpt-4*" } // matches gpt-4o, gpt-4-turbo, gpt-4o-mini ...
{ "kind": "llm_model", "model": "*" } // matches any model (catch-all)
{ "kind": "llm_model", "model": "claude-3-5-*" } // matches claude-3-5-sonnet, claude-3-5-haiku ...
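As an illustration, this style of glob can be implemented by compiling the pattern to an anchored regular expression. This is a sketch of the matching semantics, not the gateway's actual matcher:

```typescript
// Sketch: match a model name against a "*" glob by escaping regex
// metacharacters and turning each "*" into ".*".
function matchesModelGlob(pattern: string, model: string): boolean {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  const regex = new RegExp(`^${escaped.replace(/\*/g, ".*")}$`);
  return regex.test(model);
}

matchesModelGlob("gpt-4*", "gpt-4o");       // true
matchesModelGlob("gpt-4*", "gpt-3.5-turbo"); // false
matchesModelGlob("*", "any-model");          // true
```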
Action
"allow" // permit the request — stop evaluating further rules
"deny" // block the request with HTTP 403 and a structured error body
"alert" // record an alert event and let the request proceed (it is not blocked)
Policies are deny-by-default: any request that matches no rule at all is blocked. End the rule
list with a catch-all (an allow, or an alert if unmatched models should be logged rather than blocked).
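Putting first-match-wins and deny-by-default together, evaluation can be sketched as below. This assumes, per the action list above, that a matching alert rule records an event without blocking; it is an illustration, not the gateway's implementation:

```typescript
type Action = "allow" | "deny" | "alert";
interface SimpleRule { model: string; action: Action }

// Compile a "*" glob into an anchored regular expression.
const globToRegex = (glob: string): RegExp =>
  new RegExp("^" + glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&").replace(/\*/g, ".*") + "$");

// First matching rule wins; "deny" blocks, "allow" and "alert" both let
// the request through ("alert" also records an event). No match = deny.
function evaluatePolicy(
  rules: SimpleRule[],
  model: string
): { blocked: boolean; alerted: boolean } {
  for (const rule of rules) {
    if (!globToRegex(rule.model).test(model)) continue;
    if (rule.action === "deny") return { blocked: true, alerted: false };
    return { blocked: false, alerted: rule.action === "alert" };
  }
  return { blocked: true, alerted: false }; // deny-by-default: nothing matched
}
```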
Conditions
Conditions gate a rule on request metadata. The gateway evaluates conditions AFTER matching the
target. If conditions are present and don’t match, the rule is skipped and evaluation continues.
{
  "conditions": {
    "user": "alice@corp.com",
    "metadata.role": { "in": ["developer", "admin"] },
    "metadata.team": { "neq": "interns" }
  }
}
Condition operators
| Operator | Description |
|---|---|
| "value" (direct) | Exact equality |
| { "eq": "value" } | Explicit equality |
| { "neq": "value" } | Not equal |
| { "in": ["a", "b"] } | Value is in list |
| { "nin": ["a", "b"] } | Value is not in list |
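A sketch of how these operators could be evaluated against a resolved request value (the gateway's real implementation may differ; unknown operators are assumed here to fail closed):

```typescript
// Sketch: evaluate one condition (direct value or operator object)
// against a request value. Unrecognized operators do not match.
function conditionMatches(cond: unknown, value: unknown): boolean {
  if (cond === null || typeof cond !== "object") return cond === value; // direct equality
  const op = cond as Record<string, unknown>;
  if ("eq" in op) return op.eq === value;
  if ("neq" in op) return op.neq !== value;
  if ("in" in op) return Array.isArray(op.in) && op.in.includes(value);
  if ("nin" in op) return Array.isArray(op.nin) && !op.nin.includes(value);
  return false; // unknown operator: fail closed
}
```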
Condition key paths
| Path | Source |
|---|---|
| user | X-Igris-User header or body.user |
| traceId | X-Igris-Trace-Id header |
| metadata.<key> | X-Igris-Metadata-<key> header or JSON blob |
| virtualKeySlug | The virtual key slug being used |
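For illustration, resolving a key path to a value could look like the sketch below. The header names come from the table above; the lower-cased header map, the header-before-body fallback order, and the omission of the metadata JSON-blob source are assumptions:

```typescript
// Sketch: resolve a condition key path from request headers and body.
// Header keys are assumed to be pre-lowercased, as most HTTP frameworks do.
function resolveConditionKey(
  key: string,
  headers: Record<string, string>,
  body: Record<string, unknown>,
  virtualKeySlug: string
): unknown {
  if (key === "user") return headers["x-igris-user"] ?? body["user"];
  if (key === "traceId") return headers["x-igris-trace-id"];
  if (key === "virtualKeySlug") return virtualKeySlug;
  if (key.startsWith("metadata.")) {
    const name = key.slice("metadata.".length);
    return headers[`x-igris-metadata-${name.toLowerCase()}`];
  }
  return undefined; // unknown path: condition will not match
}
```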
Limit (rate limiting)
The limit field adds a rate limit dimension to the rule. Requests that exceed the limit are blocked
with HTTP 429. Limits are tracked per virtual key (or per user if conditions includes a user
match).
interface PolicyRuleLimit {
  requests?: number; // max requests in the window
  tokens?: number;   // max total tokens (input + output) in the window
  dollars?: number;  // max spend in USD in the window
  per: "minute" | "hour" | "day";
}
Example: cap GPT-4o calls for contractors at 10 requests/hour:
{
  "target": { "kind": "llm_model", "model": "gpt-4o" },
  "action": "allow",
  "conditions": { "metadata.role": "contractor" },
  "limit": { "requests": 10, "per": "hour" }
}
Limits use Redis-backed sliding window counters. Multiple dimensions can be set simultaneously —
the first exceeded dimension triggers the 429.
Token guards
Token guards operate at request time, before the request is forwarded to the provider. They use
a character-count heuristic (chars / 4) for input estimation, and the actual usage field
from the response for output recording.
{
  "target": { "kind": "llm_model", "model": "*" },
  "action": "allow",
  "tokenGuard": {
    "maxInputTokens": 8000,
    "maxRequestMaxTokens": 2000
  }
}
- maxInputTokens — deny if estimated prompt tokens exceed this value
- maxOutputTokens — silently cap max_tokens in the forwarded request to this value
- maxRequestMaxTokens — deny if the request's max_tokens field exceeds this value
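A sketch of the guard logic under the chars/4 heuristic described above. The return convention (null means deny, otherwise the possibly-capped max_tokens to forward) is an illustration, not the gateway's API:

```typescript
interface TokenGuard {
  maxInputTokens?: number;
  maxOutputTokens?: number;
  maxRequestMaxTokens?: number;
}

// Sketch: apply a token guard before forwarding. Returns null to deny,
// or the max_tokens value to forward (capped by maxOutputTokens).
function applyTokenGuard(
  guard: TokenGuard,
  promptText: string,
  requestMaxTokens?: number
): { maxTokens?: number } | null {
  const estimatedInput = Math.ceil(promptText.length / 4); // chars/4 heuristic
  if (guard.maxInputTokens !== undefined && estimatedInput > guard.maxInputTokens) {
    return null; // deny: estimated prompt too large
  }
  if (
    guard.maxRequestMaxTokens !== undefined &&
    requestMaxTokens !== undefined &&
    requestMaxTokens > guard.maxRequestMaxTokens
  ) {
    return null; // deny: requested max_tokens too large
  }
  let maxTokens = requestMaxTokens;
  if (guard.maxOutputTokens !== undefined) {
    maxTokens = Math.min(maxTokens ?? guard.maxOutputTokens, guard.maxOutputTokens);
  }
  return { maxTokens }; // forward, with max_tokens silently capped
}
```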
Content guards
Content guards inspect the prompt text for PII patterns and keyword blocklists.
{
  "target": { "kind": "llm_endpoint", "endpoint": "chat.completions" },
  "action": "allow",
  "contentGuard": {
    "piiPatterns": [
      "\\b\\d{3}-\\d{2}-\\d{4}\\b",
      "\\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,}\\b"
    ],
    "keywordBlocklist": ["confidential", "top-secret"],
    "denyOnMatch": true
  }
}
- piiPatterns — regex patterns applied to the full prompt text (case-insensitive)
- keywordBlocklist — literal string matching (case-insensitive)
- denyOnMatch — true blocks the request; false logs an alert only
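The matching behavior can be sketched as follows, assuming PII patterns are tested as case-insensitive regexes and blocklist keywords as case-insensitive substrings:

```typescript
interface ContentGuard {
  piiPatterns?: string[];
  keywordBlocklist?: string[];
  denyOnMatch: boolean;
}

// Sketch: scan prompt text against a content guard and report the outcome.
function scanContent(
  guard: ContentGuard,
  promptText: string
): "pass" | "deny" | "alert" {
  const lower = promptText.toLowerCase();
  const hit =
    (guard.piiPatterns ?? []).some((p) => new RegExp(p, "i").test(promptText)) ||
    (guard.keywordBlocklist ?? []).some((k) => lower.includes(k.toLowerCase()));
  if (!hit) return "pass";
  return guard.denyOnMatch ? "deny" : "alert";
}
```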
logContent
When logContent: true is set on a matching rule, the full prompt messages and completion text are
stored in llm_call_bodies and linked to the audit event. This is disabled by default to avoid
storing sensitive data.
{
  "target": { "kind": "llm_model", "model": "*" },
  "action": "allow",
  "logContent": true
}
Enabling logContent stores prompt and completion text verbatim. Ensure this complies with your
data retention and privacy policies before enabling it. Note that contentGuard blocks or alerts
rather than redacts: pairing it with logContent (with denyOnMatch: true) keeps PII-bearing
prompts from reaching storage, but does not scrub PII from prompts that pass.
Complete policy example
This example demonstrates a comprehensive policy for a production deployment:
{
  "name": "Production LLM governance policy",
  "virtualKeySlug": "vk_openai_prod",
  "rules": [
    {
      "target": { "kind": "llm_model", "model": "gpt-4o" },
      "action": "deny",
      "conditions": {
        "metadata.tier": { "in": ["free", "trial"] }
      }
    },
    {
      "target": { "kind": "llm_model", "model": "gpt-4*" },
      "action": "allow",
      "conditions": {
        "metadata.tier": "enterprise"
      },
      "limit": {
        "tokens": 1000000,
        "dollars": 50,
        "per": "day"
      },
      "tokenGuard": {
        "maxInputTokens": 32000,
        "maxRequestMaxTokens": 4096
      },
      "logContent": false
    },
    {
      "target": { "kind": "llm_endpoint", "endpoint": "chat.completions" },
      "action": "allow",
      "limit": {
        "requests": 100,
        "per": "minute"
      },
      "contentGuard": {
        "piiPatterns": ["\\b\\d{3}-\\d{2}-\\d{4}\\b"],
        "keywordBlocklist": ["internal-secret", "confidential"],
        "denyOnMatch": false
      }
    },
    {
      "target": { "kind": "llm_model", "model": "*" },
      "action": "alert"
    }
  ]
}
How this policy works:
- Deny GPT-4o for free/trial tier users
- Allow GPT-4 family for enterprise users with daily spend and token limits
- Allow chat completions for everyone with a rate limit and content audit (alert-only, not deny)
- Alert on any other model (the catch-all alert means unknown models are alerted, not denied)
Managing policies via SDK
// List all policies
const { data } = await igris.policies.list({ virtualKeySlug: "vk_openai_prod" });

// Create a policy
const policy = await igris.policies.create({
  name: "Block GPT-4 for free tier",
  virtualKeySlug: "vk_openai_prod",
  rules: [
    {
      target: { kind: "llm_model", model: "gpt-4*" },
      action: "deny",
      conditions: { "metadata.tier": "free" },
    },
    {
      target: { kind: "llm_model", model: "*" },
      action: "allow",
    },
  ],
});

// Update a policy
await igris.policies.update(policy.id, { enabled: false });

// Delete a policy
await igris.policies.delete(policy.id);