LLM Gateway Policies

Igris policies are ordered rule lists evaluated on every LLM request. The first matching rule wins. Rules can allow, deny, alert, rate-limit, guard token counts, and filter content. LLM policy rules share the same PolicyRule shape as MCP governance rules. The target.kind field is the discriminator — llm_model and llm_endpoint are the LLM-specific variants.

PolicyRule shape

interface PolicyRule {
	target: PolicyRuleTarget;
	action: "allow" | "deny" | "alert";
	conditions?: Record<string, unknown>;
	limit?: PolicyRuleLimit;
	tokenGuard?: {
		maxInputTokens?: number;       // reject if estimated input tokens exceed this
		maxOutputTokens?: number;      // cap the max_tokens field in the request
		maxRequestMaxTokens?: number;  // reject if the request's max_tokens exceeds this
	};
	contentGuard?: {
		piiPatterns?: string[];        // regex patterns — match = PII detected
		keywordBlocklist?: string[];   // literal or glob strings
		denyOnMatch: boolean;          // true = deny, false = alert only
	};
	logContent?: boolean;             // persist full prompt + completion to llm_call_bodies
}

Target variants

type PolicyRuleTarget =
	// Match a specific LLM model name or glob (e.g. "gpt-4*", "claude-*")
	| { kind: "llm_model"; model: string }

	// Match a specific LLM endpoint type
	| {
			kind: "llm_endpoint";
			endpoint:
				| "chat.completions"
				| "embeddings"
				| "images.generate"
				| "audio.transcriptions"
				| "audio.speech"
				| "passthrough";
	  }

	// Match an MCP tool call (for MCP governance rules in the same policy)
	| { kind: "mcp_tool"; tool: string };

Model glob matching

The model field supports glob patterns with * as a wildcard:
{ "kind": "llm_model", "model": "gpt-4*" }       // matches gpt-4o, gpt-4-turbo, gpt-4o-mini ...
{ "kind": "llm_model", "model": "*" }            // matches any model (catch-all)
{ "kind": "llm_model", "model": "claude-3-5-*" } // matches claude-3-5-sonnet, claude-3-5-haiku ...
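A minimal sketch of how `*` globs can be compiled to anchored regexes. The helper names are illustrative, not the gateway's actual internals:

```typescript
// Compile a glob pattern ("gpt-4*") into an anchored RegExp.
// Escape regex metacharacters first, then turn "*" into ".*".
function globToRegExp(glob: string): RegExp {
	const escaped = glob
		.replace(/[.+?^${}()|[\]\\]/g, "\\$&")
		.replace(/\*/g, ".*");
	return new RegExp(`^${escaped}$`);
}

// Check whether a concrete model name matches a rule's model pattern.
function matchesModel(pattern: string, model: string): boolean {
	return globToRegExp(pattern).test(model);
}
```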

Action

"allow"  // permit the request — stop evaluating further rules
"deny"   // block the request with HTTP 403 and a structured error body
"alert"  // record an alert event, continue processing (request is not blocked)
A policy without a catch-all allow rule at the end is deny-by-default — any model not matched by an explicit allow rule will be blocked.
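The first-match semantics above can be sketched as a simple loop. This is one plausible reading: an alert rule records an event and lets the request through, while no match at all falls through to deny-by-default. Conditions, limits, and guards are omitted for brevity:

```typescript
type Action = "allow" | "deny" | "alert";
interface SimpleRule { model: string; action: Action }

// Glob match helper (escape regex metacharacters, "*" becomes ".*").
const matches = (glob: string, model: string): boolean =>
	new RegExp(
		"^" + glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&").replace(/\*/g, ".*") + "$"
	).test(model);

// First matching rule decides. "alert" records an event but does not
// block; an unmatched request is denied by default.
function evaluate(
	rules: SimpleRule[],
	model: string
): { decision: "allow" | "deny"; alerted: boolean } {
	for (const rule of rules) {
		if (!matches(rule.model, model)) continue;
		if (rule.action === "alert") return { decision: "allow", alerted: true };
		return { decision: rule.action, alerted: false };
	}
	return { decision: "deny", alerted: false }; // deny-by-default
}
```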

Conditions

Conditions gate a rule on request metadata. The gateway evaluates conditions AFTER matching the target. If conditions are present and don’t match, the rule is skipped and evaluation continues.
{
	"conditions": {
		"user": "alice@corp.com",
		"metadata.role": { "in": ["developer", "admin"] },
		"metadata.team": { "neq": "interns" }
	}
}

Condition operators

Operator                Description
"value" (direct)        Exact equality
{ "eq": "value" }       Explicit equality
{ "neq": "value" }      Not equal
{ "in": ["a", "b"] }    Value is in the list
{ "nin": ["a", "b"] }   Value is not in the list
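The operator table can be read as a small evaluation function. A sketch (illustrative, not the gateway's exact code):

```typescript
// A condition is either a bare value (exact equality) or an operator object.
type ConditionValue =
	| string
	| { eq?: unknown; neq?: unknown; in?: unknown[]; nin?: unknown[] };

// Evaluate one condition against the actual request metadata value.
function checkCondition(cond: ConditionValue, actual: unknown): boolean {
	if (cond !== null && typeof cond === "object") {
		if ("eq" in cond) return actual === cond.eq;
		if ("neq" in cond) return actual !== cond.neq;
		if ("in" in cond) return (cond.in as unknown[]).includes(actual);
		if ("nin" in cond) return !(cond.nin as unknown[]).includes(actual);
		return false; // unknown operator
	}
	return actual === cond; // direct value = exact equality
}
```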

Condition key paths

Path              Source
user              X-Igris-User header or body.user
traceId           X-Igris-Trace-Id header
metadata.<key>    X-Igris-Metadata-<key> header or JSON blob
virtualKeySlug    The virtual key slug being used

Limit (rate limiting)

The limit field adds a rate limit dimension to the rule. Requests that exceed the limit are blocked with HTTP 429. Limits are tracked per virtual key (or per user, if the rule's conditions include a user match).
interface PolicyRuleLimit {
	requests?: number;    // max requests in the window
	tokens?: number;      // max total tokens (input + output) in the window
	dollars?: number;     // max spend in USD in the window
	per: "minute" | "hour" | "day";
}
Example: cap GPT-4o calls for contractors at 10 requests/hour:
{
	"target": { "kind": "llm_model", "model": "gpt-4o" },
	"action": "allow",
	"conditions": { "metadata.role": "contractor" },
	"limit": { "requests": 10, "per": "hour" }
}
Limits use Redis-backed sliding window counters. Multiple dimensions can be set simultaneously — the first exceeded dimension triggers the 429.
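An in-memory stand-in for the Redis-backed sliding window counter described above, tracking a single requests-per-window dimension (the class name and shape are illustrative):

```typescript
// Sliding window rate limiter: keeps per-key hit timestamps and admits a
// request only if fewer than maxRequests hits fall inside the window.
class SlidingWindowLimiter {
	private hits = new Map<string, number[]>();

	constructor(private maxRequests: number, private windowMs: number) {}

	// Returns true if the request is admitted, false if it would exceed
	// the limit (the gateway would answer HTTP 429 in that case).
	tryAcquire(key: string, nowMs: number): boolean {
		// Drop hits that have aged out of the window.
		const recent = (this.hits.get(key) ?? []).filter(t => nowMs - t < this.windowMs);
		if (recent.length >= this.maxRequests) {
			this.hits.set(key, recent);
			return false;
		}
		recent.push(nowMs);
		this.hits.set(key, recent);
		return true;
	}
}
```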

Token guards

Token guards operate at request time, before the request is forwarded to the provider. They use a character-count heuristic (chars / 4) for input estimation, and the actual usage field from the response for output recording.
{
	"target": { "kind": "llm_model", "model": "*" },
	"action": "allow",
	"tokenGuard": {
		"maxInputTokens": 8000,
		"maxRequestMaxTokens": 2000
	}
}
  • maxInputTokens — deny if estimated prompt tokens exceed this value
  • maxOutputTokens — silently cap max_tokens in the forwarded request to this value
  • maxRequestMaxTokens — deny if the request’s max_tokens field exceeds this value
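The three checks can be sketched as a single pre-forwarding function, using the documented chars / 4 heuristic for input estimation (the function and type names are illustrative):

```typescript
interface TokenGuard {
	maxInputTokens?: number;
	maxOutputTokens?: number;
	maxRequestMaxTokens?: number;
}
interface ChatRequest {
	messages: { role: string; content: string }[];
	max_tokens?: number;
}

// chars / 4 heuristic from the text above — a rough estimate, not a tokenizer.
const estimateInputTokens = (req: ChatRequest): number =>
	Math.ceil(req.messages.reduce((n, m) => n + m.content.length, 0) / 4);

// Two checks can reject; maxOutputTokens silently caps max_tokens instead.
function applyTokenGuard(
	guard: TokenGuard,
	req: ChatRequest
): { ok: boolean; req: ChatRequest } {
	if (guard.maxInputTokens !== undefined && estimateInputTokens(req) > guard.maxInputTokens)
		return { ok: false, req };
	if (guard.maxRequestMaxTokens !== undefined && (req.max_tokens ?? 0) > guard.maxRequestMaxTokens)
		return { ok: false, req };
	if (guard.maxOutputTokens !== undefined && (req.max_tokens ?? Infinity) > guard.maxOutputTokens)
		return { ok: true, req: { ...req, max_tokens: guard.maxOutputTokens } };
	return { ok: true, req };
}
```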

Content guards

Content guards inspect the prompt text for PII patterns and keyword blocklists.
{
	"target": { "kind": "llm_endpoint", "endpoint": "chat.completions" },
	"action": "allow",
	"contentGuard": {
		"piiPatterns": [
			"\\b\\d{3}-\\d{2}-\\d{4}\\b",
			"\\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,}\\b"
		],
		"keywordBlocklist": ["confidential", "top-secret"],
		"denyOnMatch": true
	}
}
  • piiPatterns — regex patterns applied to the full prompt text (case-insensitive)
  • keywordBlocklist — literal string matching (case-insensitive)
  • denyOnMatch: true — block the request; false — log an alert only
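A sketch of the scan logic, treating blocklist entries as literal strings for simplicity (the source also allows globs there); names are illustrative:

```typescript
interface ContentGuard {
	piiPatterns?: string[];
	keywordBlocklist?: string[];
	denyOnMatch: boolean;
}

// Scan prompt text against the guard. Both checks are case-insensitive,
// as documented; denyOnMatch picks between blocking and alert-only.
function scanContent(guard: ContentGuard, text: string): "pass" | "deny" | "alert" {
	const lower = text.toLowerCase();
	const hit =
		(guard.piiPatterns ?? []).some(p => new RegExp(p, "i").test(text)) ||
		(guard.keywordBlocklist ?? []).some(k => lower.includes(k.toLowerCase()));
	if (!hit) return "pass";
	return guard.denyOnMatch ? "deny" : "alert";
}
```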

logContent

When logContent: true is set on a matching rule, the full prompt messages and completion text are stored in llm_call_bodies and linked to the audit event. This is disabled by default to avoid storing sensitive data.
{
	"target": { "kind": "llm_model", "model": "*" },
	"action": "allow",
	"logContent": true
}
Enabling logContent stores prompt and completion text verbatim. Ensure this complies with your data retention and privacy policies before enabling. Consider combining with contentGuard to redact PII before logging.

Complete policy example

This example demonstrates a comprehensive policy for a production deployment:
{
	"name": "Production LLM governance policy",
	"virtualKeySlug": "vk_openai_prod",
	"rules": [
		{
			"target": { "kind": "llm_model", "model": "gpt-4o" },
			"action": "deny",
			"conditions": {
				"metadata.tier": { "in": ["free", "trial"] }
			}
		},

		{
			"target": { "kind": "llm_model", "model": "gpt-4*" },
			"action": "allow",
			"conditions": {
				"metadata.tier": "enterprise"
			},
			"limit": {
				"tokens": 1000000,
				"dollars": 50,
				"per": "day"
			},
			"tokenGuard": {
				"maxInputTokens": 32000,
				"maxRequestMaxTokens": 4096
			},
			"logContent": false
		},

		{
			"target": { "kind": "llm_endpoint", "endpoint": "chat.completions" },
			"action": "allow",
			"limit": {
				"requests": 100,
				"per": "minute"
			},
			"contentGuard": {
				"piiPatterns": ["\\b\\d{3}-\\d{2}-\\d{4}\\b"],
				"keywordBlocklist": ["internal-secret", "confidential"],
				"denyOnMatch": false
			}
		},

		{
			"target": { "kind": "llm_model", "model": "*" },
			"action": "alert"
		}
	]
}
How this policy works:
  1. Deny GPT-4o for free/trial tier users
  2. Allow GPT-4 family for enterprise users with daily spend and token limits
  3. Allow chat completions for everyone with a rate limit and content audit (alert-only, not deny)
  4. Alert on any other model (catch-all — no implicit allow, so unknown models are alerted not denied)

Managing policies via SDK

// List all policies
const { data } = await igris.policies.list({ virtualKeySlug: "vk_openai_prod" });

// Create a policy
const policy = await igris.policies.create({
	name: "Block GPT-4 for free tier",
	virtualKeySlug: "vk_openai_prod",
	rules: [
		{
			target: { kind: "llm_model", model: "gpt-4*" },
			action: "deny",
			conditions: { "metadata.tier": "free" },
		},
		{
			target: { kind: "llm_model", model: "*" },
			action: "allow",
		},
	],
});

// Update a policy
await igris.policies.update(policy.id, { enabled: false });

// Delete a policy
await igris.policies.delete(policy.id);