
LLM Gateway Concepts

This page explains the core abstractions in the Igris LLM Gateway. Read this before diving into the SDK Reference or Policies docs.

Virtual Keys

A virtual key is an encrypted vault entry that maps an Igris slug to an upstream provider API key. Your application code never holds the real provider key — it only knows the slug (e.g. vk_openai_prod). The gateway resolves the slug at request time, injects the real credential, and forwards the request to the upstream provider. Virtual keys can be:
  • scoped to a specific organization
  • restricted to a subset of models
  • enabled or disabled without a code change
  • rotated by updating the vault entry in the dashboard
A single slug can be reused across as many callers as you like — policy rules target the slug. See Virtual Keys for CRUD operations via the SDK or REST API.
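The resolution step can be sketched as a toy lookup. This is illustrative only — the real vault lives server-side in the gateway, and the entry shape and function names here are assumptions, not the actual implementation:

```typescript
// Toy model of the slug -> provider-key resolution the gateway performs
// server-side. The real credential never reaches application code.
type VaultEntry = { providerKey: string; enabled: boolean; allowedModels?: string[] };

// Hypothetical vault contents; real entries are managed in the dashboard.
const vault: Record<string, VaultEntry> = {
  vk_openai_prod: { providerKey: "sk-real-...", enabled: true, allowedModels: ["gpt-4o"] },
};

function resolveVirtualKey(slug: string, model: string): string {
  const entry = vault[slug];
  if (!entry || !entry.enabled) throw new Error(`virtual key ${slug} unavailable`);
  if (entry.allowedModels && !entry.allowedModels.includes(model)) {
    throw new Error(`model ${model} not allowed for ${slug}`);
  }
  // Injected into the upstream request; never returned to the caller.
  return entry.providerKey;
}
```

Disabling the entry or narrowing `allowedModels` takes effect on the next request, with no application redeploy.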

Providers

A provider is a registered upstream LLM service — OpenAI, Anthropic, Groq, Mistral, and 56 more. Each provider registration captures:
  • Slug — the identifier used in virtual key creation and model routing (e.g. openai, anthropic)
  • Base URL — where the gateway forwards requests
  • Auth style — bearer, x-api-key, or query-param
  • Supported endpoints — which of chat.completions, embeddings, images.generate, audio.transcriptions, audio.speech, or passthrough this provider supports
Providers with baseUrl: null (Ollama, HuggingFace, Triton, Modal) require a customBaseUrl to be set on the virtual key for self-hosted deployments. See the full Provider Catalog.
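The fields above can be summarized in a type sketch. The names below are illustrative — they follow the bullet list, not the SDK's actual type definitions:

```typescript
// Sketch of a provider registration, per the fields listed above.
type Endpoint =
  | "chat.completions" | "embeddings" | "images.generate"
  | "audio.transcriptions" | "audio.speech" | "passthrough";

interface ProviderRegistration {
  slug: string;                 // e.g. "openai", "anthropic"
  baseUrl: string | null;       // null for self-hosted providers
  authStyle: "bearer" | "x-api-key" | "query-param";
  supportedEndpoints: Endpoint[];
}

// Example: a self-hosted provider with no fixed base URL, so the
// virtual key must supply a customBaseUrl.
const ollama: ProviderRegistration = {
  slug: "ollama",
  baseUrl: null,
  authStyle: "bearer",
  supportedEndpoints: ["chat.completions", "embeddings"],
};
```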

Policies

A policy is an ordered list of PolicyRule objects attached to an organization or virtual key. Rules are evaluated in order — first match wins. Each rule has four main parts:

Target

What the rule matches. Three discriminated union variants:
// Match a specific LLM model or a glob
{ kind: "llm_model"; model: "gpt-4o" }
{ kind: "llm_model"; model: "gpt-4*" }

// Match a specific endpoint
{ kind: "llm_endpoint"; endpoint: "chat.completions" }

// Match an MCP tool (for MCP governance rules in the same policy)
{ kind: "mcp_tool"; tool: "delete_*" }

Action

What happens when the rule matches: "allow", "deny", or "alert". Deny returns HTTP 403. Alert records an alert event without blocking the request.
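Target matching and first-match-wins evaluation can be sketched together. The glob handling and the default action when no rule matches are assumptions for illustration:

```typescript
// Minimal first-match-wins evaluation over an ordered rule list,
// with a simple "*" glob for model targets. Types are illustrative.
type Target =
  | { kind: "llm_model"; model: string }
  | { kind: "llm_endpoint"; endpoint: string };
type Action = "allow" | "deny" | "alert";
type Rule = { target: Target; action: Action };

function escapeRegex(s: string): string {
  return s.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
}

// "gpt-4*" matches "gpt-4o", "gpt-4-turbo", etc.
function globMatch(pattern: string, value: string): boolean {
  const re = new RegExp("^" + pattern.split("*").map(escapeRegex).join(".*") + "$");
  return re.test(value);
}

function evaluate(rules: Rule[], model: string, endpoint: string): Action {
  for (const rule of rules) {
    const t = rule.target;
    const hit =
      (t.kind === "llm_model" && globMatch(t.model, model)) ||
      (t.kind === "llm_endpoint" && t.endpoint === endpoint);
    if (hit) return rule.action; // first match wins; later rules never run
  }
  return "allow"; // assumed default when no rule matches
}
```

Because evaluation stops at the first hit, rule order matters: a broad `gpt-4*` deny placed before a specific `gpt-4o` allow will shadow it.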

Conditions

Optional metadata conditions that must also match for the rule to fire. Keys are dotted paths into the request context (e.g. metadata.role, user). Operators: eq, neq, in, nin.
{ "conditions": { "metadata.role": { "in": ["intern", "contractor"] } } }
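The four operators over dotted-path lookups can be sketched as follows. The condition shape mirrors the JSON above; the lookup and matching logic are illustrative, not the gateway's actual code:

```typescript
// Sketch: evaluate eq / neq / in / nin conditions against a request context,
// resolving dotted paths like "metadata.role".
type Condition = { eq?: unknown; neq?: unknown; in?: unknown[]; nin?: unknown[] };

function lookup(ctx: Record<string, unknown>, path: string): unknown {
  return path.split(".").reduce<unknown>(
    (obj, key) =>
      obj && typeof obj === "object" ? (obj as Record<string, unknown>)[key] : undefined,
    ctx,
  );
}

// All conditions must hold for the rule to fire.
function conditionsMatch(
  ctx: Record<string, unknown>,
  conds: Record<string, Condition>,
): boolean {
  return Object.entries(conds).every(([path, c]) => {
    const v = lookup(ctx, path);
    if ("eq" in c && v !== c.eq) return false;
    if ("neq" in c && v === c.neq) return false;
    if (c.in && !c.in.includes(v)) return false;
    if (c.nin && c.nin.includes(v)) return false;
    return true;
  });
}
```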

Limits, Guards, and Content Controls

Rules can also carry:
  • limit — rate limit on requests, tokens, or dollars per minute/hour/day
  • tokenGuard — cap max_tokens or reject requests exceeding an input token estimate
  • contentGuard — PII pattern matching, keyword blocklist
  • logContent — whether to persist the full prompt and completion to llm_call_bodies
See Policies for the full shape with examples.
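As one concrete example, a tokenGuard can be sketched as a cap-and-reject step. The field names and the rough 4-characters-per-token estimate are assumptions for illustration, not the gateway's actual heuristic:

```typescript
// Illustrative tokenGuard: cap max_tokens and reject requests whose
// crude input-token estimate exceeds a limit.
interface TokenGuard { maxOutputTokens?: number; maxInputTokens?: number }
interface ChatBody { max_tokens?: number; messages: { content: string }[] }

function applyTokenGuard(guard: TokenGuard, body: ChatBody): ChatBody {
  const inputChars = body.messages.reduce((n, m) => n + m.content.length, 0);
  const estimate = Math.ceil(inputChars / 4); // ~4 chars per token, rough
  if (guard.maxInputTokens !== undefined && estimate > guard.maxInputTokens) {
    throw new Error(`input estimate ${estimate} exceeds limit ${guard.maxInputTokens}`);
  }
  const cap = guard.maxOutputTokens;
  const max_tokens =
    cap === undefined ? body.max_tokens : Math.min(body.max_tokens ?? cap, cap);
  return { ...body, max_tokens };
}
```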

Audit Trail

Every request through the gateway produces an audit event with:
  • type: "llm_call"
  • Provider + model (resolved after gateway routing)
  • inputTokens, outputTokens, cachedTokens
  • costCents (integer, USD cents × 100 — sub-cent precision)
  • Latency in milliseconds
  • userId, traceId, virtualKeySlug
  • Policy action taken
  • requestId for correlation
Query audit events from the SDK with igris.auditEvents.list() or browse them in the dashboard under LLM → Audit Trail. If logContent: true is set on a matching rule, the full prompt and completion are stored in llm_call_bodies and linked via the audit event ID.

Cost Tracking

Cost is computed server-side using a live pricing snapshot (vendored from the Portkey provider registry and periodically refreshed). The costCents column stores the value as an integer representing USD cents × 100 — so 150 means $0.0150. The LLM → Cost dashboard aggregates spend by day, provider, model, virtual key, and user. Anomaly detection also monitors cost-per-minute as one of its five signal dimensions.
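The unit conversion is easy to get wrong by a factor of 100, so here it is spelled out (the function name is ours, not the SDK's):

```typescript
// costCents stores USD cents x 100 as an integer, so:
//   cents   = costCents / 100
//   dollars = cents / 100 = costCents / 10_000
function costCentsToDollars(costCents: number): number {
  return costCents / 10_000; // e.g. 150 -> 0.015, i.e. $0.0150
}
```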

Anomaly Detection

The LlmAnomalyDetector runs five parallel signal trackers on every request:
  • Cost spike — cost/minute exceeds the baseline EWMA by a configurable factor
  • Token burn — tokens/minute exceeds threshold
  • Response length — output tokens for a single response exceed threshold
  • Model shift — observed model diverges from the expected model baseline
  • Error rate — HTTP 4xx/5xx rate exceeds threshold
When a signal fires, an alert event is written and optionally delivered via webhook. Alerts do not block requests by default — configure a deny action on the policy rule to make them blocking.
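The cost-spike signal can be sketched as an EWMA comparison. The smoothing factor, spike factor, and class name below are illustrative defaults, not the detector's actual configuration:

```typescript
// Sketch of a cost-spike tracker: fire when cost-per-minute exceeds an
// exponentially weighted moving average baseline by a factor.
class CostSpikeTracker {
  private baseline: number | null = null;
  constructor(private alpha = 0.2, private factor = 3) {}

  // Returns true when the observation should fire an alert.
  observe(costPerMinute: number): boolean {
    if (this.baseline === null) {
      this.baseline = costPerMinute; // seed the baseline, never fire first
      return false;
    }
    const fired = costPerMinute > this.baseline * this.factor;
    // Update the EWMA after the comparison so a spike doesn't mask itself.
    this.baseline = this.alpha * costPerMinute + (1 - this.alpha) * this.baseline;
    return fired;
  }
}
```

Updating the baseline after the comparison means a sustained spike eventually becomes the new normal, which is the usual EWMA trade-off between sensitivity and false positives.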

Three Usage Styles

You can route traffic through the gateway in three ways.

1. Native SDK

Use igris.chat.completions.create() directly. The @slug/model model prefix tells the SDK which virtual key to route through. Zero external dependencies beyond the Igris SDK.
const igris = new Igris({ apiKey: "igris_sk_..." });
await igris.chat.completions.create({ model: "@vk_openai_prod/gpt-4o", messages: [...] });

2. connectLlm escape hatch

Use igris.connectLlm(slug, options) to get a { baseUrl, apiKey, headers } object and wire it into any OpenAI-compatible SDK client, or use the withIgris adapter shown below. This is the zero-migration path when your app already uses the OpenAI Node SDK.
import OpenAI from "openai";
import { withIgris } from "@igris-security/sdk/adapters/openai";

const igris = new Igris({ apiKey: "igris_sk_..." });
const openai = withIgris(new OpenAI({ apiKey: "placeholder" }), igris, "vk_openai_prod");
// Now openai.chat.completions.create() routes through Igris

3. Raw HTTP

Any HTTP client that can set Authorization: Bearer <igris_key> and target https://api.igrisecurity.com/llm/<slug>/v1/chat/completions works directly. This is useful for language runtimes without an Igris SDK (Python, Go, Rust, etc.).
curl https://api.igrisecurity.com/llm/vk_openai_prod/v1/chat/completions \
  -H "Authorization: Bearer igris_sk_..." \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"Hello"}]}'

Request Modes: Passthrough vs Transformed

Most providers use transformed mode — the gateway rewrites the request body and response to match the OpenAI chat completions schema, regardless of what the upstream expects. This means you can switch providers without changing your application code. Passthrough mode (endpoint: "passthrough") forwards the raw request body directly to the provider without transformation. Use this for providers with non-standard APIs (image generation, 3D model generation, audio) where the OpenAI schema doesn’t apply.

Metadata Channels

Request context (user identity, trace IDs, metadata for policy conditions) can be provided through three channels, resolved in priority order:
  1. Individual headers — X-Igris-Metadata-<key>: <value> (highest priority). One header per metadata field. Example: X-Igris-Metadata-role: developer.
  2. JSON blob header — x-igris-metadata: {"user":"alice","role":"developer"}, plus special sentinels _user and _trace_id for the core identity fields.
  3. Request body — body.user (OpenAI convention). Lowest priority, for compatibility with clients that already set the OpenAI user field.
The connectLlm() method handles encoding all three channels automatically when you pass user, traceId, and metadata options.
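The priority order can be sketched as a merge where later writes win. This is a simplified model (it ignores the _user and _trace_id sentinels), and the function name is ours:

```typescript
// Sketch: merge the three metadata channels in ascending priority, so
// individual headers > JSON blob header > request body.
function resolveMetadata(
  headers: Record<string, string>, // assumed lowercased header names
  body: { user?: string },
): Record<string, string> {
  const merged: Record<string, string> = {};
  // 3. Request body (lowest priority).
  if (body.user) merged["user"] = body.user;
  // 2. JSON blob header overrides the body.
  const blob = headers["x-igris-metadata"];
  if (blob) Object.assign(merged, JSON.parse(blob));
  // 1. Individual headers override everything.
  for (const [name, value] of Object.entries(headers)) {
    const m = name.match(/^x-igris-metadata-(.+)$/);
    if (m) merged[m[1]] = value;
  }
  return merged;
}
```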