LLM Gateway Concepts
This page explains the core abstractions in the Igris LLM Gateway. Read this before diving into the SDK Reference or Policies docs.
Virtual Keys
A virtual key is an encrypted credential vault entry that maps an Igris slug to an upstream provider API key. Your application code never holds the real provider key — it only knows the slug (e.g. vk_openai_prod). The gateway resolves the slug at request time, injects the real credential, and forwards the request to the upstream provider.
Virtual keys can be scoped to a specific organization, restricted to a subset of models, enabled or
disabled without a code change, and rotated by updating the vault entry in the dashboard. A single
slug can be reused across as many callers as you like — policy rules target the slug.
See Virtual Keys for CRUD operations via the SDK or REST API.
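As a mental model, the resolution step can be sketched like this. The vault shape, field names, and the resolveVirtualKey function are illustrative assumptions for this sketch, not the actual Igris internals:

```typescript
// Illustrative sketch: the gateway keeps an encrypted vault mapping slugs to
// real provider credentials; callers only ever see the slug.
interface VaultEntry {
  provider: string;
  apiKey: string;           // the real upstream credential (placeholder below)
  enabled: boolean;
  allowedModels?: string[]; // optional model restriction
}

const vault: Record<string, VaultEntry> = {
  vk_openai_prod: { provider: "openai", apiKey: "sk-real-example", enabled: true },
};

function resolveVirtualKey(slug: string, model: string): VaultEntry {
  const entry = vault[slug];
  if (!entry || !entry.enabled) {
    throw new Error(`unknown or disabled slug: ${slug}`);
  }
  if (entry.allowedModels && !entry.allowedModels.includes(model)) {
    throw new Error(`model ${model} not allowed for ${slug}`);
  }
  return entry; // the gateway injects entry.apiKey into the upstream request
}
```

Because the credential lives only in the vault, disabling or rotating a key is a dashboard change, not a code deploy.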
Providers
A provider is a registered upstream LLM service — OpenAI, Anthropic, Groq, Mistral, and 56 more. Each provider registration captures:
- Slug — the identifier used in virtual key creation and model routing (e.g. openai, anthropic)
- Base URL — where the gateway forwards requests
- Auth style — bearer, x-api-key, or query-param
- Supported endpoints — which of chat.completions, embeddings, images.generate, audio.transcriptions, audio.speech, or passthrough this provider supports

Providers registered with baseUrl: null (Ollama, HuggingFace, Triton, Modal) require a customBaseUrl to be set on the virtual key for self-hosted deployments.
See the full Provider Catalog.
Policies
A policy is an ordered list of PolicyRule objects attached to an organization or virtual key.
Rules are evaluated in order — first match wins.
Each rule has four main parts:
Target
What the rule matches — a discriminated union with three variants.
Action
What happens when the rule matches: "allow", "deny", or "alert". Deny returns HTTP 403.
Alert records an alert event without blocking the request.
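First-match-wins evaluation can be sketched as follows. The Rule shape and the default action when no rule matches are simplifying assumptions for this sketch:

```typescript
type Action = "allow" | "deny" | "alert";

interface Rule {
  matches: (req: { model: string }) => boolean; // stand-in for target matching
  action: Action;
}

// Rules are evaluated in order; the first rule whose target matches decides.
function evaluate(rules: Rule[], req: { model: string }): Action {
  for (const rule of rules) {
    if (rule.matches(req)) return rule.action;
  }
  return "allow"; // no rule matched: default allow (assumption)
}

const rules: Rule[] = [
  { matches: (r) => r.model === "gpt-4o", action: "deny" },
  { matches: () => true, action: "allow" }, // catch-all
];
```

Because evaluation stops at the first match, rule order matters: put specific rules before catch-alls.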
Conditions
Optional metadata conditions that must also match for the rule to fire. Keys are dotted paths into the request context (e.g. metadata.role, user). Operators: eq, neq, in, nin.
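A sketch of how such conditions might be checked, assuming dotted-path lookup and the four operators listed above. The types here are illustrative, not the exact Igris definitions:

```typescript
type Op = "eq" | "neq" | "in" | "nin";

interface Condition {
  key: string;    // dotted path into the request context, e.g. "metadata.role"
  op: Op;
  value: unknown; // scalar for eq/neq, array for in/nin
}

// Walk a dotted path such as "metadata.role" into a nested object.
function lookup(ctx: Record<string, unknown>, path: string): unknown {
  return path.split(".").reduce<unknown>(
    (obj, part) => (obj as Record<string, unknown> | undefined)?.[part],
    ctx,
  );
}

function conditionMatches(ctx: Record<string, unknown>, c: Condition): boolean {
  const actual = lookup(ctx, c.key);
  switch (c.op) {
    case "eq":  return actual === c.value;
    case "neq": return actual !== c.value;
    case "in":  return (c.value as unknown[]).includes(actual);
    case "nin": return !(c.value as unknown[]).includes(actual);
  }
}
```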
Limits, Guards, and Content Controls
Rules can also carry:
- limit — rate limit on requests, tokens, or dollars per minute/hour/day
- tokenGuard — cap max_tokens or reject requests exceeding an input token estimate
- contentGuard — PII pattern matching, keyword blocklist
- logContent — whether to persist the full prompt and completion to llm_call_bodies
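For orientation, a rule carrying several of these controls might look like the literal below. The top-level field names come from this page; the nested shapes are assumptions for illustration only:

```typescript
// Illustrative rule object; nested shapes are assumed, not the real schema.
const rule = {
  target: { type: "model", model: "gpt-4o" },         // shape assumed
  action: "allow",
  limit: { metric: "dollars", max: 500, per: "day" }, // shape assumed
  tokenGuard: { maxTokens: 4096 },                    // shape assumed
  contentGuard: { blockKeywords: ["password"] },      // shape assumed
  logContent: true, // persist prompt + completion to llm_call_bodies
};
```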
Audit Trail
Every request through the gateway produces an audit event with:
- type: "llm_call"
- Provider + model (resolved after gateway routing)
- inputTokens, outputTokens, cachedTokens
- costCents (integer, USD cents × 100 — sub-cent precision)
- Latency in milliseconds
- userId, traceId, virtualKeySlug
- Policy action taken
- requestId for correlation
Fetch audit events with igris.auditEvents.list() or browse them in the dashboard
under LLM → Audit Trail.
If logContent: true is set on a matching rule, the full prompt and completion are stored in
llm_call_bodies and linked via the audit event ID.
Cost Tracking
Cost is computed server-side using a live pricing snapshot (vendored from the Portkey provider registry and periodically refreshed). The costCents column stores the value as an integer representing USD cents × 100, so 150 means $0.0150.
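Since the column stores cents × 100, converting to dollars divides by 10,000. A small helper (name is illustrative) makes the unit explicit:

```typescript
// costCents is USD cents × 100; dividing by 10,000 yields dollars.
function costCentsToDollars(costCents: number): number {
  return costCents / 10_000;
}

// Example: a stored value of 150 is 1.5 cents, i.e. $0.0150.
```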
The LLM → Cost dashboard aggregates spend by day, provider, model, virtual key, and user.
Anomaly detection also monitors cost-per-minute as one of its five signal dimensions.
Anomaly Detection
The LlmAnomalyDetector runs five parallel signal trackers on every request:
| Signal | Trigger |
|---|---|
| Cost spike | Cost/minute exceeds baseline EWMA by configurable factor |
| Token burn | Tokens/minute exceeds threshold |
| Response length | Output tokens for a single response exceeds threshold |
| Model shift | Observed model diverges from expected model baseline |
| Error rate | HTTP 4xx/5xx rate exceeds threshold |
By default, anomaly detections only raise alerts; attach a deny action on the policy rule to make them blocking.
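One way the cost-spike signal could work is an exponentially weighted moving average (EWMA) baseline of cost per minute, flagging when the current rate exceeds the baseline by a configurable factor. The class name, smoothing factor, and trigger multiplier below are illustrative, not the actual LlmAnomalyDetector internals:

```typescript
class CostSpikeTracker {
  private baseline: number | null = null;

  constructor(
    private readonly alpha = 0.2,     // EWMA smoothing factor (assumed)
    private readonly spikeFactor = 3, // trigger multiplier (assumed)
  ) {}

  // Returns true if this minute's cost is a spike relative to the baseline.
  observe(costPerMinute: number): boolean {
    if (this.baseline === null) {
      this.baseline = costPerMinute; // seed the baseline on the first sample
      return false;
    }
    const spike = costPerMinute > this.baseline * this.spikeFactor;
    // Fold the new sample into the baseline.
    this.baseline = this.alpha * costPerMinute + (1 - this.alpha) * this.baseline;
    return spike;
  }
}
```

A real implementation would likely also decide whether spike samples should update the baseline at all, to avoid a sustained spike becoming the new normal.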
Three Usage Styles
You can route traffic through the gateway in three ways:
1. SDK native (recommended)
Use igris.chat.completions.create() directly. The @slug/model prefix on the model field tells the SDK which virtual key to route through. Zero external dependencies beyond the Igris SDK.
2. connectLlm escape hatch
Use igris.connectLlm(slug, options) to get a { baseUrl, apiKey, headers } object and wire it
into any OpenAI-compatible SDK client. This is the zero-migration path when your app already uses
the OpenAI Node SDK.
3. Raw HTTP
Any HTTP client that can set Authorization: Bearer <igris_key> and target
https://api.igrisecurity.com/llm/<slug>/v1/chat/completions works directly. This is useful for
language runtimes without an Igris SDK (Python, Go, Rust, etc.).
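For example, such a request can be built with the Fetch API available in Node 18+ and browsers. The slug and key values below are placeholders, and separating request construction from sending keeps the shape easy to inspect:

```typescript
// Build the raw HTTP request for the gateway's chat completions endpoint.
function buildChatRequest(slug: string, igrisKey: string, body: object) {
  return {
    url: `https://api.igrisecurity.com/llm/${slug}/v1/chat/completions`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${igrisKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    },
  };
}

// Sending is then a plain fetch:
// const { url, init } = buildChatRequest("vk_openai_prod", key,
//   { model: "gpt-4o", messages: [{ role: "user", content: "hi" }] });
// const res = await fetch(url, init);
```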
Request Modes: Passthrough vs Transformed
Most providers use transformed mode — the gateway rewrites the request body and response to match the OpenAI chat completions schema, regardless of what the upstream expects. This means you can switch providers without changing your application code.
Passthrough mode (endpoint: "passthrough") forwards the raw request body directly to the provider without transformation. Use this for providers with non-standard APIs (image generation, 3D model generation, audio) where the OpenAI schema doesn't apply.
Metadata Channels
Request context (user identity, trace IDs, metadata for policy conditions) can be provided through three channels, resolved in priority order:
1. Individual headers — X-Igris-Metadata-<key>: <value> (highest priority). One header per metadata field. Example: X-Igris-Metadata-role: developer.
2. JSON blob header — x-igris-metadata: {"user":"alice","role":"developer"}, plus special sentinels _user and _trace_id for the core identity fields.
3. Request body — body.user (OpenAI convention). Lowest priority, for compatibility with clients that already set the OpenAI user field.
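The priority order can be sketched as a merge from lowest to highest priority. This sketch assumes header names arrive lowercased, and it omits the _user/_trace_id sentinel handling for brevity:

```typescript
// Merge the three metadata channels; later assignments win, so the merge
// order (body, then blob header, then individual headers) encodes priority.
function resolveMetadata(
  headers: Record<string, string>,
  body: { user?: string },
): Record<string, string> {
  const merged: Record<string, string> = {};

  // Lowest priority: OpenAI-style body.user.
  if (body.user) merged.user = body.user;

  // Middle priority: the x-igris-metadata JSON blob header.
  const blob = headers["x-igris-metadata"];
  if (blob) Object.assign(merged, JSON.parse(blob));

  // Highest priority: individual x-igris-metadata-<key> headers.
  const prefix = "x-igris-metadata-";
  for (const [name, value] of Object.entries(headers)) {
    if (name.startsWith(prefix)) merged[name.slice(prefix.length)] = value;
  }
  return merged;
}
```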
The connectLlm() method handles encoding all three channels automatically when you pass user,
traceId, and metadata options.