
LLM Gateway Concepts

This page explains the core abstractions in the Igris LLM Gateway. Read this before diving into the SDK Reference or Policies docs.

Virtual Keys

A virtual key is an encrypted vault entry that maps an Igris slug to an upstream provider API key. Your application code never holds the real provider key — it only knows the slug (e.g. vk_openai_prod). The gateway resolves the slug at request time, injects the real credential, and forwards the request to the upstream provider. Virtual keys can be:
  • scoped to a specific organization
  • restricted to a subset of models
  • enabled or disabled without a code change
  • rotated by updating the vault entry in the dashboard
A single slug can be reused across as many callers as you like — policy rules target the slug. See Virtual Keys for CRUD operations via the SDK or REST API.
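The resolution step can be sketched as a toy lookup. This is illustrative only — the real vault lives server-side in the gateway, and the entry shape and function names here are assumptions, not the actual implementation:

```typescript
// Toy model of the slug -> provider-key resolution the gateway performs
// server-side. The real credential never reaches application code.
type VaultEntry = { providerKey: string; enabled: boolean; allowedModels?: string[] };

// Hypothetical vault contents; real entries are managed in the dashboard.
const vault: Record<string, VaultEntry> = {
  vk_openai_prod: { providerKey: "sk-real-...", enabled: true, allowedModels: ["gpt-4o"] },
};

function resolveVirtualKey(slug: string, model: string): string {
  const entry = vault[slug];
  if (!entry || !entry.enabled) throw new Error(`virtual key ${slug} unavailable`);
  if (entry.allowedModels && !entry.allowedModels.includes(model)) {
    throw new Error(`model ${model} not allowed for ${slug}`);
  }
  // Injected into the upstream request; never returned to the caller.
  return entry.providerKey;
}
```

Disabling the entry or narrowing `allowedModels` takes effect on the next request, with no application redeploy.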

Providers

A provider is a registered upstream LLM service — OpenAI, Anthropic, Groq, Mistral, and 56 more. Each provider registration captures:
  • Slug — the identifier used in virtual key creation and model routing (e.g. openai, anthropic)
  • Base URL — where the gateway forwards requests
  • Auth style — bearer, x-api-key, or query-param
  • Supported endpoints — which of chat.completions, embeddings, images.generate, audio.transcriptions, audio.speech, or passthrough this provider supports
Providers with baseUrl: null (Ollama, HuggingFace, Triton, Modal) require a customBaseUrl to be set on the virtual key for self-hosted deployments. See the full Provider Catalog.
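The fields above can be summarized in a type sketch. The names below are illustrative — they follow the bullet list, not the SDK's actual type definitions:

```typescript
// Sketch of a provider registration, per the fields listed above.
type Endpoint =
  | "chat.completions" | "embeddings" | "images.generate"
  | "audio.transcriptions" | "audio.speech" | "passthrough";

interface ProviderRegistration {
  slug: string;                 // e.g. "openai", "anthropic"
  baseUrl: string | null;       // null for self-hosted providers
  authStyle: "bearer" | "x-api-key" | "query-param";
  supportedEndpoints: Endpoint[];
}

// Example: a self-hosted provider with no fixed base URL, so the
// virtual key must supply a customBaseUrl.
const ollama: ProviderRegistration = {
  slug: "ollama",
  baseUrl: null,
  authStyle: "bearer",
  supportedEndpoints: ["chat.completions", "embeddings"],
};
```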

Policies

A policy is an ordered list of PolicyRule objects attached to an organization or virtual key. Rules are evaluated in order — first match wins. Each rule has four main parts:

Target

What the rule matches. Three discriminated union variants:
// Match a specific LLM model or a glob
{ kind: "llm_model"; model: "gpt-4o" }
{ kind: "llm_model"; model: "gpt-4*" }

// Match a specific endpoint
{ kind: "llm_endpoint"; endpoint: "chat.completions" }

// Match an MCP tool (for MCP governance rules in the same policy)
{ kind: "mcp_tool"; tool: "delete_*" }

Action

What happens when the rule matches: "allow", "deny", or "alert". Deny returns HTTP 403. Alert records an alert event without blocking the request.
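Target matching and first-match-wins evaluation can be sketched together. The glob handling and the default action when no rule matches are assumptions for illustration:

```typescript
// Minimal first-match-wins evaluation over an ordered rule list,
// with a simple "*" glob for model targets. Types are illustrative.
type Target =
  | { kind: "llm_model"; model: string }
  | { kind: "llm_endpoint"; endpoint: string };
type Action = "allow" | "deny" | "alert";
type Rule = { target: Target; action: Action };

function escapeRegex(s: string): string {
  return s.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
}

// "gpt-4*" matches "gpt-4o", "gpt-4-turbo", etc.
function globMatch(pattern: string, value: string): boolean {
  const re = new RegExp("^" + pattern.split("*").map(escapeRegex).join(".*") + "$");
  return re.test(value);
}

function evaluate(rules: Rule[], model: string, endpoint: string): Action {
  for (const rule of rules) {
    const t = rule.target;
    const hit =
      (t.kind === "llm_model" && globMatch(t.model, model)) ||
      (t.kind === "llm_endpoint" && t.endpoint === endpoint);
    if (hit) return rule.action; // first match wins; later rules never run
  }
  return "allow"; // assumed default when no rule matches
}
```

Because evaluation stops at the first hit, rule order matters: a broad `gpt-4*` deny placed before a specific `gpt-4o` allow will shadow it.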

Conditions

Optional metadata conditions that must also match for the rule to fire. Keys are dotted paths into the request context (e.g. metadata.role, user). Operators: eq, neq, in, nin.
{ "conditions": { "metadata.role": { "in": ["intern", "contractor"] } } }
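The four operators over dotted-path lookups can be sketched as follows. The condition shape mirrors the JSON above; the lookup and matching logic are illustrative, not the gateway's actual code:

```typescript
// Sketch: evaluate eq / neq / in / nin conditions against a request context,
// resolving dotted paths like "metadata.role".
type Condition = { eq?: unknown; neq?: unknown; in?: unknown[]; nin?: unknown[] };

function lookup(ctx: Record<string, unknown>, path: string): unknown {
  return path.split(".").reduce<unknown>(
    (obj, key) =>
      obj && typeof obj === "object" ? (obj as Record<string, unknown>)[key] : undefined,
    ctx,
  );
}

// All conditions must hold for the rule to fire.
function conditionsMatch(
  ctx: Record<string, unknown>,
  conds: Record<string, Condition>,
): boolean {
  return Object.entries(conds).every(([path, c]) => {
    const v = lookup(ctx, path);
    if ("eq" in c && v !== c.eq) return false;
    if ("neq" in c && v === c.neq) return false;
    if (c.in && !c.in.includes(v)) return false;
    if (c.nin && c.nin.includes(v)) return false;
    return true;
  });
}
```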

Limits, Guards, and Content Controls

Rules can also carry:
  • limit — rate limit on requests, tokens, or dollars per minute/hour/day
  • tokenGuard — cap max_tokens or reject requests exceeding an input token estimate
  • contentGuard — PII pattern matching, keyword blocklist
  • logContent — whether to persist the full prompt and completion to llm_call_bodies
See Policies for the full shape with examples.
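As one concrete example, a tokenGuard can be sketched as a cap-and-reject step. The field names and the rough 4-characters-per-token estimate are assumptions for illustration, not the gateway's actual heuristic:

```typescript
// Illustrative tokenGuard: cap max_tokens and reject requests whose
// crude input-token estimate exceeds a limit.
interface TokenGuard { maxOutputTokens?: number; maxInputTokens?: number }
interface ChatBody { max_tokens?: number; messages: { content: string }[] }

function applyTokenGuard(guard: TokenGuard, body: ChatBody): ChatBody {
  const inputChars = body.messages.reduce((n, m) => n + m.content.length, 0);
  const estimate = Math.ceil(inputChars / 4); // ~4 chars per token, rough
  if (guard.maxInputTokens !== undefined && estimate > guard.maxInputTokens) {
    throw new Error(`input estimate ${estimate} exceeds limit ${guard.maxInputTokens}`);
  }
  const cap = guard.maxOutputTokens;
  const max_tokens =
    cap === undefined ? body.max_tokens : Math.min(body.max_tokens ?? cap, cap);
  return { ...body, max_tokens };
}
```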

Audit Trail

Every request through the gateway produces an audit event with:
  • type: "llm_call"
  • Provider + model (resolved after gateway routing)
  • inputTokens, outputTokens, cachedTokens
  • costCents (integer, USD cents × 100 — sub-cent precision)
  • Latency in milliseconds
  • userId, traceId, virtualKeySlug
  • Policy action taken
  • requestId for correlation
Query audit events from the SDK with igris.auditEvents.list() or browse them in the dashboard under LLM → Audit Trail. If logContent: true is set on a matching rule, the full prompt and completion are stored in llm_call_bodies and linked via the audit event ID.

Cost Tracking

Cost is computed server-side using a live pricing snapshot (vendored from the Portkey provider registry and periodically refreshed). The costCents column stores the value as an integer representing USD cents × 100 — so 150 means $0.0150. The LLM → Cost dashboard aggregates spend by day, provider, model, virtual key, and user. Anomaly detection also monitors cost-per-minute as one of its five signal dimensions.
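The unit conversion is easy to get wrong by a factor of 100, so here it is spelled out (the function name is ours, not the SDK's):

```typescript
// costCents stores USD cents x 100 as an integer, so:
//   cents   = costCents / 100
//   dollars = cents / 100 = costCents / 10_000
function costCentsToDollars(costCents: number): number {
  return costCents / 10_000; // e.g. 150 -> 0.015, i.e. $0.0150
}
```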

Anomaly Detection

The LlmAnomalyDetector runs five parallel signal trackers on every request:
  • Cost spike — cost/minute exceeds the baseline EWMA by a configurable factor
  • Token burn — tokens/minute exceeds threshold
  • Response length — output tokens for a single response exceed threshold
  • Model shift — observed model diverges from the expected model baseline
  • Error rate — HTTP 4xx/5xx rate exceeds threshold
When a signal fires, an alert event is written and optionally delivered via webhook. Alerts do not block requests by default — configure a deny action on the policy rule to make them blocking.
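The cost-spike signal can be sketched as an EWMA comparison. The smoothing factor, spike factor, and class name below are illustrative defaults, not the detector's actual configuration:

```typescript
// Sketch of a cost-spike tracker: fire when cost-per-minute exceeds an
// exponentially weighted moving average baseline by a factor.
class CostSpikeTracker {
  private baseline: number | null = null;
  constructor(private alpha = 0.2, private factor = 3) {}

  // Returns true when the observation should fire an alert.
  observe(costPerMinute: number): boolean {
    if (this.baseline === null) {
      this.baseline = costPerMinute; // seed the baseline, never fire first
      return false;
    }
    const fired = costPerMinute > this.baseline * this.factor;
    // Update the EWMA after the comparison so a spike doesn't mask itself.
    this.baseline = this.alpha * costPerMinute + (1 - this.alpha) * this.baseline;
    return fired;
  }
}
```

Updating the baseline after the comparison means a sustained spike eventually becomes the new normal, which is the usual EWMA trade-off between sensitivity and false positives.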

Three Usage Styles

You can route traffic through the gateway in three ways.

1. Native SDK

Use igris.chat.completions.create() directly. The @slug/model model prefix tells the SDK which virtual key to route through. Zero external dependencies beyond the Igris SDK.
const igris = new Igris({ apiKey: "igris_sk_..." });
await igris.chat.completions.create({ model: "@vk_openai_prod/gpt-4o", messages: [...] });

2. connectLlm escape hatch

Use igris.connectLlm(slug, options) to get a { baseUrl, apiKey, headers } object and wire it into any OpenAI-compatible SDK client, or use the withIgris adapter shown below. This is the zero-migration path when your app already uses the OpenAI Node SDK.
import OpenAI from "openai";
import { withIgris } from "@igris-security/sdk/adapters/openai";

const igris = new Igris({ apiKey: "igris_sk_..." });
const openai = withIgris(new OpenAI({ apiKey: "placeholder" }), igris, "vk_openai_prod");
// Now openai.chat.completions.create() routes through Igris

3. Raw HTTP

Any HTTP client that can set Authorization: Bearer <igris_key> and target https://api.igrisecurity.com/llm/<slug>/v1/chat/completions works directly. This is useful for language runtimes without an Igris SDK (Python, Go, Rust, etc.).
curl https://api.igrisecurity.com/llm/vk_openai_prod/v1/chat/completions \
  -H "Authorization: Bearer igris_sk_..." \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"Hello"}]}'

Request Modes: Passthrough vs Transformed

Most providers use transformed mode — the gateway rewrites the request body and response to match the OpenAI chat completions schema, regardless of what the upstream expects. This means you can switch providers without changing your application code. Passthrough mode (endpoint: "passthrough") forwards the raw request body directly to the provider without transformation. Use this for providers with non-standard APIs (image generation, 3D model generation, audio) where the OpenAI schema doesn’t apply.

Metadata Channels

Request context (user identity, trace IDs, metadata for policy conditions) can be provided through three channels, resolved in priority order:
  1. Individual headers — X-Igris-Metadata-<key>: <value> (highest priority). One header per metadata field. Example: X-Igris-Metadata-role: developer.
  2. JSON blob header — x-igris-metadata: {"user":"alice","role":"developer"}, plus special sentinels _user and _trace_id for the core identity fields.
  3. Request body — body.user (OpenAI convention). Lowest priority, for compatibility with clients that already set the OpenAI user field.
The connectLlm() method handles encoding all three channels automatically when you pass user, traceId, and metadata options.
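The priority order can be sketched as a merge where later writes win. This is a simplified model (it ignores the _user and _trace_id sentinels), and the function name is ours:

```typescript
// Sketch: merge the three metadata channels in ascending priority, so
// individual headers > JSON blob header > request body.
function resolveMetadata(
  headers: Record<string, string>, // assumed lowercased header names
  body: { user?: string },
): Record<string, string> {
  const merged: Record<string, string> = {};
  // 3. Request body (lowest priority).
  if (body.user) merged["user"] = body.user;
  // 2. JSON blob header overrides the body.
  const blob = headers["x-igris-metadata"];
  if (blob) Object.assign(merged, JSON.parse(blob));
  // 1. Individual headers override everything.
  for (const [name, value] of Object.entries(headers)) {
    const m = name.match(/^x-igris-metadata-(.+)$/);
    if (m) merged[m[1]] = value;
  }
  return merged;
}
```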