Security model
Cognitora is mTLS-by-default and hard to mis-configure into "open internet" mode. This page describes the trust boundaries, key material, and the auth flow for the OpenAI HTTP surface.
Trust boundaries
┌── Client (OpenAI SDK) ──┐
│ Authorization: Bearer │ HTTPS
└────────────┬────────────┘
│
┌────────────▼────────────┐
│ cgn-router :8080 │ ← public ingress, OIDC + API key
│ cgn-router :7070 mTLS │ ← internal mesh, peers below
└────┬────────────────┬───┘
│ mTLS gRPC │ mTLS gRPC
▼ ▼
┌──────────┐ ┌──────────────┐
│ cgn-agent│ │ cgn-kvcached │
└──────────┘ └──────────────┘
- Public surface (HTTP): any OpenAI client. Authenticated via API
key (
Authorization: Bearer <key>) or OIDC bearer token. - Internal surface (gRPC + QUIC): only Cognitora processes, authenticated by mTLS leaf certificates rooted at a cluster CA.
- Admin surface (HTTP
:9091/:9092): bound to127.0.0.1by default. The Helm chart never exposes these as Services.
PKI
Two paths:
cgn-ctl pki bootstrap— generates a dev CA + leaf cert with rcgen and writes them under/etc/cognitora/pki/. Suitable for dev / single-node, not for production.- External CA — set
[security] ca_file = ...and provide leaf certs via your usual issuer (cert-manager, HashiCorp Vault, ACM PCA).
The internal CA must include all hostnames that other Cognitora nodes will dial; in K8s the operator generates SANs for each Service.
API auth
cgn-auth::middleware::auth_middleware runs ahead of every /v1/*
route. Authentication path:
- If the request carries
Authorization: Bearer <token>:- If
<token>matches the sha256 of an entry in[auth].api_keys_file, the request proceeds withsubject = "key:<id>". - Otherwise the token is treated as a JWT and validated against the
configured OIDC issuer's JWKS. The
subclaim becomessubject = "oidc:<sub>".
- If
- The middleware sets the
x-cgn-subjectheader on the inner request socgn-ratelimitand the gateway handlers can read it without re-doing the lookup. - If no auth material is present and
[auth].enabled = true, the request is rejected with401. If[auth].enabled = false, the middleware no-ops (dev / smoke tests only).
Distroless
All production images are built FROM gcr.io/distroless/cc-debian12:nonroot.
There is no shell, no package manager, and the process runs as UID 65532.
The agent image is the only exception: it sits on top of the official
vLLM image because the engine wants Python + CUDA at runtime.
Auditing
Every authenticated request emits:
- a
tracingspan withsubject,model, andrequest_id, - a Prometheus counter
cgn_router_requests_total{auth, model}, - an OTLP span when
OTEL_EXPORTER_OTLP_ENDPOINTis set.