# Configuration reference
Cognitora binaries read a single TOML file (default
/etc/cognitora/cognitora.toml) plus environment overrides. The
authoritative schema lives in
rust/libraries/cgn-core/src/config.rs;
the canonical example with every section documented inline lives at
configs/cognitora.toml.example.
## Sections
| Section | Owner crate | Required by |
|---|---|---|
| `[cluster]` | `cgn-core::config` | every binary |
| `[security]` | `cgn-tls` | every binary that opens mTLS |
| `[auth]` | `cgn-auth` | `cgn-router` |
| `[router.*]` | `cgn-router` | `cgn-router` |
| `[agent.*]` | `cgn-agent` | `cgn-agent` |
| `[engine.*]` | `cgn-agent` | `cgn-agent` (which engine to spawn / proxy) |
| `[kv.*]` | `cgn-kv` | `cgn-kvcached` |
| `[metrics.*]` | `cgn-metrics` | `cgn-metrics` |
| `[models.<name>]` | `cgn-core::config` | `cgn-router` (declarative model registry) |
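For orientation, the skeleton below sketches how those sections sit in one cognitora.toml. It shows section headers only, with comments paraphrasing the table above; configs/cognitora.toml.example remains the authoritative, fully documented reference.

```toml
# Skeleton only: keys omitted; see configs/cognitora.toml.example for every key.
[cluster]               # read by every binary

[security]              # mTLS settings, read by every binary that opens mTLS

[auth]                  # read by cgn-router

[router]                # cgn-router (router.* subsections)

[agent]                 # cgn-agent (agent.* subsections)

[engine]                # which engine cgn-agent spawns or proxies

[kv]                    # cgn-kvcached

[metrics]               # cgn-metrics

[models."my-model"]     # declarative model registry entry; the name is a placeholder
```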
## `[engine]` — pluggable inference engine
Cognitora's cgn-agent is engine-agnostic: any process that exposes the
OpenAI HTTP surface (/v1/completions, /health, /v1/models) plugs in.
| Key | Type | Default | Notes |
|---|---|---|---|
| `engine.kind` | enum | `"vllm"` | One of `vllm`, `sglang`, `llama_cpp`, `openai_compat`. |
| `engine.url` | string | `http://127.0.0.1:8000` | OpenAI HTTP base URL. |
| `engine.kv_offload` | enum | `"none"` | Engine-side KV offload backend. One of `none`, `nixl`, `lmcache`, `hicache`, `kvbm`. See Engine-side KV offload below. |
| `engine.vllm.binary` | string | `"vllm"` | Path or PATH-name of the vllm CLI. |
| `engine.vllm.extra_args` | array | `["--enable-chunked-prefill"]` | Appended after the auto-rendered argv. |
| `engine.sglang.binary` | string | `"python"` | Python interpreter that runs `-m sglang.launch_server`. |
| `engine.sglang.host` | string | `"127.0.0.1"` | Where the engine listens. |
| `engine.sglang.port` | u16 | `8000` | Must match `engine.url`. |
| `engine.sglang.context_length` | u32 | `4096` | Default context window when `[models.*].max_model_len` is unset. |
| `engine.sglang.mem_fraction_static` | f32 | `0.85` | Mem fraction for SGLang's RadixAttention KV pool. |
| `engine.sglang.extra_args` | array | `[]` | Appended after the auto-rendered argv. Pass `--enable-radix-cache` here. |
| `engine.llama_cpp.binary` | string | `"python"` | Python interpreter (`mode = python_server`) or `llama-server` binary (`mode = binary`). |
| `engine.llama_cpp.mode` | enum | `"python_server"` | `python_server` or `binary`. |
| `engine.llama_cpp.host` | string | `"127.0.0.1"` | Where the engine listens. |
| `engine.llama_cpp.port` | u16 | `8000` | Must match `engine.url`. |
| `engine.llama_cpp.n_ctx` | u32 | `4096` | Context window. |
| `engine.llama_cpp.n_threads` | u32 | `4` | CPU thread count. |
| `engine.llama_cpp.n_gpu_layers` | i32 | `0` | `0` = CPU only, `-1` = offload all layers to the GPU. |
| `engine.llama_cpp.extra_args` | array | `[]` | Extra flags passed to the engine. |
When kind = "openai_compat" the agent does not spawn a child process;
it only proxies to whatever is at engine.url. Use this with systemd /
Kubernetes / a sidecar that owns the engine lifecycle.
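As an illustration, a proxy-only node where an external supervisor owns the engine could be configured roughly like this (the URL is a placeholder):

```toml
[engine]
kind = "openai_compat"           # cgn-agent spawns nothing; it only proxies
url  = "http://127.0.0.1:8000"   # base URL of the externally managed engine
```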
### Engine selection
The four supported engines map to the same OpenAI HTTP surface, so they are fully interchangeable from the router's perspective:
- `vllm` — `vllm serve <model> --tensor-parallel-size <N> ...`. Best general-purpose GPU engine; supports continuous batching and chunked prefill out of the box.
- `sglang` — `python -m sglang.launch_server --model-path <model> --tp <N> ...`. Adds RadixAttention prefix caching that complements Cognitora's cross-node prefix routing — the router still picks the node with the longest cached prefix, and SGLang then reuses cache inside that node (see the sketch after this list).
- `llama_cpp` — CPU-friendly fallback (and CUDA offload via `n_gpu_layers`); useful for laptops, CI, and edge deployments.
- `openai_compat` — proxy-only.
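For example, moving a node from the default vLLM engine to SGLang is mostly a matter of setting engine.kind and the engine.sglang.* keys from the table above. A rough sketch; the values are illustrative, not tuning advice:

```toml
[engine]
kind = "sglang"
url  = "http://127.0.0.1:8000"    # must agree with engine.sglang.host / port

[engine.sglang]
binary              = "python"
host                = "127.0.0.1"
port                = 8000
context_length      = 8192                       # overrides the 4096 default
mem_fraction_static = 0.85
extra_args          = ["--enable-radix-cache"]   # RadixAttention prefix caching
```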
### Engine-side KV offload
engine.kv_offload selects which connector cgn-agent injects when
spawning the engine. The router is unaware of this dial — it only sees
prefix-overlap signals via cgn-kvcached either way — so swapping
backends is a one-line change.
| Value | Effect (vLLM) | Effect (SGLang) |
|---|---|---|
| `none` | nothing injected | nothing injected |
| `nixl` | `--kv-transfer-config '{"kv_connector":"NixlConnector",...}'` with role-aware `kv_role` | (rejected — SGLang HiCache uses NIXL internally; pick `hicache` instead) |
| `lmcache` | `LMCacheConnectorV1` (agg) or `PdConnector(LMCache+NIXL)` (disagg, prefill role) | (rejected — LMCache is vLLM-side) |
| `hicache` | (rejected — vLLM has no HiCache) | `--enable-hierarchical-cache --hicache-ratio 2 --hicache-write-policy write_through --hicache-storage-backend nixl` |
| `kvbm` | `--kv-transfer-config '{"kv_connector":"DynamoConnector","kv_connector_module_path":"kvbm.vllm_integration.connector",...}'` | (rejected — KVBM has no SGLang support) |
Disagg topologies ([agent].role = "prefill" or "decode") compose
the chosen backend with NIXL automatically. The full table — including
the exact JSON blobs — lives in docs/architecture/kv-strategy.md.
LMCache, HiCache, and KVBM all require the corresponding Python
package to be installed in the engine's virtualenv. cgn-agent does
not install them; the recipe's up.sh warns when they're missing.
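Because the router only ever sees prefix-overlap signals, switching a vLLM node onto LMCache is the single-key change sketched below (assuming the lmcache package is already installed in the engine's virtualenv):

```toml
[engine]
kind       = "vllm"
kv_offload = "lmcache"   # was "none"; cgn-agent now injects the LMCache connector when it spawns vLLM
```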
## Per-model knobs
[models."<name>"].path is required when engine.kind = "llama_cpp" (the
filesystem path to a .gguf file). For SGLang, path is optional: when
unset SGLang resolves the model name as a HuggingFace repo id; when set,
SGLang loads from the local directory. vLLM behaves the same way.
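A sketch of both cases follows; the model names and filesystem paths are placeholders, and any other per-model keys are omitted:

```toml
# llama_cpp: path is required and points at a local .gguf file
[models."tinyllama-1.1b"]
path = "/models/tinyllama-1.1b.Q4_K_M.gguf"

# sglang / vllm: path is optional; without it the model name resolves as a HuggingFace repo id
[models."meta-llama/Llama-3.1-8B-Instruct"]
# path = "/models/llama-3.1-8b-instruct"   # uncomment to load from a local directory instead
```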
## Legacy aliases
[agent].vllm_url and [agent].vllm_cmd from older configs still work
but emit a one-time warning. Migrate them to [engine].url and
[engine.vllm].extra_args respectively.
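The migration is mechanical. A before/after sketch, assuming the legacy vllm_cmd held an argv-style array of extra flags:

```toml
# Old (still accepted, logs a one-time warning)
[agent]
vllm_url = "http://127.0.0.1:8000"
vllm_cmd = ["--enable-chunked-prefill"]

# New
[engine]
url = "http://127.0.0.1:8000"

[engine.vllm]
extra_args = ["--enable-chunked-prefill"]
```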
## Overrides
Every TOML key has a corresponding environment variable: prepend
COGNITORA__, separate sections with __, and use SCREAMING_SNAKE. For example:

    # Override [router].listen_http
    COGNITORA__ROUTER__LISTEN_HTTP=0.0.0.0:8000

    # Disable auth for a dev run
    COGNITORA__AUTH__ENABLED=false
CLI flags take precedence over the env, which takes precedence over the TOML file, which takes precedence over compiled defaults.
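For reference, the two environment overrides above are equivalent to setting these TOML keys:

```toml
[router]
listen_http = "0.0.0.0:8000"   # COGNITORA__ROUTER__LISTEN_HTTP

[auth]
enabled = false                # COGNITORA__AUTH__ENABLED
```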
## Hot reload
The following keys reload without restart:
- `[auth].api_keys_file` (the sha256 keys file is watched and re-read)
- `[router.score_weights]` (the router subscribes to the etcd key `/cognitora/routing/policy`)
- `[router.cascade]` and `[router.disagg]` (same etcd key)
Everything else requires systemctl restart cgn-<binary> or, in K8s, a
rolling restart of the corresponding deployment / DaemonSet.
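For instance, API-key rotation is a hot path: with an [auth] block like the sketch below (the path is a placeholder), edits to the referenced keys file take effect without restarting cgn-router.

```toml
[auth]
enabled       = true
api_keys_file = "/etc/cognitora/api-keys.sha256"   # placeholder path; the file's contents are watched and re-read live
```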