cgn-kvcached

Multi-Tier KV Cache Daemon

GPU/RAM/SSD tiered KV cache with cross-node QUIC/RDMA fetch. The shared memory layer that makes KV-aware routing possible.

Overview

cgn-kvcached manages a three-tier KV block store: GPU HBM (engine-internal), RAM (the warm tier, a DashMap with approximate-LRU eviction), and SSD (the cold tier, indexed in RocksDB and accessed through io_uring). When a block is not found locally, the daemon fetches it from a peer node over QUIC (1-RTT handshake, 0-RTT on repeat connections) or, optionally, over RDMA (GPUDirect). Blocks are addressed by a 36-byte key: the 32-byte BLAKE3-256 digest of (model, dtype, prefix tokens), followed by the layer index in the remaining 4 bytes.
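
The 36-byte key layout described above can be sketched as follows. This is a minimal illustration, not the daemon's actual type: the names `BlockKey`, `DIGEST_LEN`, and `KEY_LEN` are hypothetical, the digest is passed in precomputed (BLAKE3 is not part of the Rust standard library), and the big-endian encoding of the layer index is an assumption.

```rust
/// Length in bytes of a 256-bit BLAKE3 digest.
pub const DIGEST_LEN: usize = 32;
/// Total key length stated in the overview.
pub const KEY_LEN: usize = 36;

/// Hypothetical 36-byte block address: 32-byte digest, then 4-byte layer index.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct BlockKey([u8; KEY_LEN]);

impl BlockKey {
    /// Builds a key from a precomputed digest of (model, dtype, prefix tokens).
    /// Assumption: the layer index is appended as a big-endian u32.
    pub fn new(digest: [u8; DIGEST_LEN], layer: u32) -> Self {
        let mut raw = [0u8; KEY_LEN];
        raw[..DIGEST_LEN].copy_from_slice(&digest);
        raw[DIGEST_LEN..].copy_from_slice(&layer.to_be_bytes());
        BlockKey(raw)
    }

    /// The digest portion of the key (first 32 bytes).
    pub fn digest(&self) -> &[u8] {
        &self.0[..DIGEST_LEN]
    }

    /// The layer index decoded from the last 4 bytes.
    pub fn layer(&self) -> u32 {
        let mut b = [0u8; 4];
        b.copy_from_slice(&self.0[DIGEST_LEN..]);
        u32::from_be_bytes(b)
    }

    /// The full key as raw bytes, e.g. for use as an index key.
    pub fn as_bytes(&self) -> &[u8; KEY_LEN] {
        &self.0
    }
}
```

Because the digest covers only the model, dtype, and prefix tokens, every layer of the same prefix shares the first 32 bytes and differs only in the trailing layer index.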

Features

  • Three-tier storage: GPU HBM → RAM (DashMap, approximate-LRU) → SSD (RocksDB, io_uring)
  • 36-byte block addressing: BLAKE3-256(model, dtype, prefix) + layer index
  • Cross-node QUIC fetch: 1-RTT connection, 0-RTT on repeat peers, multi-stream multiplexing
  • Optional RDMA support via GPUDirect with identical frame codec
  • Binary frame format: Frame { addr, model, layer, bytes }
  • RocksDB column-family (cf=kv) for persistent block metadata index
  • In-memory DashMap fallback for development builds
  • Non-blocking eviction: never blocks writes to make room
  • Background LRU walks at 1 Hz
  • SSD files stored as <short(digest)>-<layer>.kvb with O_DIRECT
  • Survives restarts via RocksDB index reconciliation
  • Configurable mirror-pulls: blocks fetched from peers can also be written to the local store for cross-node replication
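
The non-blocking eviction behaviour listed above can be illustrated with a small sketch. It is an assumption-laden model, not the daemon's code: `RamTier` and its methods are hypothetical, a plain `HashMap` stands in for DashMap, a logical counter stands in for timestamps, and the LRU order is exact rather than approximate. The point it shows is that `put` never evicts inline, while a separate pass (which a background task could run once per second) trims the tier back to capacity.

```rust
use std::collections::HashMap;

/// Hypothetical RAM tier: writes never evict inline; a periodic pass trims it.
pub struct RamTier {
    capacity_bytes: usize,
    used_bytes: usize,
    clock: u64, // logical access counter standing in for a timestamp
    blocks: HashMap<Vec<u8>, (u64, Vec<u8>)>, // key -> (last access, payload)
}

impl RamTier {
    pub fn new(capacity_bytes: usize) -> Self {
        RamTier { capacity_bytes, used_bytes: 0, clock: 0, blocks: HashMap::new() }
    }

    /// Always accepts the block, even if the tier is temporarily over capacity.
    pub fn put(&mut self, key: Vec<u8>, payload: Vec<u8>) {
        self.clock += 1;
        self.used_bytes += payload.len();
        if let Some((_, old)) = self.blocks.insert(key, (self.clock, payload)) {
            self.used_bytes -= old.len();
        }
    }

    /// Returns the payload and refreshes its recency.
    pub fn get(&mut self, key: &[u8]) -> Option<&[u8]> {
        self.clock += 1;
        let entry = self.blocks.get_mut(key)?;
        entry.0 = self.clock;
        Some(entry.1.as_slice())
    }

    /// One background pass: removes least-recently-used blocks until the tier
    /// fits its capacity, returning them so the caller can demote them to SSD.
    pub fn evict_pass(&mut self) -> Vec<(Vec<u8>, Vec<u8>)> {
        let mut evicted = Vec::new();
        if self.used_bytes <= self.capacity_bytes {
            return evicted;
        }
        let mut by_age: Vec<(u64, Vec<u8>)> =
            self.blocks.iter().map(|(k, (t, _))| (*t, k.clone())).collect();
        by_age.sort(); // oldest access first
        for (_, key) in by_age {
            if self.used_bytes <= self.capacity_bytes {
                break;
            }
            if let Some((_, payload)) = self.blocks.remove(&key) {
                self.used_bytes -= payload.len();
                evicted.push((key, payload));
            }
        }
        evicted
    }
}
```

The trade-off in this design is that the tier can briefly exceed its configured capacity between passes, in exchange for a write path that never waits on eviction.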

Architecture

Write path: when the engine evicts a block from GPU memory, cgn-kvcached receives it over a Unix domain socket (UDS) and stores it in the RAM tier (DashMap); background eviction later moves it to the SSD tier (RocksDB + io_uring). Read path: on a cache miss, lookups fall through in order: local RAM, local SSD, QUIC peer fetch, and finally engine re-prefill. The RocksDB metadata index is authoritative for the warm and cold tiers. The daemon exposes the cognitora.v1.Kv gRPC service for router queries.
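
The miss-handling order (local RAM, local SSD, peer fetch, then re-prefill) can be sketched as a simple fall-through. The `Tier` trait, `Source` enum, and `lookup` function are hypothetical names introduced for illustration; in-memory `HashMap`s stand in for the real RAM, SSD, and QUIC peer tiers.

```rust
use std::collections::HashMap;

/// Where a block was ultimately served from.
#[derive(Debug, PartialEq, Eq)]
pub enum Source {
    Ram,
    Ssd,
    Peer,
}

/// Hypothetical tier interface; the daemon's real interfaces are not shown here.
pub trait Tier {
    fn fetch(&self, key: &[u8]) -> Option<Vec<u8>>;
}

/// In-memory stand-in so the sketch can run without RocksDB or a network.
impl Tier for HashMap<Vec<u8>, Vec<u8>> {
    fn fetch(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.get(key).cloned()
    }
}

/// Tries RAM, then SSD, then a peer. `None` means every tier missed and the
/// engine has to re-prefill.
pub fn lookup(
    ram: &dyn Tier,
    ssd: &dyn Tier,
    peer: &dyn Tier,
    key: &[u8],
) -> Option<(Source, Vec<u8>)> {
    if let Some(block) = ram.fetch(key) {
        return Some((Source::Ram, block));
    }
    if let Some(block) = ssd.fetch(key) {
        return Some((Source::Ssd, block));
    }
    if let Some(block) = peer.fetch(key) {
        return Some((Source::Peer, block));
    }
    None
}
```

A caller would treat a `None` result as the signal to recompute the prefix, and could use the returned `Source` to decide whether to promote or mirror the block locally.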

Configuration

| Key                  | Type     | Default                        | Description                                     |
|----------------------|----------|--------------------------------|-------------------------------------------------|
| kv.listen            | string   | 0.0.0.0:7090                   | gRPC listen address                             |
| kv.uds               | string   | ""                             | Unix domain socket path for agent communication |
| kv.ram_gib           | u32      | 4                              | RAM tier capacity in GiB                        |
| kv.ssd_dir           | string   | "/var/lib/cognitora/kv/ssd/"   | SSD storage directory                           |
| kv.ssd_gib           | u32      | 64                             | SSD tier capacity in GiB                        |
| kv.ssd_ttl           | duration | "24h"                          | Time-to-live for SSD-cached blocks              |
| kv.index_dir         | string   | "/var/lib/cognitora/kv/index/" | RocksDB index directory                         |
| kv.transport         | enum     | "quic"                         | Cross-node transport: quic or rdma              |
| kv.quic_listen       | string   | 0.0.0.0:7091                   | QUIC listen address for peer fetch              |
| kv.block_size_tokens | u32      | 16                             | Tokens per KV block                             |
| kv.mirror_pulls      | bool     | false                          | Mirror pulled blocks to local store             |
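
For readers who want to see the defaults in code form, the sketch below mirrors the keys above as a Rust struct. `KvConfig`, `Transport`, and `blocks_for` are hypothetical names, not the daemon's actual types; only the default values are taken from the table. The `blocks_for` helper assumes a prefix is covered by whole blocks of `block_size_tokens` tokens, rounding up.

```rust
use std::time::Duration;

/// Cross-node transport, matching the documented `kv.transport` values.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Transport {
    Quic,
    Rdma,
}

/// Hypothetical in-memory form of the `[kv]` section.
#[derive(Debug, Clone, PartialEq)]
pub struct KvConfig {
    pub listen: String,
    pub uds: String,
    pub ram_gib: u32,
    pub ssd_dir: String,
    pub ssd_gib: u32,
    pub ssd_ttl: Duration,
    pub index_dir: String,
    pub transport: Transport,
    pub quic_listen: String,
    pub block_size_tokens: u32,
    pub mirror_pulls: bool,
}

impl Default for KvConfig {
    /// Defaults copied from the configuration table.
    fn default() -> Self {
        KvConfig {
            listen: "0.0.0.0:7090".to_string(),
            uds: String::new(),
            ram_gib: 4,
            ssd_dir: "/var/lib/cognitora/kv/ssd/".to_string(),
            ssd_gib: 64,
            ssd_ttl: Duration::from_secs(24 * 60 * 60), // "24h"
            index_dir: "/var/lib/cognitora/kv/index/".to_string(),
            transport: Transport::Quic,
            quic_listen: "0.0.0.0:7091".to_string(),
            block_size_tokens: 16,
            mirror_pulls: false,
        }
    }
}

impl KvConfig {
    /// Number of KV blocks needed to cover `tokens` prefix tokens, rounded up.
    /// Assumption: partial trailing blocks count as a full block.
    pub fn blocks_for(&self, tokens: u32) -> u32 {
        (tokens + self.block_size_tokens - 1) / self.block_size_tokens
    }
}
```

With the default block size of 16 tokens, a 100-token prefix would span 7 blocks under this rounding assumption.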

Example

```toml
[cluster]
name           = "production"
state_backend  = "etcd"
etcd_endpoints = ["http://etcd-0:2379"]

[security]
require_mtls = true

[kv]
listen            = "0.0.0.0:7090"
uds               = "/tmp/cognitora-kv.sock"
ram_gib           = 8
ssd_dir           = "/var/lib/cognitora/kv/ssd"
ssd_gib           = 128
index_dir         = "/var/lib/cognitora/kv/index"
transport         = "quic"
quic_listen       = "0.0.0.0:7091"
block_size_tokens = 16
mirror_pulls      = true
```

Performance

GPU tier lookup: < 30µs

RAM tier (warm) hit: < 200µs

SSD tier (cold) hit: < 5ms

Cross-node QUIC fetch (1 MiB, 10 GbE): < 12ms

Cache hit ratio target: ≥ 55% on representative traces

Background eviction: 1 Hz LRU walk, never blocks writes
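
To put the cross-node fetch figure in context, the short calculation below estimates how long the payload alone occupies the wire. It is a back-of-the-envelope sketch: `wire_time_ms` is a hypothetical helper, and it ignores protocol overhead, propagation delay, and congestion.

```rust
/// Time in milliseconds to serialize `bytes` onto a link running at
/// `gbit_per_s` (decimal gigabits per second), ignoring all overhead.
pub fn wire_time_ms(bytes: u64, gbit_per_s: f64) -> f64 {
    let bits = bytes as f64 * 8.0;
    bits / (gbit_per_s * 1e9) * 1e3
}
```

For a 1 MiB block on 10 GbE, `wire_time_ms(1 << 20, 10.0)` comes to roughly 0.84 ms, so raw transfer time is a small part of the 12 ms figure; the remainder is available for connection setup, remote lookup, and queuing.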