cgn-kvcached: Multi-Tier KV Cache Daemon
GPU/RAM/SSD tiered KV cache with cross-node QUIC/RDMA fetch. The shared memory layer that makes KV-aware routing possible.
Overview
cgn-kvcached manages a three-tier KV block store: GPU HBM (engine-internal), RAM (warm, DashMap with approximate-LRU), and SSD (cold, RocksDB-indexed with io_uring). When a block is not found locally, the daemon fetches it from peer nodes via QUIC (1-RTT, 0-RTT on repeat connections) or, optionally, RDMA (GPUDirect). Blocks are addressed by a 36-byte key: the 32-byte BLAKE3-256 digest of (model, dtype, prefix tokens) plus a layer index that occupies the other 4 bytes.
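The 36-byte addressing scheme can be illustrated with a small sketch. Everything about the byte layout here is an assumption made for illustration: the field serialization, the big-endian layer index, and the digest-then-layer ordering are not specified above. Because BLAKE3 is not in the Python standard library, BLAKE2b with a 32-byte digest stands in for it, so the digests will not match the daemon's; only the key length and structure carry over. The helper name `block_key` is hypothetical.

```python
import hashlib
import struct

KEY_LEN = 36  # 32-byte digest + 4-byte layer index


def block_key(model: str, dtype: str, prefix_tokens: list, layer: int) -> bytes:
    """Build a 36-byte block address (illustrative layout only)."""
    # ASSUMPTION: BLAKE2b-256 is a stand-in for BLAKE3-256, which the
    # standard library does not provide. Only the digest length matches.
    h = hashlib.blake2b(digest_size=32)
    # ASSUMPTION: string fields are length-prefixed so that
    # ("ab", "c") and ("a", "bc") cannot hash to the same input.
    for field in (model.encode(), dtype.encode()):
        h.update(struct.pack(">I", len(field)))
        h.update(field)
    # ASSUMPTION: prefix tokens are serialized as big-endian u32 values.
    h.update(struct.pack(f">{len(prefix_tokens)}I", *prefix_tokens))
    # ASSUMPTION: the layer index is appended as a big-endian u32.
    return h.digest() + struct.pack(">I", layer)


# "example-model" is a placeholder name, not a model known to the daemon.
key = block_key("example-model", "fp16", [101, 2023, 2003], layer=7)
assert len(key) == KEY_LEN
```

With this layout, every layer of the same prefix shares the first 32 bytes of the key and differs only in the trailing layer index.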
Features
- Three-tier storage: GPU HBM → RAM (DashMap, approximate-LRU) → SSD (RocksDB, io_uring)
- 36-byte block addressing: BLAKE3-256(model, dtype, prefix) + layer index
- Cross-node QUIC fetch: 1-RTT connection, 0-RTT on repeat peers, multi-stream multiplexing
- Optional RDMA support via GPUDirect with identical frame codec
- Binary frame format: Frame { addr, model, layer, bytes }
- RocksDB column-family (cf=kv) for persistent block metadata index
- In-memory DashMap fallback for development builds
- Non-blocking eviction: never blocks writes to make room
- Background LRU walks at 1 Hz
- SSD files stored as <short(digest)>-<layer>.kvb with O_DIRECT
- Survives restarts via RocksDB index reconciliation
- Configurable mirror-pulls (kv.mirror_pulls): blocks pulled from peers can be mirrored into the local store for cross-node replication
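The non-blocking eviction and background LRU walk listed above can be sketched as follows. This is a simplified model, not the daemon's implementation: it uses an exact LRU over an `OrderedDict` rather than an approximate LRU over a DashMap, it is single-threaded, and the `RamTier` class and `spill` hook are hypothetical names. The point it shows is the split of responsibilities: writes always succeed immediately, and a separate periodic pass trims the overflow.

```python
from collections import OrderedDict
from typing import Callable, Optional


class RamTier:
    """Toy warm tier: writes never wait for space; a periodic walk trims overflow."""

    def __init__(self, capacity_bytes: int, spill: Callable[[bytes, bytes], None]):
        self.capacity_bytes = capacity_bytes
        self.spill = spill  # hypothetical hook that hands a block to the cold tier
        self.used_bytes = 0
        self.blocks: "OrderedDict[bytes, bytes]" = OrderedDict()

    def put(self, key: bytes, block: bytes) -> None:
        # Always accept the write, even if it pushes the tier over capacity.
        old = self.blocks.pop(key, None)
        if old is not None:
            self.used_bytes -= len(old)
        self.blocks[key] = block
        self.used_bytes += len(block)

    def get(self, key: bytes) -> Optional[bytes]:
        block = self.blocks.get(key)
        if block is not None:
            self.blocks.move_to_end(key)  # mark as most recently used
        return block

    def lru_walk(self) -> int:
        """One eviction pass; a background timer would call this about once per second."""
        evicted = 0
        while self.used_bytes > self.capacity_bytes and self.blocks:
            key, block = self.blocks.popitem(last=False)  # least recently used first
            self.used_bytes -= len(block)
            self.spill(key, block)
            evicted += 1
        return evicted
```

Between walks the tier may temporarily exceed its configured capacity; that is the cost of never stalling a write to make room.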
Architecture
Engine evicts GPU block → cgn-kvcached receives via UDS → stores in RAM tier (DashMap) → background eviction to SSD tier (RocksDB + io_uring). On cache miss: local RAM → local SSD → QUIC peer fetch → engine re-prefill. Metadata index in RocksDB is authoritative for warm/cold tiers. Exposes cognitora.v1.Kv gRPC service for router queries.
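The miss path described above (local RAM, then local SSD, then peer fetch, then re-prefill) can be sketched as a single lookup function. Plain dictionaries stand in for the RAM and SSD tiers, and `peer_fetch` is a hypothetical callable standing in for the QUIC or RDMA transport. Treating the RAM tier as the "local store" for mirrored pulls is an assumption.

```python
from typing import Callable, Dict, Optional, Tuple

Fetch = Callable[[bytes], Optional[bytes]]


def lookup(
    key: bytes,
    ram: Dict[bytes, bytes],
    ssd: Dict[bytes, bytes],
    peer_fetch: Fetch,
    mirror_pulls: bool = False,
) -> Tuple[Optional[bytes], str]:
    """Return (block, source), trying tiers in the documented order."""
    block = ram.get(key)
    if block is not None:
        return block, "ram"
    block = ssd.get(key)
    if block is not None:
        return block, "ssd"
    block = peer_fetch(key)
    if block is not None:
        if mirror_pulls:
            # ASSUMPTION: mirrored blocks land in the RAM tier.
            ram[key] = block
        return block, "peer"
    # Nothing cached anywhere: the engine has to recompute the prefix.
    return None, "re-prefill"
```

A caller would treat the `"re-prefill"` result as the signal to fall back to the engine rather than as an error.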
Configuration
| Key | Type | Default | Description |
|---|---|---|---|
| kv.listen | string | "0.0.0.0:7090" | gRPC listen address |
| kv.uds | string | "" | Unix domain socket path for agent communication |
| kv.ram_gib | u32 | 4 | RAM tier capacity in GiB |
| kv.ssd_dir | string | "/var/lib/cognitora/kv/ssd/" | SSD storage directory |
| kv.ssd_gib | u32 | 64 | SSD tier capacity in GiB |
| kv.ssd_ttl | duration | "24h" | Time-to-live for SSD-cached blocks |
| kv.index_dir | string | "/var/lib/cognitora/kv/index/" | RocksDB index directory |
| kv.transport | enum | "quic" | Cross-node transport: quic or rdma |
| kv.quic_listen | string | "0.0.0.0:7091" | QUIC listen address for peer fetch |
| kv.block_size_tokens | u32 | 16 | Tokens per KV block |
| kv.mirror_pulls | bool | false | Mirror pulled blocks to local store |
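One way to read the table is as a set of defaults that a partial `[kv]` section overrides. The sketch below models that with a plain dictionary. The defaults are copied from the table; the function name `resolve_kv_config`, the rejection of unknown keys, and the specific validation rules are assumptions, not documented daemon behavior.

```python
DEFAULTS = {
    "listen": "0.0.0.0:7090",
    "uds": "",
    "ram_gib": 4,
    "ssd_dir": "/var/lib/cognitora/kv/ssd/",
    "ssd_gib": 64,
    "ssd_ttl": "24h",
    "index_dir": "/var/lib/cognitora/kv/index/",
    "transport": "quic",
    "quic_listen": "0.0.0.0:7091",
    "block_size_tokens": 16,
    "mirror_pulls": False,
}
U32_KEYS = ("ram_gib", "ssd_gib", "block_size_tokens")
TRANSPORTS = ("quic", "rdma")


def resolve_kv_config(overrides: dict) -> dict:
    """Merge user overrides onto the table defaults and sanity-check them."""
    # ASSUMPTION: unknown keys are treated as errors rather than ignored.
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown kv keys: {sorted(unknown)}")
    cfg = {**DEFAULTS, **overrides}
    if cfg["transport"] not in TRANSPORTS:
        raise ValueError(f"kv.transport must be one of {TRANSPORTS}")
    for name in U32_KEYS:
        value = cfg[name]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"kv.{name} must be an integer")
        if not 0 <= value <= 0xFFFFFFFF:
            raise ValueError(f"kv.{name} does not fit in a u32")
    return cfg


cfg = resolve_kv_config({"ram_gib": 8, "ssd_gib": 128, "mirror_pulls": True})
```

Keys that are not overridden, such as `ssd_ttl`, keep their table defaults in the resolved result.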
Example
[cluster]
name = "production"
state_backend = "etcd"
etcd_endpoints = ["http://etcd-0:2379"]
[security]
require_mtls = true
[kv]
listen = "0.0.0.0:7090"
uds = "/tmp/cognitora-kv.sock"
ram_gib = 8
ssd_dir = "/var/lib/cognitora/kv/ssd"
ssd_gib = 128
index_dir = "/var/lib/cognitora/kv/index"
transport = "quic"
quic_listen = "0.0.0.0:7091"
block_size_tokens = 16
mirror_pulls = true
Performance
- GPU tier lookup: < 30 µs
- RAM tier (warm) hit: < 200 µs
- SSD tier (cold) hit: < 5 ms
- Cross-node QUIC fetch (1 MiB, 10 GbE): < 12 ms
- Cache hit ratio target: ≥ 55% on representative traces
- Background eviction: 1 Hz LRU walk, never blocks writes
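To put the cross-node figure in context, a short calculation shows how much of the 12 ms budget is raw serialization time on the link. It assumes the nominal 10 Gb/s line rate and ignores framing, encryption, and protocol overhead, so it is a lower bound rather than a measurement.

```python
def wire_time_ms(payload_bytes: int, link_bits_per_s: float) -> float:
    """Time to push a payload onto the wire at line rate, in milliseconds."""
    return payload_bytes * 8 / link_bits_per_s * 1000.0


PAYLOAD_BYTES = 1 << 20  # 1 MiB
LINK_BITS_PER_S = 10e9   # nominal 10 GbE line rate (overhead ignored)
BUDGET_MS = 12.0         # documented cross-node fetch target

serialization_ms = wire_time_ms(PAYLOAD_BYTES, LINK_BITS_PER_S)
headroom_ms = BUDGET_MS - serialization_ms
print(f"serialization: {serialization_ms:.2f} ms, headroom: {headroom_ms:.2f} ms")
```

Serialization accounts for roughly 0.84 ms, which leaves about 11 ms of the budget for connection setup, the remote tier lookup, and queueing.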