cgn-operator
Kubernetes Operator
kube-rs reconciliation controller for declarative inference cluster management via CRDs.
Overview
cgn-operator is a Kubernetes controller built with kube-rs that reconciles three custom resource definitions (CRDs): InferenceCluster, ModelPool, and RoutingPolicy. It translates declarative manifests into etcd state that the router and agents consume, and it receives drain hints from the router for energy-aware autoscaling. Together these form a closed loop: routing decisions produce drain hints, drain hints change replica counts, and the updated cluster state feeds back into routing.
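As a rough illustration of the "manifest to etcd state" step, the sketch below flattens a ModelPool spec into key/value pairs and applies them idempotently. It uses only the Rust standard library, with a `BTreeMap` standing in for etcd. The `ModelPoolSpec` struct, the function names, and the `/cgn/pools/<namespace>/<name>/...` key layout are assumptions made for illustration, not the operator's actual schema or kube-rs API.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified mirror of some ModelPool spec fields.
#[derive(Debug, Clone, PartialEq)]
pub struct ModelPoolSpec {
    pub model: String,
    pub engine: String,
    pub tensor_parallelism: u32,
    pub replicas: u32,
}

/// Flatten a spec into the key/value pairs a reconcile pass would write.
/// The `/cgn/pools/...` prefix is an assumed layout, not a documented one.
pub fn desired_state(
    namespace: &str,
    name: &str,
    spec: &ModelPoolSpec,
) -> BTreeMap<String, String> {
    let prefix = format!("/cgn/pools/{namespace}/{name}");
    let mut kv = BTreeMap::new();
    kv.insert(format!("{prefix}/model"), spec.model.clone());
    kv.insert(format!("{prefix}/engine"), spec.engine.clone());
    kv.insert(
        format!("{prefix}/tensorParallelism"),
        spec.tensor_parallelism.to_string(),
    );
    kv.insert(format!("{prefix}/replicas"), spec.replicas.to_string());
    kv
}

/// One idempotent reconcile pass: write only the keys whose stored value
/// differs from the desired value, and return the number of writes.
pub fn reconcile(
    store: &mut BTreeMap<String, String>,
    desired: &BTreeMap<String, String>,
) -> usize {
    let mut writes = 0;
    for (key, value) in desired {
        if store.get(key) != Some(value) {
            store.insert(key.clone(), value.clone());
            writes += 1;
        }
    }
    writes
}
```

Running `reconcile` a second time with the same desired state performs no writes, which is the property a reconcile loop relies on. The sketch does not remove stale keys; a real controller would also need to handle deletion.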
Features
- Three CRDs: InferenceCluster, ModelPool, RoutingPolicy
- Reconciles CRDs into etcd state consumed by router and agents
- Receives drain hints from router for energy-aware autoscaling
- Closed-loop energy optimization between cluster and routing
- Declarative model rollouts with traffic splitting
- Rolling updates and rollback from a single manifest
- Compatible with GKE, EKS, AKS, k3d, and kind
- Helm chart available for installation
- Built with kube-rs; no Go dependencies
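The traffic splitting mentioned in the list above can be pictured as a deterministic percentage split between a stable and a new model version. The following is a minimal sketch under assumptions: the `Target` enum, the `pick_target` function, and the hash-bucket approach are hypothetical, and the document does not say how the split is actually implemented or which component performs it.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical rollout targets: the current version and the one being rolled out.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Target {
    Stable,
    Canary,
}

/// Send roughly `canary_percent` percent of request IDs to the canary.
/// The same request ID maps to the same bucket within a process, and a
/// request routed to the canary at N% stays on the canary at any higher
/// percentage.
pub fn pick_target(request_id: &str, canary_percent: u8) -> Target {
    let mut hasher = DefaultHasher::new();
    request_id.hash(&mut hasher);
    let bucket = hasher.finish() % 100; // bucket in 0..=99
    if bucket < u64::from(canary_percent.min(100)) {
        Target::Canary
    } else {
        Target::Stable
    }
}
```

Under this scheme a rollback amounts to setting the percentage back to 0, after which every request goes to the stable version again.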
Architecture
Kubernetes API (watch CRDs) → cgn-operator reconcile loop → write routing policies to etcd. Router sends drain hints → operator adjusts replica counts. CRD changes trigger immediate reconciliation.
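The drain-hint path ("router sends drain hints, operator adjusts replica counts") can be sketched as a pure function from the current replica count and a hint to a new replica count. The `DrainHint` struct, its `idle_replicas` field, and the `min_replicas` floor are assumptions for illustration; the actual hint format is not described here.

```rust
/// Hypothetical drain hint: the router suggests that `idle_replicas`
/// replicas of a pool could be drained.
#[derive(Debug, Clone, Copy)]
pub struct DrainHint {
    pub idle_replicas: u32,
}

/// Compute the replica count after applying a drain hint.
/// The result never drops below `min_replicas`, and a drain hint never
/// raises the count above `current`.
pub fn apply_drain_hint(current: u32, min_replicas: u32, hint: DrainHint) -> u32 {
    // If the pool is already below the configured floor, keep it where it is.
    let floor = min_replicas.min(current);
    current.saturating_sub(hint.idle_replicas).max(floor)
}
```

Keeping this step a pure function makes it easy to test separately from the Kubernetes and etcd plumbing; the reconcile loop would then write the returned count back as the desired replica count.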
Configuration
| Key | Type | Default | Description |
|---|---|---|---|
| InferenceCluster | CRD | | Defines the cluster topology, etcd endpoints, and security settings |
| ModelPool | CRD | | Declares model deployments with replicas, engine, tensor parallelism (TP), and resource requests |
| RoutingPolicy | CRD | | Configures routing weights, admission limits, and cascade/disaggregation policies |
Example
```yaml
apiVersion: cognitora.dev/v1
kind: ModelPool
metadata:
  name: llama-3-70b
  namespace: cognitora
spec:
  model: llama-3.1-70b
  engine: vllm
  tensorParallelism: 4
  replicas: 4
  resources:
    gpu: nvidia-a100
    count: 4
---
apiVersion: cognitora.dev/v1
kind: RoutingPolicy
metadata:
  name: default
  namespace: cognitora
spec:
  weights:
    kv: 0.55
    load: 0.25
    power: 0.10
    capacity: 0.10
```
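To make the `weights` block more concrete, here is one way such weights could be combined into a per-replica score. This is a sketch under assumptions: the document does not state how the router uses the weights, so the weighted-sum formula, the `Signals` struct (each signal normalized to 0.0..=1.0, higher meaning more favorable), and the check that the weights sum to 1 are all hypothetical.

```rust
/// Mirror of the `weights` block in the RoutingPolicy example.
#[derive(Debug, Clone, Copy)]
pub struct Weights {
    pub kv: f64,
    pub load: f64,
    pub power: f64,
    pub capacity: f64,
}

/// Hypothetical per-replica signals, each normalized to 0.0..=1.0,
/// where a higher value means "more favorable".
#[derive(Debug, Clone, Copy)]
pub struct Signals {
    pub kv: f64,
    pub load: f64,
    pub power: f64,
    pub capacity: f64,
}

impl Weights {
    /// True when the four weights sum to 1 (within floating-point tolerance).
    pub fn is_normalized(&self) -> bool {
        let sum = self.kv + self.load + self.power + self.capacity;
        (sum - 1.0).abs() < 1e-9
    }

    /// Weighted sum of the signals (an assumed scoring rule).
    pub fn score(&self, s: &Signals) -> f64 {
        self.kv * s.kv + self.load * s.load + self.power * s.power + self.capacity * s.capacity
    }
}

/// Return the name of the highest-scoring candidate, if there is one.
pub fn best_replica<'a>(
    weights: &Weights,
    candidates: &'a [(&'a str, Signals)],
) -> Option<&'a str> {
    candidates
        .iter()
        .max_by(|a, b| weights.score(&a.1).total_cmp(&weights.score(&b.1)))
        .map(|(name, _)| *name)
}
```

With the example weights, a replica that is favorable only on the `kv` signal scores 0.55, while one that is favorable on the other three signals scores 0.45, so the first would be chosen under this assumed rule.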