cgn-operator

Kubernetes Operator

kube-rs reconciliation controller for declarative inference cluster management via CRDs.

Overview

cgn-operator is a Kubernetes controller built with kube-rs that reconciles three custom resource definitions: InferenceCluster, ModelPool, and RoutingPolicy. It translates declarative manifests into etcd state that the router and agents consume. It also receives drain hints from the router and uses them for energy-aware autoscaling, closing the loop between cluster state and routing decisions.
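
As a rough illustration of what "translating a manifest into etcd state" could look like, the sketch below flattens a ModelPool spec into key/value pairs and computes which writes are still pending. The `/cognitora/pools/...` key layout, the `ModelPoolSpec` struct, and both function names are assumptions made for this example; the operator's actual etcd schema is not described on this page. The code uses only the Rust standard library, whereas a real controller would perform the writes through an etcd client from inside a kube-rs reconcile function.

```rust
use std::collections::BTreeMap;

/// Hypothetical subset of a ModelPool spec (fields mirror the manifest example).
#[derive(Debug, Clone, PartialEq)]
pub struct ModelPoolSpec {
    pub model: String,
    pub engine: String,
    pub tensor_parallelism: u32,
    pub replicas: u32,
}

/// Flatten a spec into the key/value pairs we would want to see in etcd.
/// The key prefix is an assumed layout, not the operator's real schema.
pub fn desired_etcd_state(
    namespace: &str,
    name: &str,
    spec: &ModelPoolSpec,
) -> BTreeMap<String, String> {
    let prefix = format!("/cognitora/pools/{}/{}", namespace, name);
    let mut kv = BTreeMap::new();
    kv.insert(format!("{}/model", prefix), spec.model.clone());
    kv.insert(format!("{}/engine", prefix), spec.engine.clone());
    kv.insert(
        format!("{}/tensorParallelism", prefix),
        spec.tensor_parallelism.to_string(),
    );
    kv.insert(format!("{}/replicas", prefix), spec.replicas.to_string());
    kv
}

/// Compute the writes needed to move `current` toward `desired`:
/// only keys that are missing or hold a different value are returned.
pub fn pending_writes(
    current: &BTreeMap<String, String>,
    desired: &BTreeMap<String, String>,
) -> Vec<(String, String)> {
    desired
        .iter()
        .filter(|(k, v)| current.get(*k) != Some(*v))
        .map(|(k, v)| (k.clone(), v.clone()))
        .collect()
}
```

Because `pending_writes` compares desired state against observed state, running it again after the writes have been applied returns an empty list. That idempotence is the property a reconcile loop relies on when the same resource is re-queued.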

Features

  • Three CRDs: InferenceCluster, ModelPool, RoutingPolicy
  • Reconciles CRDs into etcd state consumed by router and agents
  • Receives drain hints from router for energy-aware autoscaling
  • Closed-loop energy optimization between cluster and routing
  • Declarative model rollouts with traffic splitting
  • Rolling updates and rollback from a single manifest
  • Compatible with GKE, EKS, AKS, k3d, and kind
  • Helm chart available for installation
  • Built with kube-rs — no Go dependencies

Architecture

Kubernetes API (watch CRDs) → cgn-operator reconcile loop → write routing policies to etcd. Router sends drain hints → operator adjusts replica counts. CRD changes trigger immediate reconciliation.
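
The drain-hint path can be pictured with a small sketch. It assumes a drain hint names a pool and says how many of its replicas the router could stop sending traffic to; the `DrainHint` struct, the `min_replicas` floor, and the `target_replicas` function are hypothetical, since the real message format and scaling rules are not documented here.

```rust
/// Hypothetical drain hint: the router reports how many replicas of a pool
/// it could stop sending traffic to. The real message format is an assumption.
#[derive(Debug, Clone, PartialEq)]
pub struct DrainHint {
    pub pool: String,
    pub drainable_replicas: u32,
}

/// Return the replica count the operator would target after applying a hint.
/// The result never drops below `min_replicas` (capped at `current`) and
/// never exceeds `current`, so a hint can only scale a pool down.
pub fn target_replicas(current: u32, hint: &DrainHint, min_replicas: u32) -> u32 {
    // If the pool is already below the configured minimum, keep it where it is.
    let floor = min_replicas.min(current);
    current
        .saturating_sub(hint.drainable_replicas)
        .max(floor)
}
```

Under these assumptions, a pool with 4 replicas and a hint of 1 drainable replica would be scaled to 3, while an oversized hint is clamped to the floor instead of scaling the pool to zero.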

Configuration

Key               Type  Default  Description
InferenceCluster  CRD   -        Defines the cluster topology, etcd endpoints, and security settings
ModelPool         CRD   -        Declares model deployments with replicas, engine, TP, and resource requests
RoutingPolicy     CRD   -        Configures routing weights, admission limits, and cascade/disagg policies

Example

yaml
apiVersion: cognitora.dev/v1
kind: ModelPool
metadata:
  name: llama-3-70b
  namespace: cognitora
spec:
  model: llama-3.1-70b
  engine: vllm
  tensorParallelism: 4
  replicas: 4
  resources:
    gpu: nvidia-a100
    count: 4
---
apiVersion: cognitora.dev/v1
kind: RoutingPolicy
metadata:
  name: default
  namespace: cognitora
spec:
  weights:
    kv: 0.55
    load: 0.25
    power: 0.10
    capacity: 0.10
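
The four weights in the RoutingPolicy above sum to 1.0, which suggests a weighted sum over per-replica signals. The sketch below shows that reading. It is an assumption, not the router's documented scoring rule: the `Signals` struct, the requirement that weights sum to 1, and the convention that each signal is normalized to [0, 1] with higher meaning better are all hypothetical.

```rust
/// Weights from a RoutingPolicy spec (field names follow the manifest example).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct RoutingWeights {
    pub kv: f64,
    pub load: f64,
    pub power: f64,
    pub capacity: f64,
}

/// Hypothetical per-replica signals, each normalized to [0, 1], higher is better.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Signals {
    pub kv: f64,
    pub load: f64,
    pub power: f64,
    pub capacity: f64,
}

impl RoutingWeights {
    /// Assumed validation rule: weights are non-negative and sum to 1.
    pub fn is_valid(&self) -> bool {
        let ws = [self.kv, self.load, self.power, self.capacity];
        let sum: f64 = ws.iter().sum();
        ws.iter().all(|w| *w >= 0.0) && (sum - 1.0).abs() < 1e-9
    }

    /// Weighted sum of the signals (assumed scoring rule).
    pub fn score(&self, s: &Signals) -> f64 {
        self.kv * s.kv
            + self.load * s.load
            + self.power * s.power
            + self.capacity * s.capacity
    }
}

/// Pick the index of the highest-scoring candidate, or None if there are none.
pub fn best_candidate(weights: &RoutingWeights, candidates: &[Signals]) -> Option<usize> {
    candidates
        .iter()
        .enumerate()
        .max_by(|a, b| weights.score(a.1).total_cmp(&weights.score(b.1)))
        .map(|(i, _)| i)
}
```

With the example weights, the `kv` signal dominates: a replica scoring 1.0 on `kv` and 0.2 on `load` outranks one scoring 0.0 on `kv` and 1.0 on `load` when the other signals are equal.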