Repository layout
This document defines the canonical folder structure for the Cognitora monorepo, the rules that govern where new code goes, and the conventions every service and library follows.
Top-level rules
- One language: Rust. All services and libraries live under `rust/`. Cognitora is a single Cargo workspace rooted at `Cargo.toml`.
- Services vs libraries. Binary crates go under `rust/services/`; library crates go under `rust/libraries/`. Anything that produces a long-running process is a service.
- Protos are the source of truth. Wire formats live in `proto/cognitora/v1/` and are compiled into Rust code at build time by `cgn-proto/build.rs` (tonic-build).
- Deployment artefacts never mix with source. Helm charts, systemd units, Terraform modules, Dockerfiles, and the installer all live under `deploy/`. Source code never imports from `deploy/`.
- Docs are first-class. Every architectural decision lands in `docs/architecture/`, every operational procedure in `docs/operations/`, and every API surface in `docs/api/`.
- Gateway and router are one binary. OpenAI-compatible HTTP/SSE serving is implemented as a module inside `cgn-router`. There is no separate `cgn-gateway` daemon, which eliminates an extra hop, an extra TLS context, and an extra failure mode.
Tree
```
cognitora/
├── Cargo.toml                  Rust workspace root
├── Makefile                    build entrypoints
├── buf.yaml                    proto lint
├── rust-toolchain.toml         pinned Rust toolchain (1.89+)
├── README.md, LICENSE
│
├── proto/cognitora/v1/         gRPC source of truth
│   ├── common.proto
│   ├── router.proto
│   ├── agent.proto
│   ├── kv.proto
│   ├── control.proto
│   └── metrics.proto
│
├── rust/
│   ├── services/               binary crates
│   │   ├── cgn-router/         gateway + router; submodules:
│   │   │                       src/{gateway,routing,cluster,cascade,disagg,admission}
│   │   ├── cgn-agent/          vLLM supervision, NVML, KvHandoff
│   │   ├── cgn-kvcached/       tiered KV daemon (GPU/RAM/SSD)
│   │   ├── cgn-metrics/        Prometheus aggregator + power telemetry
│   │   ├── cgn-ctl/            admin CLI + installer
│   │   └── cgn-operator/       kube-rs operator (CRDs in deploy/kubernetes/crds/)
│   │
│   └── libraries/              shared crates
│       ├── cgn-proto/          tonic-generated stubs (build.rs)
│       ├── cgn-core/           config, errors, hashing, prefix-trie
│       ├── cgn-tls/            rustls helpers, mTLS bootstrap
│       ├── cgn-telemetry/      tracing + OTLP + Prometheus wiring
│       ├── cgn-kv/             CUDA / io_uring / RDMA bindings
│       ├── cgn-auth/           OIDC + API-key + RBAC
│       ├── cgn-ratelimit/      governor + Redis backend
│       ├── cgn-k8s/            kube-rs helpers (CRD types, watchers)
│       ├── cgn-helm/           wrapper around the helm binary
│       └── cgn-power/          Redfish + IPMI power readers
│
├── deploy/
│   ├── docker/                 Dockerfile, Dockerfile.agent (distroless / vllm)
│   ├── systemd/                *.service units for bare-metal install
│   ├── kubernetes/
│   │   ├── crds/               inferencecluster, modelpool, routingpolicy
│   │   └── helm/cognitora/     templates/, values.yaml
│   ├── terraform/
│   │   └── {aws,gcp,azure,hetzner,baremetal}/
│   └── installer/install.sh    sha256 + cosign verified one-liner
│
├── docs/
│   ├── ARCHITECTURE.md
│   ├── architecture/           repo-layout, security, routing, kv-tiering, protocols
│   ├── guides/                 quickstart, kubernetes, baremetal, cloud/{aws,gcp,…}
│   ├── operations/             observability, slo, runbooks/
│   ├── api/                    openai.md (HTTP), grpc.md (internal)
│   └── reference/              config, env, exit-codes
│
├── configs/                    cognitora.toml.example
├── SECURITY/                   cosign.pub for release verification
│
├── examples/
│   ├── local-mac/              Mac/Ollama profile (engine.kind = "openai_compat")
│   └── multi-llm/              Linux/server profile (vLLM or llama-cpp-python)
│
├── tests/
│   ├── e2e/                    multi_engine.sh (plugin layer + middleware),
│   │                           single_node.sh, multi_node_kv.sh
│   └── perf/                   criterion benches (CI perf gates)
│
├── scripts/
│   ├── install/                bootstrap-debian.sh, install-etcd.sh,
│   │                           install-engine-{cpu,gpu}.sh, download-model.sh
│   ├── run/                    up.sh / down.sh / status.sh (profile orchestrator)
│   └── release/                pack.sh (build a local tarball matching release.yml)
│
└── .github/workflows/          ci.yml, release.yml, e2e.yml
```
Conventions
Rust
- Every crate is named `cgn-<role>`. Library crates expose a single `lib.rs`; binary crates expose `main.rs` plus submodules under `src/<feature>/mod.rs`.
- Service crates must depend on `cgn-core` (config + errors), `cgn-proto` (wire types), and `cgn-telemetry` (logging + metrics).
- Inter-crate references go through the workspace `[workspace.dependencies]` table; never hard-code a relative path inside a leaf crate.
- All public types implement `Debug`. Public types that cross thread boundaries are `Send + Sync` unless explicitly justified.
- `unsafe` is allowed only inside `cgn-kv`; everywhere else, `#![forbid(unsafe_code)]` is the default.
- One async runtime: tokio. One TLS stack: rustls. One serialization framework: serde, with bincode for on-the-wire blobs.
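As a minimal sketch of these conventions, the hypothetical `lib.rs` below (for some library crate other than `cgn-kv`) forbids `unsafe` at the crate root, derives `Debug` on every public type, and adds a compile-time assertion so the build breaks if a shared type ever stops being `Send + Sync`. `NodeId` and `NodeRegistry` are illustrative names, not types from the Cognitora codebase.

```rust
//! Hypothetical `lib.rs` for a Cognitora library crate (not `cgn-kv`).
#![forbid(unsafe_code)]

use std::sync::{Arc, RwLock};

/// Illustrative public type: derives `Debug` per the convention.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct NodeId(pub String);

/// Illustrative registry handle meant to be cloned into many tasks/threads.
/// Clones share the same underlying list through the `Arc`.
#[derive(Debug, Clone, Default)]
pub struct NodeRegistry {
    nodes: Arc<RwLock<Vec<NodeId>>>,
}

impl NodeRegistry {
    /// Appends a node; panics only if another thread panicked while holding the lock.
    pub fn register(&self, id: NodeId) {
        self.nodes.write().expect("registry lock poisoned").push(id);
    }

    /// Number of registered nodes.
    pub fn len(&self) -> usize {
        self.nodes.read().expect("registry lock poisoned").len()
    }

    /// True when no node has been registered yet.
    pub fn is_empty(&self) -> bool {
        self.len() == 0
    }
}

// Compile-time check: this fails to build if `NodeRegistry` stops being `Send + Sync`.
const _: fn() = || {
    fn assert_send_sync<T: Send + Sync>() {}
    assert_send_sync::<NodeRegistry>();
};
```

The `const _` assertion has no runtime cost; it is one common way to turn "must be `Send + Sync`" from a review comment into a compiler error.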
cgn-router internal layout
`cgn-router` is the largest crate. Each of its top-level submodules has a single responsibility:

| Module | Responsibility |
|---|---|
| `gateway/` | Axum HTTP server: OpenAI-compatible `/v1/chat/completions` and `/v1/embeddings`, with SSE streaming. |
| `routing/` | Score function, candidate selection, prefix-overlap weighting, KV-aware tie-breaking. |
| `cluster/` | etcd watcher, gossip, node registry, health. |
| `cascade/` | SLM → mid → LLM cascade FSM and confidence thresholding. |
| `disagg/` | Prefill/decode disaggregation: KV handoff handshake with `cgn-agent` and `cgn-kvcached`. |
| `admission/` | Token-bucket admission control, queue depth, rejection codes. |
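Submodules communicate only through well-defined types in the crate's `types` module (`cgn_router::types` in Rust paths, since Cargo maps the hyphen in `cgn-router` to an underscore); they never share mutable state directly.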
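The sketch below illustrates this rule with three hypothetical modules: `types` holds plain data, `admission` owns a token bucket, and `routing` picks a candidate by longest prefix overlap with free KV blocks as the tie-breaker. The scoring rule, field names, and bucket parameters are assumptions for illustration only; this is not the actual `routing/` score function. Elapsed time is passed in explicitly rather than read from a clock so the behaviour is deterministic.

```rust
/// Shared plain-data types: the only thing the other modules exchange.
pub mod types {
    #[derive(Debug, Clone, PartialEq)]
    pub struct Candidate {
        pub node: String,
        pub prefix_overlap_tokens: u32,
        pub free_kv_blocks: u32,
    }

    #[derive(Debug, Clone, PartialEq, Eq)]
    pub enum Admission {
        Admitted,
        Rejected { retry_after_ms: u64 },
    }
}

/// Token-bucket admission control; its state stays private to this module.
pub mod admission {
    use super::types::Admission;

    #[derive(Debug)]
    pub struct TokenBucket {
        capacity: f64,
        tokens: f64,
        refill_per_sec: f64,
    }

    impl TokenBucket {
        /// Starts full. `refill_per_sec` is assumed to be > 0.
        pub fn new(capacity: f64, refill_per_sec: f64) -> Self {
            Self { capacity, tokens: capacity, refill_per_sec }
        }

        /// Refills for `elapsed_secs` (capped at capacity), then tries to spend `cost`.
        /// On rejection, reports how long until enough tokens will have accrued.
        pub fn try_admit(&mut self, cost: f64, elapsed_secs: f64) -> Admission {
            self.tokens = (self.tokens + elapsed_secs * self.refill_per_sec).min(self.capacity);
            if self.tokens >= cost {
                self.tokens -= cost;
                Admission::Admitted
            } else {
                let deficit = cost - self.tokens;
                let retry_after_ms = (deficit / self.refill_per_sec * 1000.0).ceil() as u64;
                Admission::Rejected { retry_after_ms }
            }
        }
    }
}

/// Candidate selection; reads `types` only and never touches admission state.
pub mod routing {
    use super::types::Candidate;

    /// Hypothetical score: longest prefix overlap wins; ties go to more free KV blocks.
    pub fn pick(candidates: &[Candidate]) -> Option<&Candidate> {
        candidates
            .iter()
            .max_by_key(|c| (c.prefix_overlap_tokens, c.free_kv_blocks))
    }
}
```

Because no mutable state crosses module boundaries, `routing` can be unit-tested without constructing an admission controller, and vice versa.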
Protos
- All RPCs live under `proto/cognitora/v1/`. Breaking changes require a new package version (`v2`) and a deprecation window.
- `common.proto` holds shared messages; per-service files hold service definitions plus service-specific request/response types.
- `buf lint` and `buf breaking` run in CI.
- `tonic-build` regenerates the Rust stubs whenever `proto/` changes.
Deployment
- Helm chart values are the canonical surface for cluster configuration. Anything that needs to differ between environments lives in a values override, not in templated logic.
- systemd units enforce `User=cognitora`, `NoNewPrivileges=true`, `ProtectSystem=strict`, `ProtectHome=true`, and a per-role `MemoryMax=`.
- Terraform modules emit a uniform `cluster.json` (or kubeconfig) that `cgn-ctl install` consumes.
- The release pipeline embeds a tested `helm` binary into `cgn-ctl` (via `include_bytes!`) so installs work without external tooling.
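One way `cgn-ctl` might use the embedded bytes at install time is sketched below, assuming the binary is written to a scratch directory and marked executable before the `cgn-helm` wrapper shells out to it. The function name `materialize_helm`, the `OUT_DIR` location in the commented-out `include_bytes!` line, and the on-disk layout are all hypothetical.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

// In cgn-ctl the bytes would come from a line like the one below. The OUT_DIR
// path is an assumption, so it is commented out to keep this sketch buildable:
// static HELM_BINARY: &[u8] = include_bytes!(concat!(env!("OUT_DIR"), "/helm"));

/// Writes the embedded `helm` bytes to `dir/helm`, marks the file executable
/// on Unix, and returns the path a wrapper such as `cgn-helm` could invoke.
pub fn materialize_helm(bytes: &[u8], dir: &Path) -> io::Result<PathBuf> {
    fs::create_dir_all(dir)?;
    let path = dir.join("helm");
    fs::write(&path, bytes)?;
    // Unix only: rwxr-xr-x so the extracted binary can be exec'd.
    #[cfg(unix)]
    {
        use std::os::unix::fs::PermissionsExt;
        fs::set_permissions(&path, fs::Permissions::from_mode(0o755))?;
    }
    Ok(path)
}
```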
Docs
- One file per concept. Long files are split into `docs/<area>/<topic>/index.md` plus child pages.
- Every public RPC and every CLI command is documented; CI checks each binary's `--help` output against the docs.
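The `--help` check in CI could be as simple as the sketch below. It assumes a clap-style help screen with a `Commands:` section and assumes the docs mention each command as `cgn-ctl <name>`; both conventions, and the function names `help_commands` and `undocumented`, are hypothetical.

```rust
/// Extracts subcommand names from a clap-style `--help` screen: the first word
/// of each indented line after a `Commands:` header, up to the next blank or
/// unindented line.
pub fn help_commands(help: &str) -> Vec<String> {
    let mut in_commands = false;
    let mut out = Vec::new();
    for line in help.lines() {
        if line.trim_end() == "Commands:" {
            in_commands = true;
            continue;
        }
        if !in_commands {
            continue;
        }
        if line.trim().is_empty() || !line.starts_with(char::is_whitespace) {
            break; // end of the Commands section
        }
        if let Some(name) = line.split_whitespace().next() {
            out.push(name.to_string());
        }
    }
    out
}

/// Commands listed in `--help` that the docs never mention as `cgn-ctl <name>`.
/// clap's auto-generated `help` subcommand is skipped.
pub fn undocumented(help: &str, docs: &str) -> Vec<String> {
    help_commands(help)
        .into_iter()
        .filter(|cmd| cmd.as_str() != "help")
        .filter(|cmd| !docs.contains(&format!("cgn-ctl {cmd}")))
        .collect()
}
```

A CI job would then fail whenever `undocumented` returns a non-empty list.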
Adding a new component
- Service or library? Decide, then place the crate under the matching `services/` or `libraries/` subtree.
- Add it to the workspace. Append the new crate to the `members` list in `Cargo.toml` and, if it is a library, to `[workspace.dependencies]`.
- Add it to CI. The CI matrix walks `rust/{services,libraries}/*` automatically, so no hand-edits are needed unless the new component has special requirements (e.g., GPU runners).
- Document it. Add a one-line entry to this file and, if the component introduces new concepts, a dedicated page under `docs/architecture/`.
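A guard against skipping the "add it to the workspace" step could walk the same `rust/{services,libraries}/*` directories the CI matrix uses and diff them against the workspace `members` list. The sketch below assumes `members` lists each crate path explicitly (no globs) and that the list has already been read out of `Cargo.toml`; `discover_crates` and `missing_members` are hypothetical helpers.

```rust
use std::collections::BTreeSet;
use std::fs;
use std::io;
use std::path::Path;

/// Lists `rust/services/*` and `rust/libraries/*` directories that contain a
/// `Cargo.toml`, as workspace-relative paths such as `rust/libraries/cgn-core`.
pub fn discover_crates(repo_root: &Path) -> io::Result<BTreeSet<String>> {
    let mut found = BTreeSet::new();
    for group in ["services", "libraries"] {
        let dir = repo_root.join("rust").join(group);
        if !dir.is_dir() {
            continue;
        }
        for entry in fs::read_dir(&dir)? {
            let entry = entry?;
            // Only directories with a manifest count as crates.
            if entry.path().join("Cargo.toml").is_file() {
                let name = entry.file_name().to_string_lossy().into_owned();
                found.insert(format!("rust/{group}/{name}"));
            }
        }
    }
    Ok(found)
}

/// Crates present on disk but missing from the explicit `members` list.
pub fn missing_members(discovered: &BTreeSet<String>, members: &[&str]) -> Vec<String> {
    discovered
        .iter()
        .filter(|c| !members.contains(&c.as_str()))
        .cloned()
        .collect()
}
```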