## AWS deployment
A reference EKS deployment with GPU nodes lives at `deploy/terraform/aws/`. The module provisions a VPC, an EKS control plane, a single managed nodegroup of `g5.2xlarge` (or whatever you set for `instance_type`), and applies the Cognitora Helm chart on top.
## What you need
- AWS account and credentials (`aws sso login` or static keys).
- Terraform 1.5+, kubectl, helm.
- A model artifact bucket — defaults to a HuggingFace Hub pull, but S3 (`s3://your-bucket/llama3-8b/`) cuts cold start to seconds.
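If you take the S3 route, staging the weights is a one-time sync. A sketch with the real AWS CLI — the bucket name and local path are the illustrative ones from above, not chart defaults:

```shell
# One-time: copy local model weights into the artifact bucket.
# Bucket and prefix are examples; match them to what you configure in the chart.
aws s3 sync ./llama3-8b s3://your-bucket/llama3-8b/ --exclude ".git/*"
```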
## Apply
```shell
cd deploy/terraform/aws
terraform init
terraform apply \
  -var="region=us-east-1" \
  -var="cluster_name=cognitora-prod" \
  -var="instance_type=g5.2xlarge" \
  -var="node_count=2"
```
After ~12 minutes you'll have:
- An EKS cluster named `cognitora-prod` in the configured region.
- A managed nodegroup of `g5.2xlarge` (1× A10G per node) with the `nvidia.com/gpu.present=true` label and a `nvidia.com/gpu` taint.
- The Cognitora chart installed in the `cognitora` namespace.
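Once the apply finishes, point kubectl at the new cluster and confirm the GPU nodes registered and the pods came up (region and cluster name match the variables above):

```shell
aws eks update-kubeconfig --region us-east-1 --name cognitora-prod
kubectl get nodes -l nvidia.com/gpu.present=true
kubectl get pods -n cognitora
```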
## Sizing
| Workload | Suggested instance | Notes |
|---|---|---|
| 7-13 B models, dev / staging | g5.2xlarge | 1× A10G, 24 GB VRAM |
| 30-70 B models | g5.12xlarge | 4× A10G, TP=4 |
| 100 B+ models | p4d.24xlarge | 8× A100, TP=8 (multi-instance pod) |
| Long-context (>16k) | p5.48xlarge | H100, NVLink for KV transfer |
The router scales separately — `router.replicas: 2` on `m5.large` is plenty for tens of thousands of QPS because the routing fast-path is sub-ms.
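In `values.yaml` that looks something like the sketch below. `router.replicas` comes straight from the text above; the `nodeSelector` stanza is an assumption about the chart's schema, and it relies on the well-known `node.kubernetes.io/instance-type` label that EKS sets on managed nodes:

```yaml
router:
  replicas: 2
  # Assumed chart key: pin router pods to the cheap CPU nodegroup.
  nodeSelector:
    node.kubernetes.io/instance-type: m5.large
```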
## Storage
`cgn-kvcached`'s SSD tier wants fast NVMe. On the GPU instance types above, the local NVMe is exposed at `/dev/nvme1n1` — mount it at `/var/lib/cognitora/kv/ssd` via a `local-storage` StorageClass referenced in `values.yaml`:
```yaml
kvcached:
  ssdGib: 800            # 80% of the local NVMe on g5.2xlarge
  storageClassName: local-storage
```
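The mount itself has to exist on the node before the pod schedules. A minimal bootstrap sketch, run as root in node user data — it assumes the instance-store device is unformatted and that `/dev/nvme1n1` is the instance store (verify with `lsblk`, since NVMe device ordering can vary):

```shell
# Format and mount the local instance-store NVMe for the kv SSD tier.
mkfs.ext4 -F /dev/nvme1n1
mkdir -p /var/lib/cognitora/kv/ssd
mount /dev/nvme1n1 /var/lib/cognitora/kv/ssd
echo '/dev/nvme1n1 /var/lib/cognitora/kv/ssd ext4 defaults,noatime 0 2' >> /etc/fstab
```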
For multi-AZ deployments use EBS gp3 (cheaper than io2; the SSD tier doesn't need µs latency, the RAM tier handles that).
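A gp3 StorageClass for the multi-AZ case might look like the following — the provisioner assumes the EBS CSI driver is installed in the cluster, and the throughput/IOPS numbers are illustrative, not tuned values:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cognitora-kv-gp3
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  throughput: "500"   # MiB/s, illustrative
  iops: "6000"        # illustrative
volumeBindingMode: WaitForFirstConsumer
```

`WaitForFirstConsumer` matters in multi-AZ: it delays volume creation until the pod is scheduled, so the EBS volume lands in the same AZ as the node.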
## Ingress
The Helm chart ships the Ingress resource as opt-in. To expose the OpenAI-compatible API behind an AWS ALB:
```yaml
router:
  service:
    type: ClusterIP
  ingress:
    enabled: true
    className: alb
    host: api.cognitora.example.com
    annotations:
      alb.ingress.kubernetes.io/scheme: internet-facing
      alb.ingress.kubernetes.io/target-type: ip
      alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS":443}]'
      alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:...
```
ALB terminates TLS; the chart's mTLS still applies for router↔agent traffic.
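After the ALB provisions (usually a minute or two), grab its hostname for your DNS record. This uses the real kubectl jsonpath syntax; it assumes the chart's Ingress is the only one in the namespace, so adjust if `kubectl get ingress -n cognitora` shows more:

```shell
kubectl get ingress -n cognitora \
  -o jsonpath='{.items[0].status.loadBalancer.ingress[0].hostname}'
```

Point a CNAME for `api.cognitora.example.com` at the hostname this prints.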
## IAM for the agent
If your model weights live in S3, add an IRSA-bound IAM role to the agent ServiceAccount:
```yaml
agent:
  serviceAccountAnnotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123:role/cognitora-agent
```
The role needs `s3:GetObject` on your model bucket. Nothing else.
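The corresponding policy document is small — the bucket name and prefix here are the illustrative ones from earlier, so substitute your own:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::your-bucket/llama3-8b/*"
    }
  ]
}
```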
## Tear down
```shell
terraform destroy
```
This drops the chart first (the Helm release is a Terraform resource), then the cluster. Dynamically provisioned EBS volumes are not Terraform-managed, so they detach but persist — and keep billing — until deleted; run `aws ec2 delete-volume` once you're sure.
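To find anything left behind, list volumes sitting in the `available` (detached) state — this is the real AWS CLI filter/query syntax:

```shell
aws ec2 describe-volumes \
  --filters Name=status,Values=available \
  --query 'Volumes[].{id:VolumeId,size:Size}' --output table
```

Check each ID before deleting; other workloads in the account may also have detached volumes.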