Azure deployment
The AKS module lives in `deploy/terraform/azure/`. It provisions an AKS cluster with a GPU node pool (`Standard_NC8as_T4_v3` by default) and applies the Helm chart.
Apply
```sh
cd deploy/terraform/azure
terraform init
terraform apply \
  -var="subscription_id=YOUR_AZURE_SUBSCRIPTION_ID" \
  -var="location=eastus" \
  -var="cluster_name=cognitora" \
  -var="vm_size=Standard_NC8as_T4_v3" \
  -var="node_count=2"
```
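Equivalently, the variables can live in a `terraform.tfvars` file so repeated applies don't need the flags (the values below are illustrative):

```hcl
subscription_id = "YOUR_AZURE_SUBSCRIPTION_ID"
location        = "eastus"
cluster_name    = "cognitora"
vm_size         = "Standard_NC8as_T4_v3"
node_count      = 2
```

Terraform picks the file up automatically on `terraform apply`.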
After ~12 minutes you have:

- An AKS cluster named `cognitora` in the configured location.
- A GPU node pool carrying the `nvidia.com/gpu.present=true` label and the `nvidia.com/gpu=true:NoSchedule` taint.
- The Cognitora chart installed.
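Because of that taint, any extra workloads you want scheduled on the GPU pool need a matching toleration; a minimal pod-spec sketch (the chart's own pods are assumed to set this already):

```yaml
nodeSelector:
  nvidia.com/gpu.present: "true"
tolerations:
  - key: nvidia.com/gpu
    operator: Equal
    value: "true"
    effect: NoSchedule
```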
Sizing
| Workload | VM size | GPU | Notes |
|---|---|---|---|
| 7-13 B dev | `Standard_NC8as_T4_v3` | 1× T4 16 GB | low-cost GPU |
| 30-70 B | `Standard_NC24ads_A100_v4` | 1× A100 80 GB | TP=1 |
| 100 B+ | `Standard_ND96asr_v4` | 8× A100 40 GB | TP=8 |
| Long-context | `Standard_ND96isr_H100_v5` | 8× H100 | NVLink |
You'll need to request quota for the H100 / A100 SKUs; T4 is usually available immediately.
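You can check current regional quota before filing a request; a sketch with the `az` CLI (the grep pattern is an assumption, since family names vary by subscription):

```sh
# List vCPU usage and limits for GPU VM families in the target region
az vm list-usage --location eastus -o table | grep -iE 'NC|ND'
```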
Ingress
AKS pairs naturally with Application Gateway:
```yaml
router:
  ingress:
    enabled: true
    className: azure-application-gateway
    host: api.cognitora.example.com
    annotations:
      appgw.ingress.kubernetes.io/ssl-redirect: "true"
      appgw.ingress.kubernetes.io/use-private-ip: "false"
```
Or use the NGINX ingress controller if you prefer a cloud-portable configuration.
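With NGINX only the class changes in the chart values; a sketch (TLS annotations omitted, since they depend on your certificate setup):

```yaml
router:
  ingress:
    enabled: true
    className: nginx
    host: api.cognitora.example.com
```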
Identity
Azure AD Workload Identity for agents pulling from Azure Blob Storage:
```yaml
agent:
  serviceAccountAnnotations:
    azure.workload.identity/client-id: <UAMI client id>
  podLabels:
    azure.workload.identity/use: "true"
```
Federate the UAMI with the AKS OIDC issuer, then grant it the Storage Blob Data Reader role on the model container.
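The federation itself can be scripted with the `az` CLI; a sketch in which the resource group, identity, namespace, and service-account names are placeholders:

```sh
# Read the cluster's OIDC issuer URL
ISSUER=$(az aks show -g cognitora-rg -n cognitora \
  --query oidcIssuerProfile.issuerUrl -o tsv)

# Federate the UAMI against the agent's Kubernetes service account
az identity federated-credential create \
  --name cognitora-agent \
  --identity-name cognitora-uami \
  --resource-group cognitora-rg \
  --issuer "$ISSUER" \
  --subject "system:serviceaccount:cognitora:cognitora-agent"

# Grant blob read on the storage account holding the model container
az role assignment create \
  --assignee "$(az identity show -g cognitora-rg -n cognitora-uami --query clientId -o tsv)" \
  --role "Storage Blob Data Reader" \
  --scope "/subscriptions/<sub>/resourceGroups/cognitora-rg/providers/Microsoft.Storage/storageAccounts/<account>"
```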
Tear down
```sh
terraform destroy
```