Azure deployment

AKS module: deploy/terraform/azure/. Provisions an AKS cluster with a GPU node pool (Standard_NC8as_T4_v3 by default) and applies the Helm chart.

Apply

cd deploy/terraform/azure
terraform init
terraform apply \
  -var="subscription_id=YOUR_AZURE_SUBSCRIPTION_ID" \
  -var="location=eastus" \
  -var="cluster_name=cognitora" \
  -var="vm_size=Standard_NC8as_T4_v3" \
  -var="node_count=2"
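
Once the apply finishes, you can merge the cluster credentials into your kubeconfig and confirm the GPU pool registered. This is a sketch: the resource group name here is an assumption that it matches the cluster name — check `terraform output` or the module's variables for the real one.

```shell
# Merge the new cluster's credentials into ~/.kube/config.
# Resource group name is assumed to equal the cluster name.
az aks get-credentials -g cognitora -n cognitora

# Verify the GPU nodes came up with the expected label and taint.
kubectl get nodes -l nvidia.com/gpu.present=true \
  -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints
```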

After ~12 minutes:

  • AKS cluster cognitora in the configured location.
  • GPU node pool with the nvidia.com/gpu.present=true label and the nvidia.com/gpu=true:NoSchedule taint.
  • Cognitora chart installed.

Sizing

Workload       VM size                     GPU             Notes
7-13 B dev     Standard_NC8as_T4_v3        1× T4 16 GB     cheap GPU
30-70 B        Standard_NC24ads_A100_v4    1× A100 80 GB   TP=1
100 B+         Standard_ND96asr_v4         8× A100 40 GB   TP=8
Long-context   Standard_ND96isr_H100_v5    8× H100         NVLink

You'll need to request quota increases for the A100 and H100 SKUs; T4 quota is usually available immediately.
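
You can check your current quota for a region before filing a request. The command below is standard az CLI; the grep pattern is illustrative, since the exact family names (e.g. "Standard NCASv3_T4 Family") vary by SKU.

```shell
# Show current usage and limits for GPU VM families in the target region.
az vm list-usage --location eastus -o table | grep -Ei 'NCAS|NCADS|ND'
```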

Ingress

AKS pairs naturally with Application Gateway:

router:
  ingress:
    enabled: true
    className: azure-application-gateway
    host: api.cognitora.example.com
    annotations:
      appgw.ingress.kubernetes.io/ssl-redirect: "true"
      appgw.ingress.kubernetes.io/use-private-ip: "false"

Or use the NGINX ingress controller if you prefer a cloud-portable configuration.
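
A minimal NGINX ingress install, if you go that route (the upstream community chart; release and namespace names here are just examples):

```shell
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm install ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx --create-namespace
```

Then set className: nginx in the router values instead of azure-application-gateway.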

Identity

Use Azure AD Workload Identity for agents that pull models from Azure Blob Storage:

agent:
  serviceAccountAnnotations:
    azure.workload.identity/client-id: <UAMI client id>
  podLabels:
    azure.workload.identity/use: "true"

Federate the UAMI with the AKS OIDC issuer; grant it Storage Blob Data Reader on the model container.
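
Those two steps can be sketched with the az CLI. The resource group, identity name, namespace, service account, and storage scope below are placeholders for whatever your deployment actually uses; the elided subscription ID and storage account must be filled in.

```shell
# Look up the cluster's OIDC issuer URL (requires the OIDC issuer
# feature to be enabled on the cluster).
ISSUER=$(az aks show -g cognitora -n cognitora \
  --query oidcIssuerProfile.issuerUrl -o tsv)

# Federate the user-assigned identity with the agent's service account.
az identity federated-credential create \
  --name cognitora-agent \
  --identity-name cognitora-uami \
  --resource-group cognitora \
  --issuer "$ISSUER" \
  --subject "system:serviceaccount:default:cognitora-agent"

# Grant the identity read access on the storage account that holds
# the model container.
CLIENT_ID=$(az identity show -g cognitora -n cognitora-uami \
  --query clientId -o tsv)
az role assignment create \
  --role "Storage Blob Data Reader" \
  --assignee "$CLIENT_ID" \
  --scope "/subscriptions/<sub-id>/resourceGroups/cognitora/providers/Microsoft.Storage/storageAccounts/<account>"
```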

Tear down

terraform destroy