Compute Offloading Architecture¶
For computationally intensive workloads (model training, large-scale inference), AIDF services offload execution to external GPU infrastructure while keeping the main service container lightweight.
How It Works¶
- The AI service receives a training/inference request via its HTTP endpoint in AKS
- The `platform_cluster_manager.py` module selects the appropriate compute provider based on configuration
- The selected deployment module (RunPod/Denvr/on-prem/Azure) provisions a GPU cluster or pod
- The Docker image (from ACR) is deployed to the GPU infrastructure
- The heavy computation runs on GPU hardware
- Results are stored in the central model repository (Azure Blob) or database
- The main service polls for completion and returns results to the caller
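The flow above can be sketched in Python. This is a minimal illustration only; the class, method, and provider names are hypothetical and do not reflect the actual `platform_cluster_manager.py` API.

```python
import time

class RunPodDeployer:
    """Illustrative provider backend (hypothetical, not the real AIDF module)."""

    def provision(self, image: str) -> str:
        # A real backend would call the provider's API with the ACR image
        # reference; here we just fabricate a pod id for the sketch.
        return "pod-123"

    def submit(self, pod_id: str, payload: dict) -> str:
        # Launch the heavy computation on the provisioned GPU pod.
        return "job-456"

    def status(self, job_id: str) -> str:
        # A real backend would query the provider; the sketch completes at once.
        return "COMPLETED"

    def fetch_result(self, job_id: str) -> dict:
        # A real implementation would read from the central model
        # repository (Azure Blob) or database.
        return {"model_uri": "blob://models/job-456"}

# Provider registry keyed by the configured platform name (illustrative).
PROVIDERS = {"runpod": RunPodDeployer}

def offload(provider_name: str, image: str, payload: dict,
            poll_interval: float = 0.0) -> dict:
    """Select a provider, deploy the image, poll until done, return results."""
    deployer = PROVIDERS[provider_name]()
    pod_id = deployer.provision(image)
    job_id = deployer.submit(pod_id, payload)
    while deployer.status(job_id) != "COMPLETED":
        time.sleep(poll_interval)  # the main service polls for completion
    return deployer.fetch_result(job_id)
```

The main service container only orchestrates: provision, submit, poll, fetch. The GPU-bound work never runs inside the AKS service pod.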
Supported Platforms¶
| Platform | Best For | Infrastructure & Billing |
|---|---|---|
| RunPod | Burst fine-tuning jobs | Pay-per-use GPU pods |
| Denvr | Large-scale training with consistent performance | High-performance dedicated GPU clusters |
| On-Premises | Data-sovereign workloads where data cannot leave the organization's network | Internal GPU servers |
| Azure | Integrated Azure-native deployments | AKS node pools with GPU-enabled VMs |
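Provider selection might look like the sketch below. The rule keys and platform strings are assumptions for illustration; the real configuration consumed by `platform_cluster_manager.py` may use different fields.

```python
def choose_platform(workload: dict) -> str:
    """Pick a compute platform from workload requirements (hypothetical rules)."""
    if workload.get("data_sovereign"):
        # Data cannot leave the organization's network.
        return "on_prem"
    if workload.get("large_scale") and workload.get("consistent_performance"):
        # Dedicated GPU clusters for sustained training runs.
        return "denvr"
    if workload.get("azure_native"):
        # GPU-enabled node pools inside AKS.
        return "azure"
    # Default: pay-per-use pods for burst fine-tuning jobs.
    return "runpod"
```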