Compute Offloading Architecture¶
For computationally intensive workloads (model training, large-scale inference), AIDF services offload execution to external GPU infrastructure while keeping the main service container lightweight.
How It Works¶
- The AI service receives a training/inference request via its HTTP endpoint in AKS
- The `platform_cluster_manager.py` module selects the appropriate compute provider based on configuration
- The selected deployment module (RunPod/Denvr/on-prem/Azure) provisions a GPU cluster or pod
- The Docker image (from ACR) is deployed to the GPU infrastructure
- The heavy computation runs on GPU hardware
- Results are stored in the central model repository (Azure Blob) or database
- The main service polls for completion and returns results to the caller
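The flow above can be sketched in Python. This is a minimal illustration only; the class, method, and provider names are hypothetical and do not reflect the actual `platform_cluster_manager.py` API.

```python
import time

class RunPodDeployer:
    """Illustrative provider backend (hypothetical, not the real AIDF module)."""

    def provision(self, image: str) -> str:
        # A real backend would call the provider's API with the ACR image
        # reference; here we just fabricate a pod id for the sketch.
        return "pod-123"

    def submit(self, pod_id: str, payload: dict) -> str:
        # Launch the heavy computation on the provisioned GPU pod.
        return "job-456"

    def status(self, job_id: str) -> str:
        # A real backend would query the provider; the sketch completes at once.
        return "COMPLETED"

    def fetch_result(self, job_id: str) -> dict:
        # A real implementation would read from the central model
        # repository (Azure Blob) or database.
        return {"model_uri": "blob://models/job-456"}

# Provider registry keyed by the configured platform name (illustrative).
PROVIDERS = {"runpod": RunPodDeployer}

def offload(provider_name: str, image: str, payload: dict,
            poll_interval: float = 0.0) -> dict:
    """Select a provider, deploy the image, poll until done, return results."""
    deployer = PROVIDERS[provider_name]()
    pod_id = deployer.provision(image)
    job_id = deployer.submit(pod_id, payload)
    while deployer.status(job_id) != "COMPLETED":
        time.sleep(poll_interval)  # the main service polls for completion
    return deployer.fetch_result(job_id)
```

The main service container only orchestrates: provision, submit, poll, fetch. The GPU-bound work never runs inside the AKS service pod.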
Supported Platforms¶
| Platform | Best For | Infrastructure & Billing |
|---|---|---|
| RunPod | Burst fine-tuning jobs | Pay-per-use GPU pods |
| Denvr | Large-scale training with consistent performance | High-performance dedicated GPU clusters |
| On-Premises | Data-sovereign workloads where data cannot leave the organization's network | Internal GPU servers |
| Azure | Integrated Azure-native deployments | AKS node pools with GPU-enabled VMs |
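Provider selection might look like the sketch below. The rule keys and platform strings are assumptions for illustration; the real configuration consumed by `platform_cluster_manager.py` may use different fields.

```python
def choose_platform(workload: dict) -> str:
    """Pick a compute platform from workload requirements (hypothetical rules)."""
    if workload.get("data_sovereign"):
        # Data cannot leave the organization's network.
        return "on_prem"
    if workload.get("large_scale") and workload.get("consistent_performance"):
        # Dedicated GPU clusters for sustained training runs.
        return "denvr"
    if workload.get("azure_native"):
        # GPU-enabled node pools inside AKS.
        return "azure"
    # Default: pay-per-use pods for burst fine-tuning jobs.
    return "runpod"
```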