Projects & outcomes
Real work. Real challenges. The details that matter — not marketing copy.
High-Scale Video Platform Migration to Kubernetes
Designed and executed a complete infrastructure migration for a high-scale video platform, moving from a legacy on-prem environment to a production Kubernetes cluster in the cloud. Built the entire video processing pipeline (ingest, transcode, storage, CDN delivery) as cloud-native workloads.
The Challenge
The client was running a high-traffic video processing platform on aging on-prem hardware. The system was brittle, hard to scale during traffic spikes, and required manual intervention for deployments. Processing queues would back up under load, and there was no reliable failover. They needed a path to cloud-native infrastructure without disrupting live video delivery.
What I Did
- Assessed existing infrastructure and mapped all workloads, dependencies, and data flows
- Designed target architecture on cloud Kubernetes (EKS) with autoscaling worker pools
- Built full IaC with Terraform: VPC, node groups, storage, networking
- Containerized all services and built Helm charts for each workload
- Implemented ArgoCD for GitOps-based deployments across environments
- Built the video processing pipeline with autoscaling, FFmpeg-based job workers
- Set up CDN integration for video delivery and origin failover
- Executed zero-downtime cutover with DNS-based traffic shifting
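As a sketch of the GitOps layer described above: an ArgoCD Application per workload and environment points at the Helm chart in Git, so merging to the tracked branch is the deployment action. The repository URL, chart path, and names below are placeholders, not the client's actual values.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: video-transcoder          # hypothetical workload name
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.com/platform/charts.git  # placeholder repo
    targetRevision: main
    path: charts/transcoder
    helm:
      valueFiles:
        - values-production.yaml  # per-environment overrides
  destination:
    server: https://kubernetes.default.svc
    namespace: video-pipeline
  syncPolicy:
    automated:
      prune: true      # delete cluster resources removed from Git
      selfHeal: true   # revert out-of-band (manual) changes
```

With automated sync and self-heal enabled, the cluster continuously converges on what Git declares, which is what removes the manual deployment steps the legacy setup required.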
Stack & Tools
Kubernetes (EKS) · Terraform · Helm · ArgoCD · FFmpeg · CDN with origin failover · DNS-based traffic shifting
NVIDIA A100 GPU Integration on Kubernetes with MIG Partitioning
Designed and deployed a production multi-tenant GPU cluster on Kubernetes using NVIDIA A100s with full MIG (Multi-Instance GPU) partitioning — matching MIG profile sizes to model sizes so every GPU cycle counts. Small models get small slices; large models get the full card.
The Challenge
The client was building a multi-tenant AI inference platform and needed to serve dozens of models simultaneously — from lightweight 7B models to large 70B+ models — on a fixed pool of NVIDIA A100 80GB GPUs. Giving each model a full GPU was wasteful and expensive. Running everything on shared GPUs without isolation caused memory conflicts and unstable latency. They needed fine-grained, isolated GPU partitioning with Kubernetes-native scheduling.
What I Did
- Deployed NVIDIA GPU Operator on Kubernetes to manage drivers, container runtime, and device plugins automatically
- Enabled MIG mode on all A100 nodes and planned profile allocation based on model size tiers
- Configured 1g.10gb MIG instances for small models (≤7B params) — up to 7 instances per GPU
- Configured 2g.20gb MIG instances for mid-size models (7B–13B params)
- Configured 4g.40gb MIG instances for large models (30B–40B params)
- Reserved full 7g.80gb instances for 70B+ models needing the entire card
- Applied custom Kubernetes node labels per MIG profile for precise pod scheduling
- Built a dynamic MIG reconfiguration pipeline using mig-parted to reshape profiles on demand without node reboots
- Set up resource quotas and LimitRanges per namespace to enforce fair GPU allocation across teams
- Integrated vLLM inference server as the serving layer, pinned to specific MIG instances via device plugin
- Built Prometheus + Grafana dashboards for per-MIG GPU utilization, memory, and inference throughput
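To illustrate how the scheduling pieces above fit together: with the GPU Operator's mixed MIG strategy, each profile is exposed as its own extended resource, so a small-model deployment can request exactly one isolated slice. The model, image tag, and label values below are illustrative placeholders.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llama-7b-vllm                 # hypothetical small-model deployment
spec:
  replicas: 2
  selector:
    matchLabels: { app: llama-7b-vllm }
  template:
    metadata:
      labels: { app: llama-7b-vllm }
    spec:
      nodeSelector:
        nvidia.com/mig.config: all-1g.10gb   # nodes partitioned for the small-model tier
      containers:
        - name: vllm
          image: vllm/vllm-openai:latest     # placeholder image tag
          args: ["--model", "meta-llama/Llama-2-7b-hf"]
          resources:
            limits:
              nvidia.com/mig-1g.10gb: 1      # one isolated MIG slice, not a full A100
```

Because each MIG instance has its own memory and compute partition, a noisy neighbor on the same physical GPU cannot evict this pod's KV cache or spike its latency, which was the core problem with unpartitioned sharing.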
Stack & Tools
Kubernetes · NVIDIA GPU Operator · A100 MIG · mig-parted · vLLM · Prometheus · Grafana
Working on something similar?
Let's talk. Book a free discovery call and we'll figure out if I'm the right fit for your project.
Book a Call