Mukund Tiwari

Infrastructure & Inference (AI) Engineer

📍 Bengaluru, India 📱 (+91)9013799412 📧 mukundtiwari@outlook.com 🔗 LinkedIn

Summary

Infrastructure and Inference Engineer with 5+ years of experience building and scaling high-traffic AWS infrastructure, GPU-based inference systems, and production Kubernetes platforms. Proven track record of load testing at ~10M TPM scale, self-hosting ML models that serve 30% of production traffic, and cutting GPU inference costs by 68%. Strong in Terraform IaC, CI/CD automation, and cloud security hardening.

Experience

Senior DevOps Engineer at Primetrace Dec 2024 — Present
  • Owned end-to-end load testing for Crafto's peak traffic events, preparing the platform to handle ~10M TPM; the system scaled to 1.6M concurrent users and 6M DAU, supporting INR 4Cr in revenue over 36 hours
  • Deployed self-hosted inference models (flux2-dev-32b 4-bit quantized, flux2-klein-4b), now serving 30% of production traffic and reducing per-inference cost from INR 3.5–4 to INR 0.8
  • Optimized GPU infrastructure for Pixora upscaling by moving from 2x A100 80GB ($11/hr) to 1x L40S ($3.5/hr), a 68% cost reduction; moved Sundar AI and Looks AI upscaling fully to self-hosted infrastructure, eliminating $3,200/mo in Replicate spend
  • Self-hosted CodeFormer and GFPGAN upscalers on dedicated infrastructure, improving latency, uptime, and cost control; the self-hosted path absorbed traffic during third-party outages, reducing customer-facing errors
  • Built end-to-end Android CI/CD with GitHub Actions and Slack integration, replacing manual builds; it is now critical infrastructure for the entire Crafto Android team
  • Implemented database audit logging exported to CloudWatch and built an internal DB user-management tool, enabling compliance tracking and access governance across production databases
  • Completed a zero-downtime MySQL upgrade (8.0.42 → 8.4.7) on a major production database
  • Created a reusable Terraform module for EKS Pod Identity, reducing setup from 3 resources to 1 module declaration (3x productivity)
  • Implemented an automated backup solution for critical infrastructure (Argo Workflows, ArgoCD, Vault, Route53, Pritunl VPN) using Kubernetes CronJobs and S3
  • Enabled ARM64/Graviton deployments for Kutumb, reducing compute costs
  • Architected and managed the complete AWS infrastructure end-to-end as part of a lean 2-person DevOps team
  • Spearheaded the migration from ClickOps to Infrastructure as Code (IaC) and established security best practices across the organization
DevOps Engineer at Ollion Aug 2022 — Nov 2024
  • Saved $230,000/year for an Indonesian digital bank through EC2 rightsizing and VPC endpoint consolidation (90% reduction)
  • Led the migration of an Indonesian OTT platform from GKE to AWS EKS, managing Terraform, ingress controllers, and monitoring
  • Hardened security across 80+ AWS accounts for a UK-based media organization using automated compliance scripts
  • Executed a cross-cloud migration from S3-Redshift to GCS-Bigtable/BigQuery with secure VPN connectivity
Senior Systems Engineer at Infosys Nov 2020 — Aug 2022
  • Migrated on-prem Redis to Azure Cache for Redis and Go applications to AKS for the world's largest private bank
  • Led cloud migration to AKS using Terraform for a Fortune 500 financial technology company

Skills

Certifications

Google Cloud Professional Cloud Security Engineer

Google Cloud Professional Cloud Architect

AWS Certified Solutions Architect - Associate

HashiCorp Certified: Terraform Associate

Microsoft Certified: Azure Administrator Associate

Education

B.Tech in Electronics & Communication Engineering Oct 2020

Maharaja Agrasen Institute of Technology, Delhi

Class 12, CBSE Mar 2016

Apeejay School, Pitampura, Delhi