Senior Infrastructure Engineer - AI/ML
Cohere
May 2021 - Current (5 years 1 month)
• Built GPU inference clusters using Terraform, Amazon EKS, and Amazon Linux 2023 AMIs with preinstalled NVIDIA drivers on g5 GPU instances, enabling deployment of 14 production models with zero manual provisioning.
• Developed 18 CUDA/TensorRT-based Docker images via CI pipelines, reducing container build time by 40% and supporting 50+ model rollouts through Amazon ECR and EKS.
• Packaged model runtime as Helm charts for private Kubernetes deployments, enabling 17 enterprise customers to self-host with documented onboarding workflows.
• Implemented custom Kubernetes autoscaler integrated with AWS Auto Scaling Groups, reducing peak inference latency from 320 ms to 256 ms (20%) during 3x traffic spikes.
• Deployed observability stack wit