Site Reliability Engineer (Remote)
BUK
Feb 2024 - Current (2 years 6 months)
Lead infrastructure reliability for a multi-tenant SaaS platform serving 25,953 tenants across 5 countries.
●Architected 11 Terraform/Terragrunt modules (4 organization-wide reusable modules, 3 legacy refactors to Terragrunt, 4 built from scratch), standardizing infrastructure across all environments
●Led the complete migration of infrastructure from São Paulo to Ohio (EKS, RDS, ArgoCD, EC2, Route 53, load balancers), reducing costs and improving performance
●Managed production EKS environments across a multi-shard architecture, including cluster upgrades, new shard provisioning, and zero-downtime migrations
●Built monitoring and alerting for EKS clusters, RDS databases, and background jobs; created Grafana dashboards for infrastructure