Infra/DevOps Engineer at Maze

Summary of the Role:

As Infra/DevOps Engineer at Maze, you'll be the architect of our complex, multi-account Kubernetes infrastructure, building and scaling the foundation that powers our AI-driven cybersecurity platform across isolated enterprise environments. This is a unique opportunity to join as one of the early engineering team members of a well-funded startup building at the intersection of generative AI and cybersecurity.

You'll design, code, and maintain sophisticated infrastructure spanning 12-15 AWS accounts, each with dedicated Kubernetes clusters, ensuring complete data segregation for our security-conscious enterprise customers. You'll take full ownership of our infrastructure-as-code implementation, managing multiple Kubernetes clusters at scale using cutting-edge tools like Karpenter, Flux, and Kustomize. Your success will be measured by infrastructure reliability, deployment velocity, and your ability to build self-managed, distributed systems that scale elegantly as we grow from startup to enterprise scale.

This role is perfect for a hands-on infrastructure engineer who has mastered complex Kubernetes deployments at scale, writes production-grade infrastructure code, and thrives on building simple, elegant solutions to complex distributed systems challenges.

Your Contributions to Our Journey:

* Architect Multi-Cluster Kubernetes Infrastructure: Design, implement, and write infrastructure-as-code for our complex Kubernetes setup spanning multiple AWS accounts, ensuring each cluster is completely isolated for enterprise security requirements while maintaining operational efficiency.
* Build Self-Managed, Distributed Systems: Develop infrastructure that manages itself through GitOps workflows using Flux and Kustomize, creating distributed systems where actions in one place automatically trigger appropriate changes across the infrastructure without manual intervention.
* Scale Kubernetes Operations: Manage and optimize dozens of Kubernetes clusters across our multi-tenant and single-tenant environments, implementing auto-scaling solutions with Karpenter and ensuring seamless scaling as customer workloads grow exponentially.
* Develop Production-Grade Automation: Write robust, maintainable code to build and maintain CI/CD pipelines, custom automation tools, and deployment scripts that enable rapid feature delivery while maintaining the highest reliability standards.
* Ensure Enterprise Security: Implement security best practices and compliance measures that protect our highly sensitive security data, managing firewalls, encryption, IAM policies, and network segregation across our multi-account AWS architecture.
* Optimize Platform Performance: Build comprehensive monitoring, logging, and alerting systems that proactively identify issues, using tools like Prometheus and Grafana to ensure our infrastructure scales efficiently as we handle increasingly complex workloads.
* Enable Engineering Velocity: Work closely with backend and data engineering teams to build self-service infrastructure capabilities, allowing teams to provision databases, deploy services, and scale resources independently without constant infrastructure team involvement.

What You Need to Be Successful:

* Kubernetes Mastery at Scale: 5+ years of infrastructure/DevOps experience with deep, hands-on expertise managing complex Kubernetes deployments. You must have experience with multiple Kubernetes clusters (tens of clusters) in sophisticated setups, not just simple single-cluster environments.
* GitOps and Modern K8s Tooling: Proven production experience with Karpenter for auto-scaling, Flux for GitOps, and Kustomize for configuration management.
* AWS Infrastructure Expertise: Deep knowledge of AWS with hands-on experience managing complex multi-account architectures, understanding how to design for isolation, security, and scalability across numerous AWS accounts with proper networking and IAM configuration.
* Infrastructure-as-Code Excellence: Strong coding skills with production experience using Terraform or CloudFormation, writing maintainable, well-architected infrastructure code that follows best practices and scales with organizational growth. Proficiency in Python is essential for automation, tooling, and infrastructure development.
* Hands-On Coding: Currently active as a developer writing production code in Python for infrastructure automation, custom tooling, and operational scripts.
* Simplicity-Driven Architecture: Proven ability to build simple, elegant solutions to complex infrastructure problems, with a strong instinct for using tools like Helm charts appropriately while avoiding over-engineering.
* Platform Thinking: Experience building infrastructure with a platform mindset, creating systems that support multiple products and enable team self-service rather than one-off solutions.
* AWS Managed Services Philosophy: Understanding of when to use AWS managed services such as RDS, MSK, and EMR versus building custom solutions, with experience scaling startups using managed services efficiently before investing in complex self-hosted infrastructure.
* Distributed Systems Mindset: Deep understanding of distributed systems principles with experience building decentralized infrastructure that allows independent operation across multiple clusters and regions.

Nice to haves:

* Experience with AWS auto-scaling across complex, multi-cluster environments.
* Background in security-focused infrastructure or handling sensitive enterprise data.
* Previous experience at scale-ups that grew infrastructure from 20-100+ engineers.
* Knowledge of infrastructure observability tools beyond Prometheus and Grafana, such as the ELK Stack.
* Track record of building infrastructure that went through SOC2, ISO, or similar compliance certifications.

Why Join Us:

* Ambitious Infrastructure Challenges: We're using generative AI, including LLMs and agents, to solve critical cybersecurity challenges, requiring sophisticated infrastructure that handles sensitive security data across isolated enterprise environments.
* Expert Team: We are a team of hands-on leaders with deep experience in Big Tech and scale-ups, with leadership backgrounds behind multiple acquisitions and an IPO.
* Impactful Work: Cybersecurity is a force for good, helping stop cyber attacks and enabling better outcomes worldwide. The infrastructure you build will directly support organizations in protecting themselves from real threats.
* Build an AI-Native Company: We're building a company in the AI era with the opportunity to design everything from the ground up, architecting infrastructure using cutting-edge Kubernetes practices and establishing platform standards that scale from startup through hypergrowth.
* Technical Leadership Growth: Direct partnership with experienced engineering leadership, significant equity upside, and the opportunity to own and shape the entire infrastructure function as we scale our platform to support the world's largest enterprises.

Emma

Phil O'Hagan

About Maze: www.mazehq.com
