Velozient seeks to hire a full-time, remote Senior Site Reliability Engineer (SRE) with 6+ years of high-availability cloud experience in a DevOps or SRE role. The candidate must balance supporting a service organization, delivering exceptional operational performance, and building reliability engineering standards within a shared SRE team. SREs will apply an engineering mindset to deploying highly reliable systems and eliminating unnecessary cost through automation.
The SRE's core mission is building a highly reliable, scalable, and measurable customer experience that grows the client's business. Reporting to an SRE Manager, the candidate will join a team that deploys and manages cloud environments and supports the client's cloud-based initiatives. The role requires automating provisioning, deployment, management, and monitoring of a rapidly evolving, highly scalable set of SaaS services.
Only SRE candidates who are motivated, focused on quality, and driven to achieve critical service success without sacrificing security or overspending need apply. The SRE and engineering teams are distributed across multiple geographical regions, primarily in North America and India, which requires flexibility around meeting times. Successful candidates must prioritize tasks well and work productively and independently. Working in a SAFe Agile framework, the candidate must be familiar with Agile and participate in sprints as well as technical meetings.
Our client is a global leader in cloud-based networking and cybersecurity solutions. Their SaaS products allow operations teams to automate, standardize, and speed delivery of cloud-based networking and security services from a single pane of glass, providing easy, scalable, and highly reliable network experiences. More than two-thirds of Fortune 500 companies use our client's solutions today. Our client has also won numerous cybersecurity and best-places-to-work awards, making it a top destination for software engineers.
Responsibilities
• Engage with engineering teams proactively to drive and improve operational readiness across the service lifecycle, including inception, design, deployment, operation, and refinement
• Build and maintain software modules for use and reuse in cloud and on-premises systems automation
• Work closely with the software engineering team to ensure accurate monitoring and metrics are built into applications prior to production deployment
• Champion continuous improvement of the client's methods for deploying and supporting cloud architecture
• Efficiently handle multiple requests of varying priority in addition to assigned daily tasks
Required Experience
• Excellent English communication skills
• 6+ years supporting a high-availability cloud environment in a DevOps, SRE, or Sysadmin role
• 4+ years working in Amazon Web Services (e.g., EC2, S3, VPC, RDS, CloudFormation, and more)
• Experience with IaC frameworks (e.g., CloudFormation, Terraform, configuration management, and more)
• Experience supporting multiple Kubernetes clusters (e.g., EKS, ECS, OpenShift, AKS, and more)
• Great communication, interpersonal, and teamwork skills
Desired Experience
• Extensive, detailed networking skills including VPC, VPN, TCP/IP, routing, load balancing, or DNS
• Experience provisioning, deploying, and supporting containers with Helm, Kubernetes (Kops), or Docker
• Scripting and development experience using Python, Go, or bash with associated source control
• Deploying and supporting end-to-end CI/CD using GitHub, Jenkins, Spinnaker, CodePipeline, CodeBuild, Ansible, and more
• Hands-on monitoring tool usage including Sysdig, Loggly, AWS CloudWatch, Grafana, or Prometheus
• Experience with relational databases, such as MySQL and PostgreSQL, and document stores, such as MongoDB
• Experience deploying applications in containers using Docker and Kubernetes
• Knowledge or experience with KubeVela
• Microsoft Azure experience
Additional Information
• Enjoy a fun, fast-growing entrepreneurial company
• Be part of a highly collaborative learning culture: share knowledge, be inclusive, learn and grow together.
• Embrace teamwork!
• Know your ideas are heard and matter. Think big!
• You get to own your job and be recognized for your contributions
• Work with smart and creative people
• Making mistakes is human. Let's learn from them. Be transparent!
• We recognize you as an individual, with no presumptions or judgment. Be the extraordinary you!
• 15 days Paid Time Off (PTO) plus your national holidays
• Start: ASAP