SRE/DevOps Engineer at VDart | Torre

SRE/DevOps Engineer

Drives service reliability and scalability through SRE principles, automation, cloud infrastructure, and observability
Full-time

Legal agreement: Employment

Provide your expected compensation while applying
Remote (for Mexico residents)
Posted about 1 year ago

Requirements and responsibilities


Site Reliability Engineer (SRE)

Location: Remote in Mexico
Job Type: Full-time
Experience Level: [Mid/Senior]

Job Overview:
We are looking for a Site Reliability Engineer (SRE) to help drive transformation initiatives within an SRE-focused organization. The ideal candidate will have deep expertise in SRE principles, DevOps methodologies, and cloud computing (AWS/GCP) while working on architecture, system design, and automation to ensure high availability and reliability of systems.

Key Responsibilities:
• Implement SLA, SLI, and SLO frameworks to enhance service reliability.
• Drive incident management processes and establish a reliability-focused engineering culture.
• Architect and design highly reliable systems, ensuring scalability and performance.
• Conduct performance analysis and saturation analysis to proactively improve system health.
• Develop infrastructure as code (IaC) using Terraform to manage cloud environments.
• Implement observability tools for real-time monitoring, logging, and alerting.
• Work on DevOps automation with Python and batch scripting to optimize workflows.
• Deploy and manage containerized applications using Kubernetes.
• Define and implement budget policies related to SRE and cloud resource optimization.

Required Skills & Experience:
• Strong understanding of SRE principles and DevOps methodologies.
• Hands-on experience with Kubernetes for container orchestration.
• Proficiency in Python and batch scripting for automation.
• Expertise in infrastructure as code (Terraform) for cloud infrastructure management.
• Experience with cloud platforms (AWS/GCP) and their reliability best practices.
• Knowledge of observability tools for monitoring, logging, and alerting.
• Solid understanding of architecture, system design, and performance analysis.
• Experience in defining and managing SLA, SLI, and SLO objectives.

Nice-to-Have:
• Experience working in SRE transformations within large-scale organizations.
• Familiarity with cost optimization and budget policies for cloud infrastructure.
• Strong communication skills to drive a reliability-first culture within teams.