We are hiring on behalf of one of our clients, who is seeking a Site Reliability Engineer (SRE) to work on a contract basis. This role requires expertise in maintaining and optimizing high-availability systems to ensure seamless operations for end users. The position involves collaborating with cross-functional teams to implement and monitor infrastructure solutions.

Key Responsibilities:
Design, implement, and maintain scalable infrastructure to support critical services and applications.
Define and enforce SLOs, SLIs, and error budgets to ensure system reliability and performance.
Troubleshoot and resolve incidents, performing root cause analysis and implementing preventative measures.
Automate operational tasks using scripting languages and infrastructure-as-code tools.
Monitor system health, analyze metrics, and optimize resource utilization for cost efficiency.

Required Skills & Qualifications:
Proficiency in Linux system administration and troubleshooting.
Experience with cloud platforms such as AWS, GCP, or Azure.
Knowledge of containerization and orchestration tools like Docker and Kubernetes.
Familiarity with monitoring and observability tools such as Prometheus, Grafana, or Datadog.
Strong scripting skills in Python, Bash, or Go for automation and tooling.

This role offers a unique opportunity to work with a global leader in the software development industry, contributing to the delivery of high-performance, fault-tolerant systems. The position has a direct impact on user-facing services and infrastructure decisions.

We hire based on skills and expertise. All qualified candidates are welcome regardless of background, experience, or prior employment history. Applications are reviewed solely on demonstrated technical ability and qualifications.