Senior Site Reliability Engineer at Lyrebird Health

You'll own critical systems and build reliability into AI-powered healthcare tools used in high-stakes environments.
Full-time

Legal agreement: Employment

Provide your expected compensation while applying
Remote (for United Kingdom residents)
Shared by Emma of Torre.ai, 28 days ago

Requirements and responsibilities


The Role

We’re hiring a Senior SRE to own the reliability, scalability, and performance of our production systems as we continue to grow.

At Lyrebird, you won’t just respond to incidents. You’ll design the systems and standards that prevent them. That means building infrastructure that scales cleanly, creating deployment patterns that reduce risk, and ensuring we can detect and resolve issues before they impact users.

This is a broad role that sits across platform engineering, DevOps, and security. You’ll be responsible for ensuring our systems are resilient under load, observable in real time, and able to scale as usage increases.

You’ll play a key role in how we get code from a developer’s machine into production safely, and how we operate those systems once they’re live.

About Us

Lyrebird Health builds AI-powered tools that reduce the administrative burden on clinicians and improve the quality and accessibility of healthcare.

Our platform is used by thousands of clinicians across multiple markets.
As we grow, we’re focused on building systems that are reliable, scalable, and trusted in high-stakes environments.

What you'll do

Keep production systems online and restore them quickly when they fail
Lead and manage incidents, making high-quality decisions under pressure
Design and implement scalable infrastructure and deployment patterns
Build and improve CI/CD pipelines and release systems
Improve monitoring, telemetry, and observability across the stack
Own cloud infrastructure, security, and access controls
Work closely with engineers to ensure systems are built to scale from day one

What you'll bring

5–7 years’ experience in SRE, platform engineering, or DevOps roles
Strong AWS experience (ECS/Fargate, EC2, Lambda, SQS, IAM)
Experience running and scaling production systems
Strong understanding of distributed systems and scaling approaches
Hands-on experience with Docker and containerised environments
Experience with Kubernetes or ECS

How you work

You take ownership and follow things through
You’re proactive and comfortable operating with ambiguity
You stay calm and make good decisions during incidents
You focus on solving problems end to end
You’re willing to roll up your sleeves and get into the detail

Why join

This is a critical hire for us as we scale.

If you want real ownership over how systems are designed, deployed, and operated, and the opportunity to build reliability into a product used in high-stakes environments, we’d love to hear from you.

We’re building a team that reflects the diversity of the people who use our product. If you’re from an underrepresented background in tech, we strongly encourage you to apply, even if you don’t meet every requirement.