Experienced DevOps Engineer (8+ years) at Lead Love | Torre

You'll engineer reliable integrations, ensuring seamless operations and scalable growth.
Freelance, recurrent
Compensation: USD 40-55/hour (non-negotiable)
Location: Remote (for Colombia residents)
Posted 4 months ago

Requirements and responsibilities


Role Overview
Own the reliability and release operations for our integration work. You'll give developers a smooth path from code to production, keep environments healthy and secure, and make system health visible so issues are found and fixed fast. Over time, you'll tune cost and performance, harden security, and evolve our standards so we can ship integrations predictably as we scale.

What you'll do
* Own the platform lifecycle: maintain and improve our cloud setup (DigitalOcean preferred), databases (Postgres), caches/queues (Redis), and the way environments are created (Terraform/IaC).
* Operate releases: keep CI/CD fast and safe (GitHub Actions), enforce health checks and rollbacks, and make deploys predictable across multiple integration workstreams.
* Make reliability visible: centralize logs, metrics, and traces; keep practical alerts in place; and publish clear runbooks so first responders know what to do.
* Strengthen security & compliance basics: secrets handling, least-privilege access, image scanning, patching, and simple evidence for audits when needed.
* Manage capacity, cost, and performance: right-size resources, set autoscaling policies, and keep cloud spend within plan.
* Enable the team: answer "how do we...?" questions, write concise docs, and collaborate closely with the Fractional CTO to unblock delivery.

Success Metrics
* Deployment success rate: percentage of deploys that complete without rollback. Target: ≥95%.
* Time to restore: median time to recover from a production incident. Target: ≤30 minutes.
* Operational visibility: core alerts verified monthly; runbooks exercised in a safe test. Target: 100% pass.
* Cost & capacity: stay within the agreed monthly cloud budget while meeting performance targets.

Experience Required
* DevOps/SRE: 8+ years running cloud-hosted applications end to end.
* IaC & containers: Terraform and Docker; reproducible environments and change control.
* CI/CD: GitHub Actions (or similar) with build/test/scan/sign stages, blue/green or rolling deploys, and proven rollback.
* Data & queues: operating managed Postgres and Redis at production scale.
* Observability & ops: logging, metrics, and alerts (OpenTelemetry or equivalent), incident triage, and basic on-call hygiene.
* Security controls: secrets management, certificates, firewalls, IAM; incident response.
* Multi-tenant integration patterns (per-tenant config, fairness/rate limiting).
* Apigee knowledge; OpenTelemetry; experience with multi-tenant SaaS and token-bucket rate limiting.

Mindset
Pragmatic and service-oriented • automates toil • documents as they go • calm in ambiguity • explains choices in plain English • raises the bar without heavy process.

If you're interested in applying, please fill out this form: https://1uqiq.share.hsforms.com/2YaQDDyr4QZKaMfyJ44QxKw