Contractor, Project-Based (As Needed), Colombia (CST/COT)
Objective
Provide our team with a safe test environment and a push-button release process to ship integration work quickly and reliably, with built-in multi-tenant fundamentals.
Scope of Work
* Implement a staging platform in our DigitalOcean account, including applications, a managed database, and a cache, all provisioned as code (Terraform) to ensure full reproducibility and avoid manual configuration.
* Establish an automated build-and-release pipeline in GitHub Actions (build, test, security scan, deploy, health check, one-click rollback).
* Implement basic monitoring: centralized logs, 2–3 alerts (deploy failure, container crash, DB/connectivity), and clear runbooks (deploy, rollback, first responder).
Deliverables
* Terraform module to deploy services on DigitalOcean App Platform or DOKS (reusable and parameterized).
* Everything defined as code: Terraform configuration to create, update, and destroy the environment (no manual actions), including least-privilege roles for infrastructure, CI, and repositories, with secrets stored in the provider’s vault.
* Core data services: managed Postgres (state/checkpoints) and Redis (queues/cache) provisioned and documented.
* Automated build and deployment: GitHub Actions pipeline that builds, tests, and deploys to the test environment, with health checks and one-click rollback.
* Per-tenant configuration pattern: tenant IDs on jobs, logs, and DB rows; tenant configuration store (status, secret references, limits); namespaced secrets; per-tenant heartbeat and Slack alert routing; CI workflow that accepts tenant_id (defaults to single-tenant) for targeted deployments.
* Functional staging environment built from these artifacts, running a “hello” service at an accessible test URL.
* Documentation and handoff: single repository with Terraform, CI workflows, and comprehensive documentation (README, infrastructure diagram, runbooks, and a short Loom video) to enable developers to perform deployments independently.
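To illustrate the per-tenant configuration pattern listed above, a minimal Python sketch might look like the following. Every name here (TenantConfig, secret_path, make_config, log_line, the "default" tenant, the field values) is a hypothetical placeholder chosen for illustration, not part of an existing codebase; the contractor is free to propose a different design.

```python
import json
from dataclasses import dataclass

# Assumption: single-tenant mode is represented by one well-known tenant ID.
DEFAULT_TENANT_ID = "default"


@dataclass(frozen=True)
class TenantConfig:
    """One record in a hypothetical tenant configuration store."""
    tenant_id: str
    status: str               # e.g. "active" or "suspended"
    secret_ref: str           # a reference to a secret, never the secret value
    max_jobs_per_minute: int  # per-tenant limit
    slack_channel: str        # where this tenant's alerts are routed


def secret_path(tenant_id: str, name: str) -> str:
    """Build a namespaced secret reference so tenants cannot collide."""
    return f"tenants/{tenant_id}/{name}"


def make_config(tenant_id: str = DEFAULT_TENANT_ID) -> TenantConfig:
    """Create a config record; the values are illustrative defaults only."""
    return TenantConfig(
        tenant_id=tenant_id,
        status="active",
        secret_ref=secret_path(tenant_id, "api_key"),
        max_jobs_per_minute=60,
        slack_channel=f"#alerts-{tenant_id}",
    )


def log_line(config: TenantConfig, message: str) -> str:
    """Emit a structured log line that always carries the tenant ID."""
    return json.dumps(
        {"tenant_id": config.tenant_id, "message": message},
        sort_keys=True,
    )


if __name__ == "__main__":
    print(log_line(make_config("acme"), "job started"))
```

Calling make_config() with no argument falls back to the single-tenant default, which mirrors the requirement that the CI workflow accept a tenant_id but default to single-tenant.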
Success Criteria
* A sample service deploys automatically to staging upon changes to the main branch and is accessible via a health URL.
* One-click rollback restores the previous release, allowing quick recovery if a deployment fails.
* All resources are created through code (Terraform), with no manual configuration. Secrets are securely stored in the provider’s vault.
* Alerts trigger for deployment failures or crashes, and concise runbooks are published (deploy, rollback, first responder) so developers can manage deployments without assistance.
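As a rough illustration of the health-check and rollback criteria above, the post-deploy gate could be sketched in Python as follows. This is a sketch under stated assumptions, not a required implementation: the function names, the retry counts, and the rule that only HTTP 200 counts as healthy are all hypothetical, and the actual rollback step would be whatever the chosen platform provides.

```python
import time
from typing import Callable
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for url, or 0 if the service is unreachable."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status
    except HTTPError as error:
        # urlopen raises for 4xx/5xx responses; the code is still informative.
        return error.code
    except (URLError, OSError):
        return 0


def wait_until_healthy(
    url: str,
    probe: Callable[[str], int] = http_status,
    attempts: int = 5,
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll the health URL; succeed on the first 200, fail after all attempts."""
    for attempt in range(attempts):
        if probe(url) == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False


def release_action(healthy: bool) -> str:
    """Decide what the pipeline should do after the health check."""
    return "keep" if healthy else "rollback"
```

Passing probe and sleep as parameters keeps the gate testable without a network connection or real delays; a CI step would call wait_until_healthy with the staging health URL and trigger the rollback job when release_action returns "rollback".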
Required Experience & Skills
* 5+ years of experience in DevOps/SRE with Terraform, Docker, GitHub Actions, and managed Postgres & Redis.
* Experience with logging, metrics, alerts, incident response, secrets, certificates, and least-privilege access management.
* Familiarity with DigitalOcean App Platform or DOKS, or equivalent Kubernetes experience.
* Experience with multi-tenant integrations, OpenTelemetry, and rate-limit/fairness patterns.
* Ability to produce clear, concise documentation and effectively explain technical choices to non-engineers.
If you’re interested in applying, please fill out this form: https://1uqiq.share.hsforms.com/2YaQDDyr4QZKaMfyJ44QxKw