SRE - Platform Engineer at DroneUp | Torre

SRE - Platform Engineer

You'll architect and scale autonomous flight infrastructure, ensuring reliability for a world-changing drone ecosystem.
Full-time

Legal agreement: Employment

Compensation
USD 125k - 150k/year
Remote (for United States residents)
Shared by
Emma of Torre.ai
about 1 month ago

Requirements and responsibilities


DroneUp, LLC is a technology company built on years of complex airspace management and UAS operations experience with a foundation that uniquely positions us to deliver autonomous airspace management solutions that serve both regulators and operators.

It is a company with a vision to make autonomous flight great for communities, great for business, and great for the world. Yet we are more than visionaries: we have the tools, instruments, focus, and expertise to execute while utilizing a “People Matter Most” mentality.

Our founder envisioned a massive, untapped opportunity to leverage autonomous flight that would revolutionize how the world may "pitch" and "roll" in the future. To start, we have harnessed the power of airspace technology, analytics platforms, and drone services to transform business operations. Our long-term mission is to be “Safe and Be Exceptional” while building and deploying the world's most accessible drone ecosystem.

Knowing that our mission-critical success comes directly from the people we bring onboard, we strive to provide opportunities for our employees to learn, grow, and go beyond the normal Field of View! Come fly with us as our team goes through our checklists that will “Inspire Fast Action” and take an entire industry to new heights. “Be a Person Others Want to Follow!”

About the role

DroneUp is seeking an SRE - Platform Engineer who will focus on ensuring the reliability, scalability, and performance of our internal and client-facing IT infrastructure and developer platform. This role combines strong operational expertise with platform engineering principles, emphasizing uptime, incident response, and observability. The ideal candidate will drive SRE best practices, including SLO/SLI management, monitoring, and proactive system improvements, while collaborating with the broader platform engineering team.

Our principles include self-service, security by default, automation, and building resilient systems for software delivery at scale.

What you'll do

Broad domain architect for the internal developer platform and all cloud engineering
Drive architecture for tooling or in-house software
Mentor other platform engineers to drive strong engineering practices
Enablement of platform engineering technical capabilities in our internal client teams in software engineering
Peer with the senior architects and engineers in software engineering
Architecture and engineering focused on the GCP environment
Architect and oversee GKE cluster operations and workload management
Provide feedback to others and participate in peer reviews / pair programming
Drive the broad adoption of Test Driven Development through designing, developing, and debugging unit and integration tests for new and existing infrastructure and code
Continuous curiosity about existing implementations and new technologies, and sharing with the team
Practice continuous improvement across all job areas, both personally and professionally
Clearly communicate with platform engineering teams and other stakeholders and provide technical direction while doing so
Stay current with platform changes and third-party libraries. Proactively investigate better alternatives to current solutions
An understanding of OpenTelemetry and true observability, and how it differs from monitoring and logging
Grow the engineering culture towards a high-performing team
Practice the arts of self-service, least privilege, and security by default in all solutions
Define and maintain Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets
Lead incident response, including on-call rotations, root cause analysis, and post-mortem reviews
Implement and optimize monitoring, alerting, and observability systems for system reliability
Collaborate on capacity planning and performance optimization to ensure high availability
Other duties as assigned

Our Tooling Stack Includes but is Not Limited to:

GitHub / GitHub Actions
GCP
Kubernetes (via GKE), Helm, Docker
GSM Secrets Management (part of GCP)
Terraform
Honeycomb
Grafana stack
Prometheus

Qualifications

Bachelor's degree in Computer Science, Computer Engineering, or a related field, or 8+ years of experience as a software engineer
Proficiency in Kubernetes. Optional: CKA, CKAD
Extensive experience in Unix / Linux
Polyglot with proficiency in multiple languages (ideally: Golang, Node.js, Python, HCL, and more)
Knowledge of multi-cloud environments, including GCP, AWS, and Azure (familiar with at least two of these environments)
Experienced in using Git in trunk-based development models
Experience in use of feature flagging in infrastructure and runtime (k8s)
Experience with backend database technology is a plus, including support and performance enhancements
Advanced experience working with and creating public cloud resources in Terraform or other infrastructure as code tools
Experience participating in a 24/7 on-call schedule without supervision and successfully resolving issues without escalation
Experience using OpenTelemetry for observability as well as other monitoring tools such as Datadog, New Relic, and others
Good understanding of networking and routing principles
Experience in dockerizing applications and orchestrating them with Kubernetes
Familiarity with security configuration for web/API services (SSL, access control)
Experience with Jira or other work tracking systems. Ability to resolve tickets according to priority order and collaborate with the Technical Product Manager to adjust priorities
Excellent documentation skills, using Confluence or similar tooling; this could include support notes, runbooks, ADRs, etc.
Familiarity with creating an end-to-end CI/CD pipeline using various tools with artifact storage
Familiarity with use of macOS as a desktop and predominantly CLI interfaces
Experience applying a “product mindset” by understanding stakeholder needs, priorities, and business value
Experience with security compliance frameworks including FedRAMP, NIST, and SOC 2
Proven experience in SRE practices, including incident management and reliability engineering
Familiarity with monitoring tools like Prometheus, Grafana, or Honeycomb for observability
Experience with chaos engineering, load testing, or reliability testing frameworks

Security Responsibility Statement:

Employees are expected to provide a high level of security to any personal or private information accessed as part of their work, whether at a DroneUp facility or remotely. This includes participating in security training, remaining sensitive to individual rights to personal privacy, and complying with company policies. Employees who have access to sensitive data that is protected by regulation, such as HIPAA, or by contract, such as credit card data, must comply with any additional requirements dictated by the governing regulations or associated contracts.

The pay range for this role is: 125,000 - 150,000 USD per year (NCR)