Site Reliability Engineer at Pacifica Continental | Torre

Site Reliability Engineer

You'll elevate our hybrid cloud infrastructure, driving automation and reliability for the nation's largest private Medicare marketplace.
Full-time

Legal agreement: Employment

Provide your expected compensation while applying
Remote (anywhere)
Posted 5 months ago

Requirements and responsibilities


About the team:

Our engineering team has built the largest private Medicare marketplace in the country. We passionately focus on the continuous improvement of the systems we build. We have spent many years growing and fostering a DevOps culture by bridging the divide between our Software and Infrastructure Engineering departments. We want the cross-functional teams that we are building to include Site Reliability Engineers. We operate in a complex, multi-tenant, hybrid cloud and on-premises infrastructure that spans both Windows and Linux. We strive for security, reliability, and automation in line with DevOps and Site Reliability Engineering principles. If you are passionate about learning and improvement through metrics and automation, and passionate about engendering that mindset in others, we want to hear from you.

About the role:

You will maintain shared cloud resources used by numerous software engineering teams within our business unit. We aim to enable software engineering teams to build cloud-native applications that adhere to security and regulatory requirements with limited handholding by our cloud engineers. We still have a fair number of applications hosted in on-premises data centers, and we aim to support their migration to the cloud.

Requirements:

Hands-on Engineering:
- Windows and Linux servers
- VMware
- Cloud platforms, preferably Azure
- Active Directory
- Secrets management with Consul and Vault or similar systems
- Configuration management tools like Salt, Ansible, and Terraform
- Firewalls and load balancers such as F5
- Web servers, including IIS and NGINX
- Database server infrastructure like Microsoft SQL Server and PostgreSQL
- Application performance monitoring with tools like New Relic
- Infrastructure monitoring with tools like Sensu, SolarWinds, Nagios, or Azure App Insights
- CI/CD tools like TeamCity, Octopus Deploy, Concourse, Azure DevOps, or GitHub Actions
- Log aggregation tools like SumoLogic or Splunk
- Network theory and protocols such as DNS, DHCP, proxy servers, and firewalls
- Security operations with tools for SAST, DAST, RAST, and WAF
- Infrastructure as Code or automation experience

Proficiency, high comfort, and familiarity with:
- One or more programming languages, such as C#, JavaScript, Python, or Go
- One or more scripting languages, such as PowerShell and Bash
- Command-line tools such as git, netcat, npm, and terraform

Responsibilities:
- Make improvements to internal processes to reduce lead time and increase deployment frequency
- Identify improvements to the quality, security, and performance of our infrastructure
- Increase the velocity with which teams deliver, leveraging expertise from various functional disciplines
- Identify how to remediate production incidents more quickly and safely while reducing the frequency of outages
- Actively engage with other teams and departments to collaborate on best practices and implementation strategy
- Adhere to and advocate for best practices, including Infrastructure as Code, monitoring, high availability, disaster recovery, security, and DevOps methodologies
- Create SLIs, SLOs, and SLAs
- Contribute to capacity planning, and advise and consult with teams who will be load/stress testing
- Keep up with industry innovations, recommending new tools or practices when appropriate
- Actively mentor peers, developing their expertise and inspiring others to innovate
- Provide timely assistance and remediation solutions during critical situations and production incidents
- Document and share lessons learned from production, including root cause analysis
- Explore new ways of improving communication among Site Reliability Engineers and with other teams
- Write and maintain architectural, stakeholder, and policy documentation