Senior Site Reliability Engineer at Wikimedia Foundation | Torre

Senior Site Reliability Engineer

Full-time
Compensation
USD 105K - 163K/year
Remote (anywhere)
Posted over 2 years ago

Requirements and responsibilities


The Wikimedia Foundation is the nonprofit organization that operates Wikipedia and the other Wikimedia free knowledge projects. Our vision is a world in which every single human can freely share in the sum of all knowledge. We believe that everyone has the potential to contribute something to our shared knowledge, and that everyone should be able to access that knowledge freely. We host Wikipedia and the Wikimedia projects, build software experiences for reading, contributing, and sharing Wikimedia content, support the volunteer communities and partners who make Wikimedia possible, and advocate for policies that enable Wikimedia and free knowledge to thrive.

Wikimedia Foundation is hiring a Senior Site Reliability Engineer (SRE) to join our Service Operations SRE team, where we take care of the infrastructure that runs wikipedia.org and other Wikimedia Foundation projects. The SRE team at Wikimedia is a distributed and diverse team of engineers with a drive to explore, experiment and embrace new technologies. The SRE Service Operations team focuses on the application layer of our complex infrastructure as well as our developer-facing services.

Responsibilities:

- Design, implementation and maintenance of public facing infrastructure and services
- Use of configuration management and deployment tools
- Architectural design and operation at scale
- Monitoring of systems and services, optimization of performance and resource utilization
- Common operating system level tasks such as logging and backup / restore
- Cookbook / runbook implementation for common maintenance actions
- Incident response, diagnosis and follow-up on system outages or alerts
- Automation and streamlining of tasks as well as identifying process gaps
- Collaborating with a global and asynchronously communicating team
- Mentoring peers in your areas of technical and operational strength