🔵 Requirements:
▪️ At least three years of experience in web-based Application Support (in contrast to customer support, which is not in scope for this position).
▪️ Good understanding of the ISO/OSI model, encapsulation, and the network protocols used by web, HTTP/S, and REST API applications.
▪️ At least one year of experience as Manager/Shift Lead of 24x7 Application Support teams.
▪️ Experience building at least one 24x7 ‘follow the sun’ application support team.
▪️ Experience configuring and using, as a full platform user, monitoring/observability solutions such as NewRelic (or DataDog, Splunk, AppDynamics, Instana).
▪️ Experience setting up and fine-tuning on-call rotation and alert management platforms such as OpsGenie (or PagerDuty).
▪️ Advanced incident management skills and experience troubleshooting production web/enterprise application issues.
▪️ Incident commander experience with ServiceNow and ITSM/ITIL frameworks.
▪️ Fluent in defining SLOs, measuring SLIs/error budgets, and following SRE best practices.
▪️ Experience with public cloud (Azure) and CI/CD (Azure DevOps Pipelines, GitHub Actions, GitLab).
▪️ Experience writing team documentation and efficient runbooks/SOPs for production web applications, APIs, and cloud services.
▪️ Fluent with Linux/Windows CLI tools used for network application/database troubleshooting (ping, traceroute, curl, nc, ss, dig, nslookup, tail, grep, nmap, tcpdump, telnet, etc.).
▪️ Basic scripting with Bash/shell/PowerShell, and with Ansible or a similar tool.
🔵 Responsibilities:
▪️ Takes part in interviewing, hiring, and mentoring four IOC Shift Engineers (2 mid-level, 2 junior) based in the Guadalajara, Mexico region.
▪️ Takes part in onboarding new web applications/services to IOC proactive support.
▪️ Creates and keeps up to date shift instructions, checklists, and runbooks.
▪️ Prepares initial mentoring/onboarding/training programs for IOC Shift Engineers.
▪️ Leads initial hands-on shadowing for IOC Shift Engineers during their first few weeks on the job.
▪️ Approves shift schedule changes in a specific region (Turkey).
▪️ Coordinates “follow the sun” schedules, escalations, and on-call rosters with the other region (Guadalajara, Mexico).
▪️ Reviews shift handover reports to make IOC work more efficient, decrease alert fatigue, and improve key KPIs (MTTA, MTTR).
▪️ Makes sure Shift Engineers follow best practices when dealing with incidents/alerts.
▪️ Verifies and keeps up to date the internal procedures, rules, tooling, and access documentation that Shift Engineers use in their daily job.
▪️ Carries out the initial soft-launch work when a new application is being onboarded: reviews and updates runbooks, alerts, and documentation, and addresses escalation deficiencies.
▪️ Acts as Incident Commander during P1/P2 incidents, shadowing Situation Management (SIMA).
▪️ Organizes regular training and knowledge-transfer sessions on web applications/services that are onboarded to NewRelic/OpsGenie/SRE processes.