Site Reliability Engineer
NETGEAR
Oct 2017 - Nov 2019 (2 years 2 months)
• Monitored production systems for performance, availability, and errors using Nagios, CloudWatch, and other AWS monitoring tools,
maintaining 99.9% availability of customer-facing services and responding to alerts within SLAs.
• Participated in an on-call rotation to troubleshoot and resolve critical system incidents across on-premises and AWS
environments, meeting SLAs and minimizing downtime, with an average incident resolution time of under 1 hour.
• Assisted in capacity planning and performance analysis for both on-premises and AWS-hosted systems to ensure infrastructure
could handle traffic spikes and business growth, particularly during holiday sales events.
• Collaborated with development teams to implement application performance mo