What You Will Do
Operational Support & Incident Management
Provide 2nd-level support for production systems and critical business applications.
Investigate, troubleshoot, and resolve incidents and performance issues.
Perform root cause analysis (RCA) and document findings in a structured manner.
Monitoring, Observability & Automation
Design, implement, and maintain monitoring dashboards.
Improve alert quality and reduce noise through effective threshold and metric design.
Analyze logs, metrics, and system behavior to proactively detect anomalies.
Automate operational processes using Ansible and scripting.
What You Bring
Operational Mindset & Collaboration
Proven experience in Site Reliability Engineering, DevOps, or 2nd-level production support.
Effective communication skills and the ability to work with cross-functional teams.
Technical Skills