
Lead - Site Reliability Engineer

Company

FundsIndia

Location

Chennai, Tamil Nadu

Type

Full-time

Role Overview


We are looking for a Lead Site Reliability Engineer with 6-7 years of experience to drive reliability, observability, and incident management practices. The ideal candidate will have strong expertise in the Grafana stack, production monitoring, and handling critical incidents in high-availability systems.


Key Responsibilities

  • Act as the Incident Commander during production outages, ensuring timely resolution and stakeholder communication
  • Lead incident response, triage, RCA (Root Cause Analysis), and postmortems
  • Build and enhance observability systems using the Grafana stack (Prometheus, Loki, Tempo)
  • Define and manage SLIs, SLOs, and SLAs for critical services
  • Develop and maintain monitoring, alerting, and dashboards for proactive issue detection
  • Collaborate with Dev, Infra, an...
