🍁 SearchCanadaJobs.com

Senior Site Reliability Engineer

Company

Heidi Health Ltd

Location

London, England

Type

Full-time

The Role

This role sits in the core Platform/SRE team that owns production. You’ll work directly on incident response, on-call, system reliability, and day-to-day operations for Heidi’s platform.

What you’ll do

  • Participate in on-call and incident response: Respond to production incidents, contribute to service restoration, and support clear communication during incidents. Over time, take increasing responsibility for leading incidents end-to-end.

  • Improve operational reliability: Identify recurring issues and reliability risks, and drive fixes through better alerting, automation, system changes, or process improvements.

  • Own parts of the production environment: Operate and improve Kubernetes clusters, cloud infrastructure, and core platform services, with growing ownership as familiarity increases.

  • Strengthen observability: Improve dashboards, alerts, logs, and traces...
