Design, implement, and operate observability and AIOps capabilities for cloud-native and hybrid environments, supporting reliable, production-grade services
Lead the onboarding of early adopter teams and services, defining and applying standards for telemetry, SLIs, SLOs, and alerting in real-world systems
Work hands-on with engineering, SRE, and operations teams to gather requirements and translate them into actionable observability and automation solutions
Build and maintain telemetry pipelines, dashboards, and alerting, leveraging OpenTelemetry to deliver meaningful insights and reduce operational noise
Run and evolve observability services in Kubernetes environments using Helm and Infrastructure as Code (Terraform), and integrate them with ITSM, ticketing, and event management systems
What You Bring
5+ years of hands-on experience in observability, SRE, platform, or relia...