🍁 SearchCanadaJobs.com

Principal Software Engineer, At-Scale Reliability and Fleet Intelligence — CSP Engagements

Company

NVIDIA

Location

Santa Clara, CA

Type

Full-time

We're looking for a Principal Software Engineer to join our CSP Engagements team as the technical focal point for fleet-scale reliability. You will work directly with the engineering teams of key cloud service provider (CSP) and hyperscale customers to ensure NVIDIA platforms achieve their target MTBI (Mean Time Between Interruptions) in production. In this role, you will augment NVIDIA's internal software/firmware and quality teams with a dedicated CSP-facing focus. You will drive work streams with CSP engineering teams to build a shared understanding of NVIDIA's reliability software/firmware architecture and methodology, incorporate their fleet telemetry and failure data into NVIDIA's improvement priorities, and validate that reliability improvements measured in the lab carry over to real customer environments. Your cross-CSP visibility lets you distinguish systemic architectural gaps from environment- or configuration-specific issues, a distinction no single customer engagement could make on its own.


What you'll be doing:
+ D...
