Build, develop, and maintain high-performance runtime and compiler components, focusing on end-to-end inference optimization.
Define and implement mappings of large-scale inference workloads onto NVIDIA’s systems.
Extend and integrate with NVIDIA’s software ecosystem, contributing to libraries, tooling, and interfaces that enable seamless deployment of models across platforms.
Benchmark, profile, and monitor key performance and efficiency metrics to ensure the compiler generates efficient mappings of neural network graphs to our inference hardware.
Collaborate closely with hardware architects and design teams to feed back software observations, influence future architectures, and co-design features that unlock new performance and efficiency points.
Prototype and evaluate new compilation and runtime techniques, including graph transformations, scheduling strategies, and memory/layout optimizations tail...