We are seeking a Site Reliability Engineer (SRE) to join our team in Singapore.
WHAT YOU’LL DO:
- Keep your assigned site or service up and running, or restore it quickly when a failure occurs,
- Actively troubleshoot issues that arise during testing and in production, catching and resolving problems before launch,
- Automate work such as infrastructure provisioning, testing, failover, failure mitigation, and much more,
- Monitor and troubleshoot highly scalable, distributed server clusters that perform a range of functions, from web servers to machine learning processing,
- Take part in a PagerDuty on-call rotation to respond to availability incidents and support service engineers with customer incidents,
- Help establish and follow best practices in Site Reliability Engineering,
- Manage code deployments, fixes, updates, and related processes,
- Work with a close-knit team and brain...