1. Operations & Availability
Oversee daily data centre operations to ensure maximum uptime and resilience
Manage incident response, root cause analysis (RCA), and post-incident reporting
Implement and maintain Business Continuity Planning (BCP) and Disaster Recovery (DR) strategies
Monitor capacity (power, cooling, rack space) and plan for scalability
2. Infrastructure Management
Supervise critical systems: UPS, generators, HVAC, fire suppression, and structured cabling
Ensure preventive and corrective maintenance schedules are executed
Manage hardware lifecycle (servers, storage, networking equipment)
Coordinate installations, migrations, and decommissioning
3. Team Leadership
Lead and mentor operations engineers and technicians
Define shift schedules for 24/7 coverage where applicable
Drive training on SOPs, emergency procedures, and safety protocols
4. Vendor & Stakeholder Management
Manage contracts and SLAs with vendors (fa...