SUMMARY
This role is for an infrastructure engineer focused on automating the system delivery and operations lifecycle. The engineer provides second-line (L2/L3) support for medium-complexity incidents across two or more technology domains (Cloud, Security, Networking, Apps, Collaboration) while proactively driving system reliability and reducing operational toil.
RESPONSIBILITIES
- Automation & CI/CD: Automate continuous integration, delivery, and deployment pipelines (Jenkins, GitLab CI); write automation scripts (Python, Bash) to eliminate manual operational tasks.
- Infrastructure & Configuration as Code: Manage cloud provisioning using Terraform and AWS CloudFormation; implement configuration management (Ansible, Chef, Puppet) for patching and server hardening.
- Containerization & Observability: Orchestrate application deployments via Docker, Kubernetes, and Helm; design and maintain monitoring, logging, and tracing solutions to optimize system reliability.
- Operations & Incident Support: Monitor queues and resolve L2 incidents/requests within SLAs; lead initial operational client escalations, identify root causes, and assist L1 engineers with initial triage.
- Change & Shift Management: Execute approved maintenance and log complete change requests with risk mitigation plans; complete structured shift handovers for critical tasks.
- Optimization & Auditing: Analyze ticket trends and audit logs to identify automation opportunities, update knowledge base articles, and reduce overall ticket volume.
REQUIREMENTS
- Education: Bachelor’s degree in IT/Computing or equivalent practical experience.
- Soft Skills: Adaptable, resilient in high-pressure environments, an effective cross-cultural communicator, and strongly client-centric.
- Work Approach: Plans proactively, collaborates across internal and external resolver groups, and is willing to work extended hours when necessary.
TECHNICAL TOOL STACK
- Cloud & Virtualization: AWS, Microsoft Azure, VMware (vSphere, vCenter, ESXi, vRA, Horizon VDI).
- DevOps & Automation: Git/GitHub/GitLab, Jenkins, Terraform, CloudFormation, Ansible, Chef, Puppet.
- Containers & Orchestration: Docker, Kubernetes (K8s), Helm, SUSE Rancher Prime, NeuVector Prime.
- Observability & Monitoring: Prometheus, Grafana, ELK Stack, Datadog, Splunk, New Relic, SolarWinds.
- Data Resilience & Storage: Veeam Backup & Replication, NetApp Storage, Hitachi Storage (HCP/HCPCS).
- Security & Systems Infrastructure: CyberArk, HashiCorp Vault, Active Directory, WSUS, Linux (RHEL, Ubuntu), Opengear Console, OPSWAT, RSA SecurID, Tenable, Trend Micro.
Due to the nature of the projects, only Singaporeans may apply. Please note that only shortlisted candidates will be notified.