About Hytech
Hytech is a leading management consulting firm headquartered in Australia and Singapore, specialising in digital transformation for fintech and financial services organisations. We deliver end-to-end consulting services and provide robust middle- and back-office solutions that enable our clients to optimise operations, enhance efficiency, and stay ahead in a fast-evolving digital landscape. Our client portfolio includes top global trading platforms and leading crypto exchanges.
With more than 2,000 professionals worldwide, Hytech has a strong and growing international presence, with offices across Australia, Singapore, Malaysia, Taiwan, the Philippines, Thailand, Morocco, Cyprus, Dubai, and beyond.
About the Role
We are looking for a Senior DevOps Engineer with strong hands-on experience in Kubernetes, AWS, CI/CD, automation, monitoring, and production system reliability.
You will join a collaborative DevOps team of 9+ engineers across multiple countries, supporting and improving globally distributed production systems. The role provides exposure to complex infrastructure challenges, including high availability, scalability, security, cost optimization, disaster recovery, and cross-region operations.
You will also have the opportunity to work on AWS-recognized cloud and infrastructure projects, applying modern cloud-native practices across Kubernetes, infrastructure-as-code, observability, and automation.
We are looking for someone who is smart, hands-on, proactive, and structured in problem-solving: someone who can not only operate systems but also improve them for long-term stability, scalability, and engineering efficiency.
Key Responsibilities
Infrastructure & Kubernetes
- Design, build, operate, and improve Kubernetes-based infrastructure.
- Manage Kubernetes clusters, workloads, networking, storage, ingress, service discovery, and autoscaling.
- Improve deployment reliability, cluster stability, resource utilization, and operational visibility.
- Support containerized application deployment across different environments.
- Troubleshoot Kubernetes issues related to pods, nodes, networking, DNS, storage, and performance.
Cloud & System Operations
- Manage cloud infrastructure on platforms such as AWS, Azure, GCP, or similar.
- Maintain and improve Linux-based production environments.
- Support high-availability, low-latency, and business-critical systems.
- Perform capacity planning, cost optimization, disaster recovery planning, and infrastructure scaling.
- Ensure infrastructure is secure, observable, resilient, and easy to operate.
CI/CD & Automation
- Build and maintain CI/CD pipelines for application and infrastructure deployment.
- Automate repetitive operational tasks using scripts, tools, and infrastructure-as-code.
- Improve deployment speed, rollback capability, release safety, and developer productivity.
- Work with tools such as GitLab CI, GitHub Actions, Jenkins, Argo CD, Helm, Terraform, Ansible, or similar.
Monitoring, Reliability & Incident Response
- Build and improve monitoring, logging, alerting, and observability platforms.
- Work with tools such as Prometheus, Grafana, Loki, ELK, CloudWatch, Datadog, or similar.
- Participate in incident response, root cause analysis, and post-incident improvement.
- Identify system bottlenecks and propose long-term solutions instead of short-term fixes.
- Improve system reliability, availability, and operational readiness.
Collaboration & Engineering Support
- Work closely with developers, QA, security, system administrators, and business teams.
- Help development teams improve application deployment, configuration, performance, and reliability.
- Provide technical guidance on infrastructure design and operational best practices.
- Document architecture, operational procedures, runbooks, and troubleshooting guides.
Requirements
Must Have
- Strong hands-on experience with Kubernetes in production environments.
- Solid understanding of containers, Docker, Helm, ingress controllers, service mesh, and Kubernetes networking.
- Strong Linux system administration skills.
- Experience with CI/CD pipeline design and automation.
- Experience with cloud infrastructure such as AWS, Azure, GCP, or private cloud.
- Experience with Infrastructure as Code, such as Terraform, CloudFormation, Ansible, or similar.
- Good scripting ability using Bash, Python, Go, or similar.
- Strong troubleshooting skills across application, infrastructure, network, and system layers.
- Good understanding of monitoring, logging, alerting, and incident response.
- Able to work independently and handle production issues under pressure.
Preferred Qualifications
- Experience supporting financial, trading, payment, SaaS, or other high-availability systems.
- Experience with AWS EKS, MSK/Kafka, RDS, Redis, S3, CloudFront, Route 53, WAF, or similar services.
- Experience with GitOps tools such as Argo CD or Flux.
- Experience with service mesh such as Istio, Linkerd, or Consul.
- Experience with security hardening, secrets management, IAM, network security, and compliance.
- Experience with multi-region, hybrid cloud, or disaster recovery architecture.
- Experience optimizing infrastructure cost and resource usage.
- Experience supporting low-latency or high-throughput systems.
What We Are Looking For
We are looking for someone who is:
- Smart and logical: able to break down complex problems clearly.
- Hands-on: not only understands the concepts but can also apply them to fix real production issues.
- Proactive: able to identify problems before they become incidents.
- Reliable under pressure: calm and structured during production issues.
- Automation-minded: prefers repeatable systems over manual operations.
- Security-conscious: understands the importance of access control, secrets, auditability, and system hardening.
- Business-aware: understands that infrastructure decisions affect system stability, cost, delivery speed, and user experience.
- A strong communicator: able to explain technical issues clearly to both technical and non-technical stakeholders.
Nice-to-Have Technical Stack
- Kubernetes / EKS / AKS / GKE
- Docker / Helm / Kustomize
- Terraform / Ansible
- GitLab CI / GitHub Actions / Jenkins / Argo CD
- AWS / Azure / GCP
- Prometheus / Grafana / Loki / ELK / CloudWatch
- Kafka / Redis / PostgreSQL / MySQL
- Nginx / Ingress Controller / Istio
- Linux / Bash / Python / Go
- Vault / Secrets Manager / IAM / WAF