600+ Reliability Jobs - June 2026 - Urgent Hiring


Undisclosed
  • About the Job
  • Senior Platform Reliability Engineer (PRE) is responsible for engineering, operating, and maintaining GEL’s internal container platform and its supporting infrastructure, with a strong focus on reliability, resiliency, and security.
  • As a Senior PRE within GEL’s Infrastructure team, you will play a pivotal role in designing, building, and operating distributed container hosting solutions using Broadcom’s Tanzu product. Your mission is to safeguard and continuously enhance cloud-native applications and services that power the organization’s container ecosystem. You will serve as Level 2 support, working closely with cross-functional teams to troubleshoot complex issues, optimize platform performance, and guide application teams in adopting reliability best practices. ...
Posted
18 days ago
SGD8,000 - SGD8,000 per month

Singapore

  • This is a short-term position of up to one year.
  • You provide front-line (L1) reliability and operational support across the Saudi Wealth Management platform landscape, spanning SAMA regulatory technologies (e.g., Watheeq, SARIE and ZATCA e-Invoicing), front-facing platforms such as RM Plus, core banking (Temenos T24) and payment platforms (TPH, GTX and SecPay). You balance day-to-day service stability with a deep understanding of how these technologies enable Saudi business workflows, ensuring resilient operations, compliant outcomes and high-quality client service.
  • Monitor the health and availability of Saudi WM platforms (RM Plus, T24, TPH, GTX, SecPay and regulatory services) using dashboards and alerts to detect and respond to issues early. ...
Posted
23 days ago
Undisclosed

Singapore

  • Advanced knowledge of core AWS services: EC2, ECS/EKS, Lambda, S3, RDS/Aurora, DynamoDB, VPC, ELB/ALB/NLB, Route53, IAM.
  • Designing multi-AZ and multi-region highly available architectures.
  • Strong understanding of networking in AWS (subnets, routing tables, NAT, security groups, NACLs, VPC peering, PrivateLink). ...
Posted
23 days ago
Undisclosed

KL City

  • At AIA we’ve started an exciting movement to create a healthier, more sustainable future for everyone.
  • As pioneering innovators for over 100 years, we’re now transforming our organisation to be faster, simpler and more connected. Because we want to be even better equipped to develop digital solutions and experiences that help more people live Healthier, Longer, Better Lives.
  • To get there, we need people with tech/digital/analytics expertise and passion to help develop positive, sustainable change through digitally enhanced experiences that will impact the lives of millions of people and create a healthier future for everyone. ...
Posted
19 days ago
Undisclosed

KL City

  • The Role
  • Pave Bank is building the future of programmable banking — combining traditional banking with digital assets under a single, regulated platform. We’re looking for a Site Reliability Engineer (SRE) to ensure our core systems are highly available, scalable, and performant as we grow.
  • As an SRE at Pave Bank, you’ll work closely with Engineering, Product, Security and Operations teams to build robust infrastructure, automate operations, and maintain reliability across all services. Your work will directly impact the safety, performance, and scalability of our banking platform, helping our customers trust Pave Bank with their finances. ...
Posted
19 days ago
Undisclosed

KL City

  • At Provido Global, we’re more than a technology company. We are a global hub of innovation, creativity, and engineering excellence.
  • Our teams design and deliver intelligent, secure, and high-performance digital solutions that help organizations modernize operations, scale their platforms, and succeed in an increasingly digital world.
  • We are looking for a detail-oriented and experienced Site Reliability Engineer to join our team. The Site Reliability Engineer will be responsible for creating and implementing scalable solutions to meet system and application performance goals. You will also be responsible for troubleshooting system errors and resolving any relevant issues. ...
Posted
19 days ago
Undisclosed
  • Manage and lead ATE Test Operation of the Reliability Lab consisting of Operators and Technicians.
  • Drive the team to meet On‑Time Delivery by regularly monitoring that all Reliability requests progress per committed cycle time at the Electrical Testing process, including JEDEC Test Window or equivalent requirements.
  • Provide Analog, Logic & Discrete ATE testing support for all device families tested across various Analog, Logic & Discrete testers. ...
Posted
13 days ago
Undisclosed

Singapore

  • As Singapore’s longest established bank, we have been dedicated to enabling individuals and businesses to achieve their aspirations since 1932. How? By taking the time to truly understand people. From there, we provide support, services, solutions, and career paths that meet their individual needs and desires.
  • Today, we’re on a journey of transformation. Leveraging technology and creativity to become a future-ready learning organisation. But for all that change, our strategic ambition is consistently clear and bold, which is to be Asia’s leading financial services partner for a sustainable future.
  • We invite you to build the bank of the future. Innovate the way we deliver financial services. Work in friendly, supportive teams. Build lasting value in your community. Help people grow their assets, business, and investments. Take your learning as far as you can. Or simply enjoy a vibrant, future-ready career. ...
Posted
19 days ago
Undisclosed

Singapore

  • Advanced knowledge of core AWS services: EC2, ECS/EKS, Lambda, S3, RDS/Aurora, DynamoDB, VPC, ELB/ALB/NLB, Route53, IAM.
  • Designing multi-AZ and multi-region highly available architectures.
  • Strong understanding of networking in AWS (subnets, routing tables, NAT, security groups, NACLs, VPC peering, PrivateLink). ...
Posted
24 days ago
Undisclosed

Singapore

  • A strong believer in automating DevOps & SRE aspects like infrastructure provisioning, deployment, observability, incident lifecycle, uptime SLAs, etc.
  • Bold to challenge, open to being challenged, curious to learn & grow
  • Using Infrastructure as Code tooling like Terraform or Ansible to manage AWS resources ...
Posted
24 days ago
Undisclosed
Work from home

Singapore

  • The Role
  • Pave Bank is building the future of programmable banking — combining traditional banking with digital assets under a single, regulated platform. We’re looking for a Site Reliability Engineer (SRE) to ensure our core systems are highly available, scalable, and performant as we grow.
  • As an SRE at Pave Bank, you’ll work closely with Engineering, Product, Security and Operations teams to build robust infrastructure, automate operations, and maintain reliability across all services. Your work will directly impact the safety, performance, and scalability of our banking platform, helping our customers trust Pave Bank with their finances. ...
Posted
19 days ago
Undisclosed

Singapore

  • Big Scale & Startup Culture: Join a company that combines the scale of a major player with the agility and innovation of a startup
  • Exceptional Team: Collaborate with a team of former employees from tech giants like Google, Meta, ByteDance, and Microsoft, as well as ACM ICPC programming champions
  • Stock Options: As we grow and succeed together, you'll have the chance to benefit from it ...
Posted
19 days ago
Undisclosed

Singapore

  • Advanced knowledge of core AWS services: EC2, ECS/EKS, Lambda, S3, RDS/Aurora, DynamoDB, VPC, ELB/ALB/NLB, Route53, IAM.
  • Designing multi-AZ and multi-region highly available architectures.
  • Strong understanding of networking in AWS (subnets, routing tables, NAT, security groups, NACLs, VPC peering, PrivateLink). ...
Posted
24 days ago
Undisclosed

Singapore

  • Design, build and maintain the software development pipeline automation and its related tool sets (e.g. JIRA, Git/Bitbucket, Jenkins, Nexus, etc.) to enable Continuous Integration (CI) and Continuous Deployment (CD).
  • Design and implement the infrastructure and operating environment for container-based microservices that meet the agreed high availability, performance and security requirements.
  • Design and develop the test automation to validate the builds in the CI/CD pipeline. ...
Posted
19 days ago
Undisclosed
Work from home

Singapore

  • We’re looking for an SRE / Reliability Engineering Intern who wants to own how real systems stay up—not just respond when they break.
  • You’ll work on defining and improving reliability practices across production systems, including monitoring, alerting, incident response, and resilience. This means operating in an environment where systems are evolving quickly, and reliability needs to be built—not inherited.
  • You’ll collaborate closely with engineering and infrastructure teams to ensure systems are observable, resilient, and continuously improving. Your work will directly impact uptime, performance, and user experience. ...
Posted
19 days ago
SGD7,000 - SGD8,500 per month

Singapore

  • Excellent hands-on experience in Core Java 1.8, Spring, Spring Boot, Quartz
  • Hands-on experience with messaging systems such as RabbitMQ and IBM MQ
  • Strong working experience with Linux operating systems (Oracle Linux 7.6) ...
Posted
19 days ago
SGD7,000 - SGD8,500 per month

Singapore

  • Build secure and scalable cloud networking (VPCs, subnets, routing, VPN, firewalls).
  • Support application releases and coordinate deployments across environments.
  • Implement logging/monitoring using Prometheus, Grafana, Datadog, Splunk, or CloudWatch. ...
Posted
19 days ago
SGD6,500 - SGD6,500 per month

Singapore

  • POSITION OVERVIEW: Sr. Site Reliability Engineering (L7)
  • POSITION GENERAL DUTIES AND TASKS:
  • Mandatory Skills (Must-Have) ...
Posted
13 days ago
SGD9,000 - SGD9,900 per month
Work from home

Singapore

  • Own end‑to‑end reliability and performance of hybrid and cloud-connected network services.
  • Apply Network Reliability Engineering (NRE) principles to reduce operational toil and improve service resilience.
  • Design, implement, and continuously improve highly available hybrid and cloud network architectures. ...
Posted
20 days ago
MYR1,800 - MYR3,000 per month

Malaysia

  • Provide guidance to the respective shift leaders on HC deployment by operation/product to achieve the daily work plan.
  • Set up systems and controls to be uniformly administered by the shift leaders.
  • Enforce work discipline among all shift personnel (work instructions, control plans, attendance, tardiness, etc. ...
Posted
14 days ago
Undisclosed

Malacca City

  • Set up and maintain applications infrastructure, and define and run code quality checks
  • Deploy application packages along the CI/CD pipeline, including solution reviews, sprint package testing (integration, performance), UAT stages, in collaboration with IT Application Owners and Quality Engineers
  • Collaborate with Platform SRE and Architects to identify and prioritize system improvements, and develop solutions to meet business needs ...
Posted
5 days ago
Undisclosed

Jurong West

  • Develop test plans based on customer requirements per the test request form.
  • Design, review, and fabricate test fixtures and PCBs for product testing.
  • Review committed test schedules with the Lab Specialist and communicate with customers for alignment. ...
Posted
2 days ago
Undisclosed

Singapore

  • Reliability Leadership: Lead initiatives to improve system reliability, performance, and scalability.
  • Automation & Remediation: Design and implement automated workflows, deployment safety checks, and auto-remediation processes.
  • Monitoring & SLO Management: Define, enforce, and monitor SLOs/SLIs, ensuring alignment with business objectives. ...
Posted
15 days ago
Undisclosed

KL City

  • Operational Support: Be deeply involved in the SRE lifecycle of game services, assisting with CI/CD pipeline optimization, automated deployments, and real-time system monitoring.
  • Architecture & Documentation: Collaborate with senior engineers to design high-availability systems and maintain technical documentation for troubleshooting and operational standards.
  • Infrastructure & Stability: Participate in the benchmarking and performance tuning of core infrastructure to ensure system robustness under high-concurrency scenarios. ...
Posted
2 days ago
Undisclosed

Singapore

  • Manage platform configuration and production changes
  • Respond to incidents, perform root cause analysis, and ensure timely resolution
  • Monitor system health and optimize performance of trading platforms ...
Posted
4 days ago
Undisclosed

Singapore

  • Manage platform configuration and production changes
  • Respond to incidents, perform root cause analysis, and ensure timely resolution
  • Monitor system health and optimize performance of trading platforms ...
Posted
4 days ago
SGD5,000 - SGD5,000 per month

Singapore

  • Define and implement NAND Flash reliability test flows to assess product reliability performance and prevent extrinsic/intrinsic reliability escapes.
  • Perform high‑volume statistical data analysis to evaluate NAND DPM/intrinsic reliability risks and trends.
  • Drive electrical failure analysis, including advanced characterization, to identify root causes and solution spaces. ...
Posted
2 days ago
Undisclosed

Singapore

  • Incident Response & RCA: Lead the response for complex virtualization, storage, or OS-level disruptions and conduct blameless post-mortems and Root Cause Analysis (RCA) to prevent systemic recurrence.
  • Systems Automation: Develop and maintain software tools (Python, PowerShell, Java) that automate infrastructure tasks via CI/CD pipelines to improve efficiency and reduce operational risk.
  • Observability & Telemetry: Architect and manage AI-first monitoring systems (Grafana, ELK) to capture deep telemetry for predictive failure detection across hypervisors, storage arrays, and OS performance counters. ...
Posted
8 days ago
SGD4,600 - SGD4,600 per month

Singapore

  • Support asset reliability and maintenance optimization across power generation assets (GT, HRSG, BOP, BESS) and water assets (wastewater treatment, effluent recovery, NEWater)
  • Perform reliability analysis (data analysis, RCA, FMEA) to identify failure modes, root causes, and improvement opportunities
  • Analyze equipment performance, failure history, and plant data (including predictive monitoring tools) to detect degradation trends and prevent failures ...
Posted
17 days ago