600+ Reliability Jobs - June 2026 - Urgent Hiring

Showing 657 job vacancy search results for "reliability"

Don't miss out on the latest Reliability jobs!

Undisclosed

Singapore

  • Reliability Leadership: Lead initiatives to improve system reliability, performance, and scalability.
  • Automation & Remediation: Design and implement automated workflows, deployment safety checks, and auto-remediation processes.
  • Monitoring & SLO Management: Define, enforce, and monitor SLOs/SLIs, ensuring alignment with business objectives. ...
Posted
17 days ago
Undisclosed

Singapore

  • Build, expand, and operate ByteDance’s global infrastructures, including large-scale systems in public and private clouds, data centers, and content delivery networks.
  • Build tools, automation, visualizations, and monitors to facilitate the operation and optimization of the global infrastructure.
  • Work in a fast-paced environment. Participate in technical operations and rotations in response to performance and reliability issues. ...
Posted
11 days ago
Undisclosed

Jurong West

  • Develop test plans based on customer requirements per the test request form.
  • Design, review, and fabricate test fixtures and PCBs for product testing.
  • Review committed test schedules with the Lab Specialist and communicate with customers for alignment. ...
Posted
11 days ago
Undisclosed

Singapore

  • Design, build and maintain highly available, scalable, and resilient production systems.
  • Define and implement service reliability standards, including SLIs, SLOs and operational best practices.
  • Lead incident response, root cause analysis, post-incident reviews, and reliability improvement initiatives. ...
Posted
12 days ago
SGD4,600 - SGD4,600 per month

Singapore

  • Support asset reliability and maintenance optimization across power generation assets (GT, HRSG, BOP, BESS) and water assets (wastewater treatment, effluent recovery, NEWater)
  • Perform reliability analysis (data analysis, RCA, FMEA) to identify failure modes, root causes, and improvement opportunities
  • Analyze equipment performance, failure history, and plant data (including predictive monitoring tools) to detect degradation trends and prevent failures ...
Posted
18 days ago
Undisclosed

Singapore

  • Comply with all RSTO’s Safety, Health & Environmental (SHE) requirements, never put oneself and others at safety & health risks, and report any workplace accidents, near misses and hazards as soon as practicable
  • Observe all RSTO’s site security measures at all times, and report any suspicious characters/objects & damaged security mechanisms to Site Security immediately
  • Support a strong Engineering and Maintenance Safety Culture that encourages taking the right amount of time to perform each task safely ...
Posted
12 days ago
Undisclosed

Singapore

  • Provide 2nd level support for production systems and critical business applications.
  • Investigate, troubleshoot, and resolve incidents and performance issues. Perform root cause analysis (RCA) and document findings in a structured manner.
  • Collaborate closely with development teams to ensure sustainable issue resolution. ...
Posted
18 days ago
Undisclosed

Singapore

  • Support asset reliability and maintenance optimization across power generation assets (GT, HRSG, BOP, BESS) and water assets (wastewater treatment, effluent recovery, NEWater)
  • Perform reliability analysis (data analysis, RCA, FMEA) to identify failure modes, root causes, and improvement opportunities
  • Analyze equipment performance, failure history, and plant data (including predictive monitoring tools) to detect degradation trends and prevent failures ...
Posted
19 days ago
Undisclosed

Jurong West

  • Support asset reliability and maintenance optimization across power generation assets (GT, HRSG, BOP, BESS) and water assets (wastewater treatment, effluent recovery, NEWater)
  • Perform reliability analysis (data analysis, RCA, FMEA) to identify failure modes, root causes, and improvement opportunities
  • Analyze equipment performance, failure history, and plant data (including predictive monitoring tools) to detect degradation trends and prevent failures ...
Posted
19 days ago
SGD7,000 - SGD7,000 per month

Singapore

  • Description:

Must have
  • Excellent hands on experience in Core Java 1.8, Spring, Spring Boot, Quartz
  • Hands on experience with messaging systems such as RabbitMQ and IBM MQ
  • Strong working experience on Linux Operating Systems (Oracle Linux 7.6)
  • Experience with Application Servers, preferably IBM WebSphere / Apache Tomcat 8.5.x
  • Excellent and proven experience in Oracle SQL and PL/SQL

Good to have
  • Experience with monitoring tools such as Tivoli, and Splunk
  • Prior experience in payments processing systems or the banking/financial services industry
  • Experience with Shell scripting
  • Understanding in supporting large, complex, high availability, high volume applications
  • Understanding of failover mechanisms and disaster recovery

POSITION OVERVIEW: Software Development Specialist

POSITION GENERAL DUTIES AND TASKS:

Role Summary
The Senior Site Reliability Engineer (L7) is a hands on technical engineering role responsible for building, automating, scaling, and maintaining highly reliable, secure, and resilient cloud and hybrid infrastructure platforms. The role focuses on infrastructure engineering, container orchestration, Infrastructure as Code (IaC), observability, incident response, and platform level application development.

Key Responsibilities
  • Good to Have: Deploy, configure, and maintain AWS resources including EC2, ECS, EKS, VPC, IAM, NAT, and networking components.
  • Good to Have: Build secure and scalable cloud networking (VPCs, subnets, routing, VPN, firewalls).
  • Work with load balancers, reverse proxies, API gateways, DNS management, and network routing.
  • Build CI/CD pipelines using Jenkins, GitLab CI, or GitHub Actions.
  • Support application releases and coordinate deployments across environments.
  • Implement logging/monitoring using Prometheus, Grafana, Datadog, Splunk, or CloudWatch.
  • Participate in incident response, troubleshooting, on-call rotation, and post-incident RCA.
  • Perform system performance tuning, patching, capacity planning, and optimization.
  • Improve system reliability through automation, redundancy, and engineering best practices.
  • Implement and maintain IaC using Terraform or CloudFormation.
  • Automate provisioning, configuration, and environment setup using scripting (Python, Bash, Go).
  • Develop reusable automation modules, templates, pipelines, and cloud engineering patterns.
  • Build, deploy, and manage containerized applications using Docker.
  • Operate and optimize Kubernetes clusters (EKS or on prem).
  • Implement autoscaling, service mesh, pod security, and workload monitoring.
  • Develop automation services, internal tooling, and platform utilities using Core Java, Spring Boot, Quartz, and Erlang.
  • Build wrappers/services for IBM MQ and RabbitMQ messaging flows.
  • Create schedulers, orchestration components, and internal micro services for operational tasks.
  • Write integrations, connectors, and event-driven components for infra automation.
  • Build custom alerts, webhook handlers, log processors, and reliability tooling.

Technical skills (Area: Technologies / Tools)
  • Operating Systems & Virtualization: Enterprise Linux, VMware, OVM, X86 server clusters
  • Containerization & Orchestration: Kubernetes, Docker
  • Application Development (Platform): Core Java 1.8, Spring, Spring Boot, Quartz, Erlang
  • Messaging Platforms: IBM MQ, RabbitMQ, Erlang/Mnesia
  • IaC & Automation: Terraform, Ansible, CloudFormation, Chef
  • Scripting Languages: Python, Go, Bash
  • CI/CD Tooling: Jenkins, GitLab CI, GitHub Actions
  • Observability & Logging: Prometheus, Grafana, Datadog, Splunk
  • Databases & Storage: Oracle, HA DB clusters, NFS, HPE Nimble, DataDomain
  • Load Balancing & Networking: F5 LTM/ASM/ASR, DNS, network routing, proxies
  • File Transfer & Directory Services: GoAnywhere, Tivoli Directory Server
  • Cloud Platforms: AWS, Azure, GCP
  • Security Technologies: Hardware Security Modules (Payshield or equivalent)

Experience Requirements
  • 5+ years of experience as an SRE, DevOps Engineer, Cloud Engineer, or Platform Engineer.
  • Strong hands on expertise with AWS cloud services (EC2, ECS, EKS).
  • Practical experience with IaC tools such as Terraform and CloudFormation.
  • Deep working knowledge of Kubernetes, Docker, cloud networking, load balancers, and proxies.
  • Hands on experience with CI/CD pipelines, release engineering, observability tooling, and monitoring stacks.
  • Experience supporting databases including partitioning, replication, sharding, and high availability setups.
  • Prior involvement in incident response, production support, and reliability engineering practices.

Desirable
  • Good Understanding on infrastructure, F5, network
  • knowledge of ISO20022, ISO8583, and Swift MT formats
  • Experience in shell scripting, Python.
  • Experience within payments processing systems or finance/banking industry.
  • Experience in supporting applications using different languages and/or character sets.
Posted
23 days ago
Undisclosed

KL City

  • Own and lead the Incident Management process end‑to‑end, ensuring rapid and effective restoration of services.
  • Act as Incident Commander for Major Incidents (P0/P1/P2), providing leadership, prioritization, and decision‑making authority.
  • Coordinate cross‑functional response across Infrastructure, Applications, IT Security, Vendors, and Business stakeholders. ...
Posted
13 days ago
SGD7,000 - SGD7,000 per month

Singapore

  • POSITION OVERVIEW: Software Development Senior Specialist

POSITION GENERAL DUTIES AND TASKS:

Role Summary
The Senior Site Reliability Engineer (L7) is a hands‑on technical engineering role responsible for building, automating, scaling, and maintaining highly reliable, secure, and resilient cloud and hybrid infrastructure platforms. The role focuses on cloud infrastructure engineering, container orchestration, Infrastructure‑as‑Code (IaC), observability, incident response, and platform‑level application development.

Key Responsibilities
  • Deploy, configure, and maintain AWS resources including EC2, ECS, EKS, VPC, IAM, NAT, and networking components.
  • Build secure and scalable cloud networking (VPCs, subnets, routing, VPN, firewalls).
  • Work with load balancers, reverse proxies, API gateways, DNS management, and network routing.
  • Build CI/CD pipelines using Jenkins, GitLab CI, or GitHub Actions.
  • Support application releases and coordinate deployments across environments.
  • Implement logging/monitoring using Prometheus, Grafana, Datadog, Splunk, or CloudWatch.
  • Participate in incident response, troubleshooting, on-call rotation, and post-incident RCA.
  • Perform system performance tuning, patching, capacity planning, and optimization.
  • Improve system reliability through automation, redundancy, and engineering best practices.
  • Implement and maintain IaC using Terraform or CloudFormation.
  • Automate provisioning, configuration, and environment setup using scripting (Python, Bash, Go).
  • Develop reusable automation modules, templates, pipelines, and cloud engineering patterns.
  • Build, deploy, and manage containerized applications using Docker.
  • Operate and optimize Kubernetes clusters (EKS or on‑prem).
  • Implement autoscaling, service mesh, pod security, and workload monitoring.
  • Develop automation services, internal tooling, and platform utilities using Core Java, Spring Boot, Quartz, and Erlang.
  • Build wrappers/services for IBM MQ and RabbitMQ messaging flows.
  • Create schedulers, orchestration components, and internal micro‑services for operational tasks.
  • Write integrations, connectors, and event-driven components for infra-automation.
  • Build custom alerts, webhook handlers, log processors, and reliability tooling.

Technologies / Tools
  • Operating Systems & Virtualization: Enterprise Linux, VMware, OVM, X86 server clusters
  • Containerization & Orchestration: Kubernetes, Docker
  • Application Development (Platform): Core Java 1.8, Spring, Spring Boot, Quartz, Erlang
  • Messaging Platforms: IBM MQ, RabbitMQ, Erlang/Mnesia
  • IaC & Automation: Terraform, Ansible, CloudFormation, Chef
  • Scripting Languages: Python, Go, Bash
  • CI/CD Tooling: Jenkins, GitLab CI, GitHub Actions
  • Observability & Logging: Prometheus, Grafana, Datadog, Splunk
  • Databases & Storage: Oracle, HA DB clusters, NFS, HPE Nimble, DataDomain
  • Load Balancing & Networking: F5 LTM/ASM/ASR, DNS, network routing, proxies
  • File Transfer & Directory Services: GoAnywhere, Tivoli Directory Server
  • Cloud Platforms: AWS, Azure, GCP
  • Security Technologies: Hardware Security Modules (Payshield or equivalent)

Experience Requirements
  • 5+ years of experience as an SRE, DevOps Engineer, Cloud Engineer, or Platform Engineer.
  • Strong hands‑on expertise with AWS cloud services (EC2, ECS, EKS).
  • Practical experience with IaC tools such as Terraform and CloudFormation.
  • Deep working knowledge of Kubernetes, Docker, cloud networking, load balancers, and proxies.
  • Hands‑on experience with CI/CD pipelines, release engineering, observability tooling, and monitoring stacks.
  • Experience supporting databases including partitioning, replication, sharding, and high availability setups.
  • Prior involvement in incident response, production support, and reliability engineering practices.

Desirable
  • Good Understanding on infrastructure, F5, network
  • knowledge of ISO20022, ISO8583, and Swift MT formats
  • Experience in shell scripting, Python.
  • Experience within payments processing systems or finance/banking industry.
  • Experience in supporting applications using different languages and/or character sets.
Posted
23 days ago
SGD6,000 - SGD6,000 per month

Singapore

  • Infrastructure & Automation: Design, implement, and maintain scalable cloud infrastructure using Infrastructure as Code (IaC) tools.
  • CI/CD Pipelines: Build and optimize automated pipelines for testing, deployment, and release management.
  • Monitoring & Reliability: Establish observability standards, implement monitoring, logging, and alerting systems to ensure system health. ...
Posted
19 days ago
Undisclosed

Singapore

  • 5-10 years of experience in Network Engineering within operations or design, including at least 3 years supporting e-commerce, financial services, or large-scale SaaS platforms.
  • Expert-level understanding of TCP/IP networking, including LAN and WAN architectures. Experience with private MPLS networks and SD-WAN technologies is highly desirable.
  • Advanced proficiency or formal certifications in Cisco, Arista, and/or Check Point network platforms. ...
Posted
13 days ago

Centre For Strategic Infocomm Technologies (CSIT)

Undisclosed

Singapore

  • Plan, design and modernise complex network infrastructure to achieve optimal network performance to ensure scalable, sustainable and secure operations
  • Define architectural standards, reference models and guidelines that enhance security, resiliency and scalability
  • Drive actionable insights with good telemetry data to ensure meeting SLA ...
Posted
24 days ago
Undisclosed
  • Job Summary:
  • Collaborate closely with Manufacturing, Quality, Product Engineering, and Suppliers to ensure robust, compliant, and timely change
  • Requirements:
Posted
19 days ago
Undisclosed

Singapore

  • You will improve site reliability by building mechanisms/architectures that enable fault tolerance and faster median time to respond and median time to detect.
  • You will drive the integration of observability automation into the CI/CD pipeline.
  • You will handle production incidents, manage incident communication with clients and draft root cause analysis documents. ...
Posted
24 days ago
Undisclosed

Singapore

  • Linux
  • Chef, Puppet or Ansible
  • Kubernetes, Docker, or Podman ...
Posted
4 days ago
Undisclosed

Singapore

  • Responsible for daily operations, hardware/software troubleshooting, and optimization of GPU/CPU computing infrastructure to enhance resource efficiency and service reliability.
  • Manage and operate Kubernetes clusters and ML platforms, including monitoring/alerting, version upgrades, disaster recovery optimization, and security drills to ensure system high availability and maintainability.
  • Drive automation of operational workflows covering resource management, change control, self-healing solutions, and user tools. ...
Posted
2 days ago
Undisclosed

Singapore

  • Basic understanding of Linux/Unix operating systems
  • Knowledge of computer networking fundamentals and system administration
  • Familiarity with programming or scripting languages (Python, Bash, Go, or similar) ...
Posted
4 days ago
SGD9,000 - SGD9,000 per month

Singapore

  • The SRE Engineer will play a key role in the deployment and modernization of Cloud and Edge Computing solutions to ensure the resilience and efficiency of industrial operations. The position is part of an international team of 15 people responsible for ensuring the service quality of software solutions supporting industrial activities across factories worldwide.
  • Responsibilities: 1. Design, deploy, and optimize Cloud and Edge Computing infrastructures to ensure performance and resilience.
  • Automate the deployment and management of industrial IT systems using Infrastructure as Code tools (Terraform, Ansible). ...
Posted
2 days ago
Undisclosed

Singapore

  • End-to-end automation project performance monitoring — own regional rollouts across 8 markets, translate strategy into in-market execution with clear owners, timelines, and accountability; ensure initiatives don't just launch but stick.
  • Performance visibility tools — work with local automation engineers and China-based software engineers to deliver OEE dashboards, digital twin, and AI-assisted diagnostics that operations teams rely on day-to-day.
  • Vendor governance — partner with project and procurement teams to validate point-to-point testing, redundancy verification, and control logic against specifications during design and project phases; identify gaps before go-live. ...
Posted
4 days ago
Undisclosed

Singapore

  • Linux
  • Chef, Puppet or Ansible
  • Kubernetes, Docker, or Podman ...
Posted
6 days ago
SGD6,500 - SGD6,500 per month

Singapore

  • Provide leadership and insights into root cause of device failures via in-depth failure analysis using intricate tools and state of the art equipment.
  • Collaborate with product engineering and R&D team to translate findings into solutions for new design improvements.
  • Monitor and use big data analysis on long term wafer reliability performance and propose improvements to product design or manufacturing process changes. ...
Posted
15 days ago
Undisclosed

Sembawang

  • Provide leadership and insights into root cause of device failures via in-depth failure analysis using intricate tools and state of the art equipment.
  • Collaborate with product engineering and R&D team to translate findings into solutions for new design improvements.
  • Monitor and use big data analysis on long term wafer reliability performance and propose improvements to product design or manufacturing process changes.
Posted
15 days ago
Undisclosed

KL City

  • Develop and implement preventive and predictive maintenance programs for onshore mechanical equipment.
  • Utilize CMMS to plan, schedule, and monitor maintenance activities and equipment performance.
  • Analyze equipment reliability data and recommend improvements to reduce downtime and maintenance costs. ...
Posted
21 days ago
Undisclosed
  • Develop and implement preventive and predictive maintenance programs for onshore mechanical equipment.
  • Utilize CMMS to plan, schedule, and monitor maintenance activities and equipment performance.
  • Analyze equipment reliability data and recommend improvements to reduce downtime and maintenance costs. ...
Posted
21 days ago
Undisclosed

Singapore

  • Linux
  • Chef, Puppet or Ansible
  • Kubernetes, Docker, or Podman ...
Posted
9 days ago