600+ Reliability Jobs - June 2026 - Urgent Hiring

Showing 673 job vacancy search results for "reliability"

Don't miss out on the latest Reliability jobs!

Undisclosed

Singapore

  • Supporting critical Fixed Income & Commodities platforms including trade booking and risk calculation
  • Troubleshooting issues across the full stack — software, hardware, application, and network
  • Partnering with dev/engineering teams to build and improve trading infrastructure ...
Posted
3 days ago
Undisclosed

Singapore

  • 5-10 years of experience in Network Engineering within operations or design, including at least 3 years supporting e-commerce, financial services, or large-scale SaaS platforms.
  • Expert-level understanding of TCP/IP networking, including LAN and WAN architectures. Experience with private MPLS networks and SD-WAN technologies is highly desirable.
  • Advanced proficiency or formal certifications in Cisco, Arista, and/or Check Point network platforms. ...
Posted
19 days ago
Undisclosed

Singapore

  • Linux
  • Chef, Puppet or Ansible
  • Kubernetes, Docker, or Podman ...
Posted
14 days ago
Undisclosed

George Town

  • Rights Reserved By Wazeer Khan LLC in Penang, Malaysia is seeking a Reliability Engineer to collaborate with design teams and oversee automotive burn-in operations. The ideal candidate will have a BS or MS in Electrical Engineering and 8-10 years of experience in Reliability Engineering, focusing on semiconductor device reliability tests.
  • Responsibilities include debugging, analyzing reliability test results, and managing documentation. Excellent communication skills and experience with TIBCO Spotfire are essential. Candidates with knowledge of high-voltage products will be preferred.
Posted
2 days ago
SGD6,000 - SGD6,000 per month

Singapore

  • Manage and improve the reliability, availability, and operational excellence of the SHIP-HATS platform
  • Define, monitor, and maintain Service Level Objectives (SLOs) and Service Level Indicators (SLIs)
  • Lead incident management, troubleshooting, root cause analysis, and post-mortem reviews ...
Posted
4 days ago
Undisclosed

Singapore

  • Define and execute the long-term strategy for network automation, reliability, and observability across global data centers, cloud, and hybrid environments.
  • Drive the transformation of network operations through automation, resilience engineering, and continuous improvement.
  • Partner with senior stakeholders across Technology Group (TG) to align network strategy with enterprise architecture and digital transformation initiatives. ...
Posted
6 days ago
SGD9,000 - SGD9,900 per month
Work from home

Singapore

  • Own end‑to‑end reliability and performance of hybrid and cloud-connected network services.
  • Apply Network Reliability Engineering (NRE) principles to reduce operational toil and improve service resilience.
  • Design, implement, and continuously improve highly available hybrid and cloud network architectures. ...
Posted
a month ago
Undisclosed

Singapore

  • Ensure the reliability of all TikTok's major data warehouse products, services, and query engines, such as ClickHouse, Spark, Presto, Doris, etc.
  • Ensure that all service level objectives and agreements from ByteDance's Data Platform services are met; respond promptly to any system outages or issues.
  • Analyze service performance and reliability patterns to identify potential performance bottlenecks. Implement proactive measures to prevent service disruptions. Work with development teams to optimize application performance, ensuring that services run efficiently and that resources are utilized effectively. ...
Posted
15 days ago
SGD4,600 - SGD4,600 per month

Singapore

  • Support asset reliability and maintenance optimization across power generation assets (GT, HRSG, BOP, BESS) and water assets (wastewater treatment, effluent recovery, NEWater)
  • Perform reliability analysis (data analysis, RCA, FMEA) to identify failure modes, root causes, and improvement opportunities
  • Analyze equipment performance, failure history, and plant data (including predictive monitoring tools) to detect degradation trends and prevent failures ...
Posted
24 days ago
Undisclosed

Singapore

  • Provide 2nd level support for production systems and critical business applications.
  • Investigate, troubleshoot, and resolve incidents and performance issues. Perform root cause analysis (RCA) and document findings in a structured manner.
  • Collaborate closely with development teams to ensure sustainable issue resolution. ...
Posted
24 days ago
SGD5,000 - SGD5,000 per month

Singapore

  • Reliability Test Flow Development: Define and develop reliability stress, test flows and test plans to cover all Product Reliability aspects.
  • Reliability Test Plan and Execution: Work closely with Global Quality team to define qualification plan, review and manage the qualification execution progress, and drive for qualification gating issue resolution.
  • DPM reduction: Drive down DPM and achieve Extrinsic Reliability (Time 0 and Field DPM) and Intrinsic Reliability through optimized manufacturing test flows and test strategy to meet the critical KPIs of quality, cost, and cycle time ...
Posted
9 days ago
Undisclosed

Singapore

  • Support asset reliability and maintenance optimization across power generation assets (GT, HRSG, BOP, BESS) and water assets (wastewater treatment, effluent recovery, NEWater)
  • Perform reliability analysis (data analysis, RCA, FMEA) to identify failure modes, root causes, and improvement opportunities
  • Analyze equipment performance, failure history, and plant data (including predictive monitoring tools) to detect degradation trends and prevent failures ...
Posted
24 days ago
Undisclosed

Jurong West

  • Support asset reliability and maintenance optimization across power generation assets (GT, HRSG, BOP, BESS) and water assets (wastewater treatment, effluent recovery, NEWater)
  • Perform reliability analysis (data analysis, RCA, FMEA) to identify failure modes, root causes, and improvement opportunities
  • Analyze equipment performance, failure history, and plant data (including predictive monitoring tools) to detect degradation trends and prevent failures ...
Posted
24 days ago
SGD6,500 - SGD6,500 per month

Singapore

  • Provide leadership and insights into the root cause of device failures via in-depth failure analysis using intricate tools and state-of-the-art equipment.
  • Collaborate with product engineering and R&D teams to translate findings into solutions for new design improvements.
  • Monitor and use big data analysis on long-term wafer reliability performance and propose improvements to product design or manufacturing process changes. ...
Posted
21 days ago
Undisclosed

Sembawang

  • Provide leadership and insights into the root cause of device failures via in-depth failure analysis using intricate tools and state-of-the-art equipment.
  • Collaborate with product engineering and R&D teams to translate findings into solutions for new design improvements.
  • Monitor and use big data analysis on long-term wafer reliability performance and propose improvements to product design or manufacturing process changes.
Posted
21 days ago
SGD6,000 - SGD6,000 per month

Singapore

  • Infrastructure & Automation: Design, implement, and maintain scalable cloud infrastructure using Infrastructure as Code (IaC) tools.
  • CI/CD Pipelines: Build and optimize automated pipelines for testing, deployment, and release management.
  • Monitoring & Reliability: Establish observability standards, implement monitoring, logging, and alerting systems to ensure system health. ...
Posted
25 days ago
Undisclosed

Singapore

  • Provide machinery engineering support to operations, including troubleshooting equipment issues and identifying root causes of failures
  • Evaluate and review engineering work performed by contractors and third parties to ensure compliance with company specifications, standards, and regulatory requirements
  • Perform and lead Root Cause Failure Analysis (RCFA) and contribute to reliability improvement initiatives (e.g. bad actor programmes) ...
Posted
17 days ago
Undisclosed
  • Job Summary:
  • Collaborate closely with Manufacturing, Quality, Product Engineering, and Suppliers to ensure robust, compliant, and timely change
  • Requirements:
Posted
25 days ago
Undisclosed

KL City

  • Lead by example for site reliability engineering execution and manage stages from ideation and development to launch and ongoing maintenance, ensuring timely delivery and high-quality technology infrastructure that meets customer needs, and incorporating feedback loops for iterative improvements.
  • Identify and assist with the stability, scalability, cost optimisation, and security of the technology infrastructure by continuously assessing, upgrading, and implementing best practices, using specific frameworks or standards to support business growth and protect sensitive data.
  • Manage high-performing site reliability engineering teams by recruiting top talent, providing mentorship, and fostering professional growth, with a focus on long-term team development to create a collaborative and innovative work culture. ...
Posted
17 days ago
SGD7,000 - SGD7,000 per month

Singapore

  • Description:
Must have:
  • Excellent hands-on experience in Core Java 1.8, Spring, Spring Boot, Quartz
  • Hands-on experience with messaging systems such as RabbitMQ and IBM MQ
  • Strong working experience on Linux operating systems (Oracle Linux 7.6)
  • Experience with application servers, preferably IBM WebSphere / Apache Tomcat 8.5.x
  • Excellent and proven experience in Oracle SQL and PL/SQL

Good to have:
  • Experience with monitoring tools such as Tivoli and Splunk
  • Prior experience in payments processing systems or the banking/financial services industry
  • Experience with shell scripting
  • Understanding of supporting large, complex, high-availability, high-volume applications
  • Understanding of failover mechanisms and disaster recovery

POSITION OVERVIEW: Software Development Specialist

POSITION GENERAL DUTIES AND TASKS:

Role Summary
The Senior Site Reliability Engineer (L7) is a hands-on technical engineering role responsible for building, automating, scaling, and maintaining highly reliable, secure, and resilient cloud and hybrid infrastructure platforms. The role focuses on infrastructure engineering, container orchestration, Infrastructure as Code (IaC), observability, incident response, and platform-level application development.

Key Responsibilities
  • Good to have: Deploy, configure, and maintain AWS resources including EC2, ECS, EKS, VPC, IAM, NAT, and networking components.
  • Good to have: Build secure and scalable cloud networking (VPCs, subnets, routing, VPN, firewalls).
  • Work with load balancers, reverse proxies, API gateways, DNS management, and network routing.
  • Build CI/CD pipelines using Jenkins, GitLab CI, or GitHub Actions.
  • Support application releases and coordinate deployments across environments.
  • Implement logging/monitoring using Prometheus, Grafana, Datadog, Splunk, or CloudWatch.
  • Participate in incident response, troubleshooting, on-call rotation, and post-incident RCA.
  • Perform system performance tuning, patching, capacity planning, and optimization.
  • Improve system reliability through automation, redundancy, and engineering best practices.
  • Implement and maintain IaC using Terraform or CloudFormation.
  • Automate provisioning, configuration, and environment setup using scripting (Python, Bash, Go).
  • Develop reusable automation modules, templates, pipelines, and cloud engineering patterns.
  • Build, deploy, and manage containerized applications using Docker.
  • Operate and optimize Kubernetes clusters (EKS or on-prem).
  • Implement autoscaling, service mesh, pod security, and workload monitoring.
  • Develop automation services, internal tooling, and platform utilities using Core Java, Spring Boot, Quartz, and Erlang.
  • Build wrappers/services for IBM MQ and RabbitMQ messaging flows.
  • Create schedulers, orchestration components, and internal microservices for operational tasks.
  • Write integrations, connectors, and event-driven components for infra automation.
  • Build custom alerts, webhook handlers, log processors, and reliability tooling.

Technologies / Tools (by technical skill area):
  • Operating Systems & Virtualization: Enterprise Linux, VMware, OVM, X86 server clusters
  • Containerization & Orchestration: Kubernetes, Docker
  • Application Development (Platform): Core Java 1.8, Spring, Spring Boot, Quartz, Erlang
  • Messaging Platforms: IBM MQ, RabbitMQ, Erlang/Mnesia
  • IaC & Automation: Terraform, Ansible, CloudFormation, Chef
  • Scripting Languages: Python, Go, Bash
  • CI/CD Tooling: Jenkins, GitLab CI, GitHub Actions
  • Observability & Logging: Prometheus, Grafana, Datadog, Splunk
  • Databases & Storage: Oracle, HA DB clusters, NFS, HPE Nimble, DataDomain
  • Load Balancing & Networking: F5 LTM/ASM/ASR, DNS, network routing, proxies
  • File Transfer & Directory Services: GoAnywhere, Tivoli Directory Server
  • Cloud Platforms: AWS, Azure, GCP
  • Security Technologies: Hardware Security Modules (Payshield or equivalent)

Experience Requirements
  • 5+ years of experience as an SRE, DevOps Engineer, Cloud Engineer, or Platform Engineer.
  • Strong hands-on expertise with AWS cloud services (EC2, ECS, EKS).
  • Practical experience with IaC tools such as Terraform and CloudFormation.
  • Deep working knowledge of Kubernetes, Docker, cloud networking, load balancers, and proxies.
  • Hands-on experience with CI/CD pipelines, release engineering, observability tooling, and monitoring stacks.
  • Experience supporting databases including partitioning, replication, sharding, and high availability setups.
  • Prior involvement in incident response, production support, and reliability engineering practices.

Desirable
  • Good understanding of infrastructure, F5, network
  • Knowledge of ISO20022, ISO8583, and Swift MT formats
  • Experience in shell scripting, Python.
  • Experience within payments processing systems or finance/banking industry.
  • Experience in supporting applications using different languages and/or character sets.
Posted
a month ago
SGD7,000 - SGD7,000 per month

Singapore

POSITION OVERVIEW: Software Development Senior Specialist

POSITION GENERAL DUTIES AND TASKS:

Role Summary
The Senior Site Reliability Engineer (L7) is a hands-on technical engineering role responsible for building, automating, scaling, and maintaining highly reliable, secure, and resilient cloud and hybrid infrastructure platforms. The role focuses on cloud infrastructure engineering, container orchestration, Infrastructure-as-Code (IaC), observability, incident response, and platform-level application development.

Key Responsibilities
  • Deploy, configure, and maintain AWS resources including EC2, ECS, EKS, VPC, IAM, NAT, and networking components.
  • Build secure and scalable cloud networking (VPCs, subnets, routing, VPN, firewalls).
  • Work with load balancers, reverse proxies, API gateways, DNS management, and network routing.
  • Build CI/CD pipelines using Jenkins, GitLab CI, or GitHub Actions.
  • Support application releases and coordinate deployments across environments.
  • Implement logging/monitoring using Prometheus, Grafana, Datadog, Splunk, or CloudWatch.
  • Participate in incident response, troubleshooting, on-call rotation, and post-incident RCA.
  • Perform system performance tuning, patching, capacity planning, and optimization.
  • Improve system reliability through automation, redundancy, and engineering best practices.
  • Implement and maintain IaC using Terraform or CloudFormation.
  • Automate provisioning, configuration, and environment setup using scripting (Python, Bash, Go).
  • Develop reusable automation modules, templates, pipelines, and cloud engineering patterns.
  • Build, deploy, and manage containerized applications using Docker.
  • Operate and optimize Kubernetes clusters (EKS or on-prem).
  • Implement autoscaling, service mesh, pod security, and workload monitoring.
  • Develop automation services, internal tooling, and platform utilities using Core Java, Spring Boot, Quartz, and Erlang.
  • Build wrappers/services for IBM MQ and RabbitMQ messaging flows.
  • Create schedulers, orchestration components, and internal microservices for operational tasks.
  • Write integrations, connectors, and event-driven components for infra-automation.
  • Build custom alerts, webhook handlers, log processors, and reliability tooling.

Technologies / Tools:
  • Operating Systems & Virtualization: Enterprise Linux, VMware, OVM, X86 server clusters
  • Containerization & Orchestration: Kubernetes, Docker
  • Application Development (Platform): Core Java 1.8, Spring, Spring Boot, Quartz, Erlang
  • Messaging Platforms: IBM MQ, RabbitMQ, Erlang/Mnesia
  • IaC & Automation: Terraform, Ansible, CloudFormation, Chef
  • Scripting Languages: Python, Go, Bash
  • CI/CD Tooling: Jenkins, GitLab CI, GitHub Actions
  • Observability & Logging: Prometheus, Grafana, Datadog, Splunk
  • Databases & Storage: Oracle, HA DB clusters, NFS, HPE Nimble, DataDomain
  • Load Balancing & Networking: F5 LTM/ASM/ASR, DNS, network routing, proxies
  • File Transfer & Directory Services: GoAnywhere, Tivoli Directory Server
  • Cloud Platforms: AWS, Azure, GCP
  • Security Technologies: Hardware Security Modules (Payshield or equivalent)

Experience Requirements:
  • 5+ years of experience as an SRE, DevOps Engineer, Cloud Engineer, or Platform Engineer.
  • Strong hands-on expertise with AWS cloud services (EC2, ECS, EKS).
  • Practical experience with IaC tools such as Terraform and CloudFormation.
  • Deep working knowledge of Kubernetes, Docker, cloud networking, load balancers, and proxies.
  • Hands-on experience with CI/CD pipelines, release engineering, observability tooling, and monitoring stacks.
  • Experience supporting databases including partitioning, replication, sharding, and high availability setups.
  • Prior involvement in incident response, production support, and reliability engineering practices.

Desirable:
  • Good understanding of infrastructure, F5, network
  • Knowledge of ISO20022, ISO8583, and Swift MT formats
  • Experience in shell scripting, Python.
  • Experience within payments processing systems or finance/banking industry.
  • Experience in supporting applications using different languages and/or character sets.
Posted
a month ago
Undisclosed

Singapore

  • Maintain the security posture and system hardening for both on-premises endpoints and cloud IT infrastructure (e.g. AWS, Azure), including the management of Government security monitoring services such as the Government Cyber Security Operations Centre (GCSOC) and Automated Baseline Log Review (ABLR).
  • Maintain and support IT endpoint and infrastructure security tools, including Microsoft Defender, Secure Service Edge, firewalls, and related technologies.
  • Ensure that critical platforms are reliable, scalable, observable, and maintainable; this includes implementing monitoring and observability tools, and leading incident response and root cause analysis to minimise downtime and prevent recurrence. ...
Posted
15 days ago
Undisclosed

Singapore

  • Design and build systems that adjust on the fly to infrastructure changes, data center moves, and global events
  • Create smart traffic management that can handle viral video surges without breaking a sweat
  • Build tools to spot and fix issues before they reach users ...
Posted
a day ago
SGD9,500 - SGD9,500 per month

Singapore

  • Reliability Test Program Coding: Writing and debugging test programs for HBM device and package qualification and working with other teams to ensure full test coverage guarantees that all aspects of the product are thoroughly tested and any potential issues are identified and addressed.
  • Root Cause Understanding and Resolution of reliability test program related issues
  • Promotion of Innovation and Challenge Status Quo: The role involves promoting innovation and driving changes (e.g., innovating new reliability features and solutions, or code infrastructure optimization and handling) that will provide Micron with a technical advantage over its competition. This is vital for maintaining Micron’s competitive edge in the market. ...
Posted
17 hours ago
Undisclosed

Singapore

  • Engage in and improve the whole lifecycle of Ads systems — from system design consulting through to launch reviews, deployment, operation and refinement.
  • Build availability of services deployed across multiple data centers globally.
  • Deliver tools/software to improve the reliability, scalability and operability of services. ...
Posted
3 days ago

Centre For Strategic Infocomm Technologies (CSIT)

Undisclosed

Singapore

  • Plan, design and modernise complex network infrastructure to achieve optimal network performance to ensure scalable, sustainable and secure operations
  • Define architectural standards, reference models and guidelines that enhance security, resiliency and scalability
  • Drive actionable insights with good telemetry data to ensure meeting SLA ...
Posted
a month ago
Undisclosed

Singapore

  • You will improve site reliability by building mechanisms/architectures that enable fault tolerance and faster median time to respond and median time to detect.
  • You will drive the integration of observability automation into the CI/CD pipeline.
  • You will handle production incidents, manage incident communication with clients and draft root cause analysis documents. ...
Posted
a month ago
SGD6,000 - SGD6,000 per month

Singapore

  • Reliability Test Flow Development: Define and develop reliability stress, test flows and test plans to cover all Product Reliability aspects.
  • Reliability Test Plan and Execution: Work closely with Global Quality team to define qualification plan, review and manage the qualification execution progress, and drive for qualification gating issue resolution.
  • DPM reduction: Drive down DPM and achieve Extrinsic Reliability (Time 0 and Field DPM) and Intrinsic Reliability through optimized manufacturing test flows and test strategy to meet the critical KPIs of quality, cost, and cycle time ...
Posted
4 days ago