Full-Time Job, Site Reliability Engineer at Elliott Moss Consulting

Site Reliability Engineer

Elliott Moss Consulting

Salary: Undisclosed

Full-Time

Singapore

Job Location

Singapore

Job Description

Responsibilities

We are looking for a skilled Site Reliability Engineer (SRE) to drive the reliability, scalability, and performance of our AWS-based service desk platform. This role will own the end-to-end AWS cloud infrastructure and DevOps pipelines, focusing on automation, system resilience, and operational excellence. The ideal candidate will treat operations as a software problem, minimizing manual intervention and ensuring a seamless experience for both agents and customers.

Key Responsibilities

1. Amazon Connect & Service Desk Reliability

Design, deploy, and maintain the Amazon Connect ecosystem, including Contact Flows, Lambda integrations, and Lex bots using Infrastructure as Code (Terraform/CloudFormation).
Ensure high availability and performance of voice and chat channels with minimal latency and optimal audio quality.
Manage integrations between Amazon Connect and ITSM tools such as ServiceNow, Jira Service Management, or Salesforce.
Perform proactive capacity planning to handle peak traffic, including telephony quotas and concurrent workloads.

2. Cloud Infrastructure & Security

Manage and optimize core AWS services including EC2, ECS/EKS, S3, Lambda, DynamoDB, and VPC networking.
Implement security best practices, including IAM least-privilege access, encryption (KMS), and compliance with standards such as SOC 2, HIPAA, or PCI DSS.
Monitor and optimize cloud costs through effective FinOps practices.

3. DevOps & CI/CD Engineering

Build and maintain CI/CD pipelines using tools such as GitLab CI, GitHub Actions, Jenkins, or AWS CodePipeline.
Automate deployments for infrastructure, Lambda functions, and conversational bots.
Integrate automated testing to validate workflows, APIs, and contact flows prior to production release.
Ensure consistency across environments (Sandbox, Staging, Production) through standardized deployment patterns.

4. Observability & Incident Management

Develop monitoring dashboards and alerts using CloudWatch, X-Ray, and tools like Grafana, Datadog, or Splunk.
Lead incident response and troubleshooting for production issues.
Conduct root cause analysis and blameless post-mortems.
Define and manage SLOs, SLIs, and error budgets to maintain system reliability.

Required Skills & Qualifications

Technical Skills

Strong expertise in Amazon Connect (Contact Flows, CTRs, CCP customization).
Hands-on experience with AWS services including Lambda, DynamoDB, S3, IAM, and networking.
Proficiency in Infrastructure as Code tools such as Terraform (preferred), CloudFormation, or AWS CDK.
Experience building CI/CD pipelines using GitLab, GitHub Actions, Jenkins, or similar tools.
Strong programming/scripting skills in Python or Node.js.
Experience with observability tools such as CloudWatch, Kinesis, ELK Stack, or Splunk.

Experience

3+ years of experience in Site Reliability Engineering or DevOps roles.
2+ years of hands-on experience with Amazon Connect or similar CCaaS platforms.
Experience supporting high-volume service desk or call center environments.

Education & Certifications

Bachelor’s degree in Computer Science, Engineering, or a related field.
Preferred certifications:
AWS Certified DevOps Engineer – Professional
AWS Certified SysOps Administrator – Associate
