At SNSoft, we're at the forefront of technological advancement, integrating cutting-edge solutions into everyday life. Our culture thrives on innovation, collaboration, and a commitment to excellence. We foster an inclusive environment where every team member's contribution is valued and where everyone has the opportunity to grow.
Your Role as a Lead / Senior Site Reliability Engineer:
Our Cloud Operations team seeks a Lead / Senior Site Reliability Engineer who is passionate about problem-solving, automating, and maintaining Appspace’s Cloud Platform to support the needs of our Engineering and Customer Care teams. The ideal candidate will have considerable experience in site reliability engineering and/or AIOps, and in evolving operations to use more automation to scale cloud-native platforms. You will work closely with a global team of cloud, engineering, product, and service professionals to improve our platform’s resiliency and scalability, which directly improves our customers’ experience with Appspace. Because our cloud platform is large and our Cloud Operations team is small, you will have opportunities to work on all Cloud Infrastructure, end-to-end, and to grow your capabilities as a Lead / Senior Site Reliability Engineer. This is a mission-critical role for Appspace. We offer flex time, but it must be scheduled ahead of time; otherwise, shift engagement is mandatory outside lunch and break times. On-call coverage is required weekly during a limited window of US daytime hours over the weekend. We strongly prefer candidates who can attend our Kuala Lumpur office at least 2 days per week.
This is your opportunity to be part of an awesome company that is rapidly growing and defining the modern workplace experience market!
...
Oversee the availability, reliability, resilience, performance, security, and monitoring of applications on Azure Cloud and various supporting platforms so that business operational SLAs and SLOs are met.
Conduct incident management, cost management, and application health monitoring.
...