Description
We are looking for a Site Reliability Engineer to join our team. The ideal candidate will have 2-6 years of experience in a similar role and will be responsible for ensuring the reliability, scalability, and performance of our systems.
Responsibilities
- Design, build, and maintain highly available systems
- Monitor and respond to system alerts and incidents
- Identify and resolve performance issues
- Develop and implement automation tools for system management
- Collaborate with development teams to ensure seamless deployment and operation of applications
- Maintain documentation of system architecture and processes
- Participate in the on-call rotation for after-hours support
Skills and Qualifications
- Bachelor's degree in Computer Science or a related field
- 2-6 years of experience as a Site Reliability Engineer or in a similar role
- Strong knowledge of Linux systems administration
- Experience with configuration management tools such as Puppet, Chef, or Ansible
- Experience with cloud infrastructure providers such as AWS, Azure, or GCP
- Strong scripting skills in at least one language such as Python, Ruby, or Bash
- Familiarity with monitoring tools such as Nagios, Zabbix, or Prometheus
- Excellent problem-solving and troubleshooting skills
- Strong communication and collaboration skills