Site Reliability Engineer (42639)

Join a dynamic team as a Site Reliability Engineer. You’ll design and develop distributed applications and PaaS infrastructure on GKE and OpenShift, automate deployments with Jenkins, ArgoCD, Ansible, and Terraform, and implement observability solutions to monitor system health. You will troubleshoot issues, ensure consistent environments, and strengthen security practices. Ideal candidates have a degree in IT, hands-on experience with Docker/Kubernetes/OpenShift, cloud expertise (GCP/GKE), networking knowledge, and automation skills.

🚀 Project
- designing and developing the architecture of distributed applications and infrastructure based on GKE and OpenShift Container Platform (OCP)
- automating the creation of infrastructure and application delivery using industry-standard tools such as Jenkins, ArgoCD, Ansible, and Terraform
- designing and developing observability solutions to monitor system and application health
- ensuring stable and consistent application environments, and troubleshooting and resolving issues within the PaaS infrastructure and OpenShift/GKE platforms
- understanding security best practices and proactively strengthening security posture
- designing and implementing automated deployment and configuration to support the OpenShift Platform
- participating in the on-call rotation
- documenting software changes and problem resolution steps

🎯 Skills
- university degree in Information Technology or equivalent
- hands-on familiarity with containerization using Docker and Kubernetes or OpenShift
- experience with public cloud (GCP), with a focus on GKE
- in-depth understanding of all facets of the software development lifecycle and associated tools (e.g. Git, Jenkins, Helm)
- strong understanding of networking, including cloud networking and SDN
- hands-on experience with automation scripting and infrastructure-as-code
- understanding of application protocols (e.g. DNS, SSH, HTTPS) and their behaviors across networks
- experience with automation, software deployment, and orchestration technologies (e.g. Ansible, Terraform, GitHub)

💡 Nice to have
- hands-on experience with monitoring tools such as ELK / Loki, Grafana, Prometheus or equivalent
- expertise in RPM-based Linux (CentOS / Red Hat Enterprise Linux 8/9), including installation, system monitoring and maintenance, tuning, and troubleshooting
- work experience with Kanban and Scrum
- expertise in deploying, managing, and troubleshooting Java-based applications in Linux, virtualized, and containerized environments
