
K8S Lifecycle Automation Engineer (RARR Job 5420)
Job Description
Position Overview
We are seeking a Senior Kubernetes Platform Engineer to design and implement the ZeroTouch Build, Upgrade, and Certification pipeline for our on-premises GPU cloud platform. This role will focus on automating the Kubernetes layer and its dependencies (e.g., GPU drivers, networking, runtime) using 100% GitOps workflows. You will collaborate across teams to deliver a fully declarative, scalable, and reproducible infrastructure stack — from hardware to Kubernetes and platform services.
Key Responsibilities
- Architect and implement GitOps-driven Kubernetes cluster lifecycle automation using tools such as kubeadm, Cluster API, Helm, and Argo CD.
- Develop and manage declarative infrastructure components (see the sketch after this list) for:
  - GPU stack deployment (e.g., NVIDIA GPU Operator)
  - Container runtime configuration (containerd)
  - Networking layers (CNI plugins such as Calico, Cilium, etc.)
- Lead automation initiatives to enable zero-touch upgrades and certification pipelines for Kubernetes clusters and workloads.
- Maintain Git-backed sources of truth for all platform configurations and integrations.
- Standardize deployment practices for multi-cluster GPU environments, ensuring scalability, repeatability, and compliance.
- Integrate observability, testing, and validation into continuous delivery (e.g., cluster conformance tests, GPU health checks).
- Collaborate with infrastructure, security, and SRE teams to ensure smooth handoffs between the hardware/OS and Kubernetes platform layers.
- Mentor junior engineers and help shape the platform automation roadmap.
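
For illustration only, the sketch below shows the kind of Git-backed, declarative component described above: an Argo CD Application that installs the NVIDIA GPU Operator Helm chart and keeps it reconciled automatically. The chart version, namespaces, and Helm values are placeholder assumptions, not values from this platform.

```python
"""Sketch of a declarative, Git-backed platform component: an Argo CD
Application that installs the NVIDIA GPU Operator Helm chart.
Chart version, namespaces, and values below are placeholders."""
import yaml  # PyYAML

gpu_operator_app = {
    "apiVersion": "argoproj.io/v1alpha1",
    "kind": "Application",
    "metadata": {"name": "gpu-operator", "namespace": "argocd"},
    "spec": {
        "project": "default",
        "source": {
            # Upstream NVIDIA Helm repository; the chart version is pinned in Git.
            "repoURL": "https://helm.ngc.nvidia.com/nvidia",
            "chart": "gpu-operator",
            "targetRevision": "v25.3.0",  # placeholder version
            "helm": {"values": "driver:\n  enabled: true\n"},
        },
        "destination": {
            "server": "https://kubernetes.default.svc",
            "namespace": "gpu-operator",
        },
        "syncPolicy": {
            # Auto-sync with pruning and self-healing: no manual kubectl applies.
            "automated": {"prune": True, "selfHeal": True},
            "syncOptions": ["CreateNamespace=true"],
        },
    },
}

# The rendered YAML is what would be committed to the Git source of truth.
print(yaml.safe_dump(gpu_operator_app, sort_keys=False))
```

In a ZeroTouch pipeline, the rendered manifest lives in Git and Argo CD reconciles it continuously; upgrades and rollbacks happen by changing the pinned revision, not by applying changes to clusters by hand.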
Required Skills & Experience
- 10+ years of hands-on infrastructure engineering experience with a strong Kubernetes focus.
- Core expertise in the Kubernetes API, Helm templating, Argo CD, GitOps integration, Go/Python scripting, and containerd.
- Deep knowledge of:
  - Kubernetes cluster management (kubeadm, Cluster API)
  - Argo CD for GitOps-based delivery
  - Helm for application and cluster add-on packaging
  - containerd as the container runtime for GPU workloads
- Experience deploying and managing the NVIDIA GPU Operator (or equivalent) in production.
- Strong understanding of the CNI plugin ecosystem, network policies, and multi-tenant networking.
- Proven track record with Infrastructure as Code using Git-based workflows.
- Experience building Kubernetes clusters in on-premises environments (as opposed to managed cloud services).
- Solid scripting/automation skills in Bash, Python, and Go (see the health-check sketch after this list).
- Familiarity with Linux internals, systemd, and OS-level tuning for container workloads.
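
As a rough illustration of the scripting expected in this role, the sketch below checks GPU node health through the Kubernetes API, the kind of validation a certification pipeline might run after an upgrade. The node label, kubeconfig source, and script structure are assumptions, not part of this posting.

```python
"""Minimal GPU node health check, e.g. run as a post-upgrade validation step.

Assumptions (not from the posting): GPU nodes carry the label
nvidia.com/gpu.present=true applied by the GPU Operator's feature discovery,
and kubeconfig credentials are available to the script."""
from kubernetes import client, config


def gpu_nodes_healthy(label_selector: str = "nvidia.com/gpu.present=true") -> bool:
    config.load_kube_config()  # or config.load_incluster_config() inside a pod
    v1 = client.CoreV1Api()
    healthy = True
    for node in v1.list_node(label_selector=label_selector).items:
        name = node.metadata.name
        # The device plugin advertises GPUs as allocatable node resources.
        gpus = int((node.status.allocatable or {}).get("nvidia.com/gpu", "0"))
        ready = any(c.type == "Ready" and c.status == "True"
                    for c in node.status.conditions or [])
        if gpus == 0 or not ready:
            print(f"UNHEALTHY: {name} ready={ready} allocatable GPUs={gpus}")
            healthy = False
        else:
            print(f"OK: {name} exposes {gpus} allocatable GPUs")
    return healthy


if __name__ == "__main__":
    raise SystemExit(0 if gpu_nodes_healthy() else 1)
```

A check like this could gate promotion between cluster environments, returning a non-zero exit code whenever a GPU node is not Ready or advertises no allocatable GPUs.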
Preferred / Bonus Skills
- Experience developing custom controllers/operators or Kubernetes API extensions.
- Contributions to Kubernetes or CNCF projects.
- Exposure to service meshes, ingress controllers, or workload identity providers.