K8S Lifecycle Automation Engineer (RARR Job 5420)

For International Trade And Development Company
10 - 20 Years
Full Time
Up to 30 Days
Up to 40 LPA
1 Position(s)
Bangalore / Bengaluru, Hyderabad
Posted 4 Days Ago

Job Description

Position Overview

We are seeking a Senior Kubernetes Platform Engineer to design and implement the ZeroTouch Build, Upgrade, and Certification pipeline for our on-premises GPU cloud platform. This role focuses on automating the Kubernetes layer and its dependencies (e.g., GPU drivers, networking, container runtime) entirely through GitOps workflows. You will collaborate across teams to deliver a fully declarative, scalable, and reproducible infrastructure stack, from hardware to Kubernetes and platform services.


Key Responsibilities

  • Architect and implement GitOps-driven Kubernetes cluster lifecycle automation using tools such as kubeadm, Cluster API, Helm, and Argo CD.

  • Develop and manage declarative infrastructure components for:

    • GPU stack deployment (e.g., NVIDIA GPU Operator)

    • Container runtime configuration (containerd)

    • Networking layers (CNI plugins such as Calico, Cilium, etc.)

  • Lead automation initiatives to enable zero-touch upgrades and certification pipelines for Kubernetes clusters and workloads.

  • Maintain Git-backed sources of truth for all platform configurations and integrations.

  • Standardize deployment practices for multi-cluster GPU environments, ensuring scalability, repeatability, and compliance.

  • Integrate observability, testing, and validation into continuous delivery (e.g., cluster conformance, GPU health checks).

  • Collaborate with infrastructure, security, and SRE teams to ensure smooth handoffs between hardware/OS and Kubernetes platform layers.

  • Mentor junior engineers and shape the platform automation roadmap.


Required Skills & Experience

  • 10+ years of hands-on infrastructure engineering experience with a strong Kubernetes focus.

  • Core expertise in: Kubernetes API, Helm templating, Argo CD, GitOps integration, Go/Python scripting, containerd.

  • Deep knowledge of:

    • Kubernetes cluster management (kubeadm, Cluster API)

    • Argo CD for GitOps-based delivery

    • Helm for application and cluster add-on packaging

    • containerd as the container runtime for GPU workloads

  • Experience deploying and managing the NVIDIA GPU Operator or an equivalent in production.

  • Strong understanding of CNI plugin ecosystems, network policies, and multi-tenant networking.

  • Proven track record with Infrastructure-as-Code using Git-based workflows.

  • Experience building Kubernetes clusters in on-premises environments (as opposed to managed cloud services).

  • Solid scripting/automation skills (Bash, Python, Go).

  • Familiarity with Linux internals, systemd, and OS-level tuning for container workloads.


Preferred / Bonus Skills

  • Experience developing custom controllers/operators or Kubernetes API extensions.

  • Contributions to Kubernetes or CNCF projects.

  • Exposure to service meshes, ingress controllers, or workload identity providers.