Principal DevOps Engineer
Job Description
This role owns infrastructure, reliability, and automation for a real-time data annotation platform. The engineer architects scalable cloud systems, leads Kubernetes and CI/CD strategy, and ensures high availability, security, and disaster recovery for enterprise-grade ML pipelines.
Key Responsibilities:
- Design highly available, secure cloud infrastructure supporting distributed microservices at scale
- Lead multi-cluster Kubernetes strategy optimized for GPU and multi-tenant workloads
- Implement Infrastructure-as-Code using Terraform across the full infrastructure lifecycle
- Optimize Kafka-based data pipelines for throughput, fault tolerance, and low latency
- Deliver zero-downtime CI/CD pipelines using GitOps-driven deployment models
- Establish SRE practices with SLOs, p95 and p99 monitoring, and FinOps discipline
- Ensure production-ready disaster recovery and business continuity testing
S - SKILL:
The Expertise We Require
These details define the concrete, demonstrable capabilities required to take full ownership of the role’s responsibilities.
Must-Haves
- 6–10 years of DevOps / SRE / Cloud Infrastructure experience
- Experience at a product-based company
- Expert-level Kubernetes (networking, security, scaling, controllers)
- Terraform Infrastructure-as-Code mastery
- Hands-on Kafka production experience
- AWS cloud architecture and networking expertise
- Strong scripting skills in Python, Go, or Bash
- GitOps and CI/CD tooling experience