
SRE (NCS/Job/2166)
Job Skills
Job Description
Role: Site Reliability Engineer
Job Code: SSL-25-26-604
Mode of Hire: Full Time
Work Location: Bengaluru / Hyderabad
Experience: 4-10 Years
Interview Rounds: 3 (L1, L2 & Client round)
Lead Time to Join: 0-15 Days maximum (preferably immediate)
Responsibilities
- Lead the design and implementation of highly available, resilient, and scalable infrastructure.
- Define, monitor, and enforce SLIs, SLOs, and SLAs across services.
- Drive incident management processes including RCA, blameless postmortems, and remediation.
- Automate infrastructure provisioning and application deployments (Terraform, Ansible, Helm).
- Architect and improve observability platforms (metrics, tracing, logging).
- Collaborate with engineering teams on scaling strategies, capacity planning, and performance optimization.
- Implement disaster recovery (DR) and business continuity strategies.
- Mentor and guide junior/mid-level SREs.
- Partner with security teams to implement zero-trust principles, RBAC, IAM policies, and compliance controls.
Skills
- 4–10 years of experience in SRE, DevOps, or Infrastructure roles.
- Expert in Kubernetes, cloud platforms (AWS), and large-scale distributed systems.
- Expert in designing, building, and maintaining containerized applications using Docker.
- Strong background in Linux systems, networking, and storage.
- Hands-on with IaC (Terraform, Ansible), CI/CD pipelines, and GitOps workflows.
- Deep expertise in observability and incident management practices.
- At least basic knowledge of scripting and development in Python.
- Strong monitoring abilities.