
Director - SRE, DevOps, Monitoring and Database Operations (Job No 748)

For Software Development

18 - 24 Years

Full Time

Up to 30 Days

Up to 60 LPA

1 Position(s)

Bangalore / Bengaluru, Noida, Pune


Posted by: Mindtel Global Private Limited

No longer accepting applications


Job Description

Leadership & Strategy:

• Provide technical and people leadership to SRE, DevOps, Monitoring, and Database Operations teams.

• Collaborate with leadership on budgeting, planning, hiring, and managing third-party contracts.

• Track project status, assemble project teams, and define assignments with schedules and milestones.

Platform Reliability & Performance:

• Drive continuous improvement of reliability, stability, and performance of digital platforms.

• Oversee implementation of automated telemetry, observability, and applied intelligence systems.

• Lead efforts to develop automated alerting, self-healing mechanisms, and intelligent response systems.

Incident & Escalation Management:

• Ensure 24/7 uptime of sites and services, with minimal unplanned downtime.

• Serve as Escalation Manager/Critical Incident Manager during major incidents, leading teams in rapid service restoration.

• Provide on-call escalation support on a 24/7/365 schedule.

• Communicate timely updates and incident reports to senior leadership.

Collaboration & Integration:

• Partner with administrators, platform engineers, and other stakeholders to achieve highly reliable infrastructure, systems, and integrations.

• Collaborate with product, application development, QA, and technology teams to enhance service reliability and performance.

Incident Management & Automation:

• Provide advanced Incident and Problem Management support to effectively diagnose, remediate, and resolve platform issues.

• Automate critical workflows across the platform to minimize manual errors and reduce human intervention.

• Implement ITIL processes such as Incident, Problem, and Change Management.

Monitoring & Scalability:

• Design and implement effective monitoring systems with proper alerting and escalation mechanisms for critical events.

• Ensure timely capacity planning and infrastructure upgrades for optimal reliability.

• Develop and refine processes to minimize Mean Time to Recover (MTTR) and extend Mean Time to Failure (MTTF).

Documentation & Compliance:

• Create and maintain detailed documentation, including run books, incident response guides, post-mortem reports, RCAs, and mitigation plans.

• Ensure all changes adhere to established procedures and documentation standards.

Business Alignment:

• Understand business workflows and map technology solutions to address problems effectively.

• Lead conversations and provide technical support to both internal and external customers.
