Mindtel Global Private Limited

Director - SRE, DevOps, Monitoring and Database Operations (Job No 748)

For Software Development

18 - 24 Years

Full Time

Up to 30 Days

Up to 60 LPA

1 Position(s)

Bangalore / Bengaluru, Noida, Pune

No longer accepting applications

Job Description

Leadership & Strategy: 

• Provide technical and people leadership to SRE, DevOps, Monitoring, and Database Operations teams.

• Collaborate with leadership on budgeting, planning, hiring, and managing third-party contracts.

• Oversee project status, assemble project teams, and define assignments with schedules and milestones.

Platform Reliability & Performance:

• Drive continuous improvement of reliability, stability, and performance of digital platforms.

• Oversee implementation of automated telemetry, observability, and applied intelligence systems.

• Lead efforts to develop automated alerting, self-healing mechanisms, and intelligent response systems.

Incident & Escalation Management: 

• Ensure 24/7 uptime of sites and services, with minimal unplanned downtime.

• Serve as Escalation Manager/Critical Incident Manager during major incidents, leading teams in rapid service restoration.

• Provide on-call escalation support based on 24/7/365 schedules.

• Communicate timely updates and incident reports to senior leadership.

Collaboration & Integration: 

• Partner with administrators, platform engineers, and other stakeholders to achieve highly reliable infrastructure, systems, and integrations.

• Collaborate with product, application development, QA, and technology teams to enhance service reliability and performance.

Incident Management & Automation:

• Provide advanced Incident and Problem Management support to effectively diagnose, remediate, and resolve platform issues.

• Automate critical workflows across the platform to minimize manual errors and reduce human intervention.

• Implement ITIL processes such as Incident, Problem, and Change Management.

Monitoring & Scalability:

• Design and implement effective monitoring systems with proper alerting and escalation mechanisms for critical events.

• Ensure timely capacity planning and infrastructure upgrades for optimal reliability.

• Develop and refine processes to minimize Mean Time to Recover (MTTR) and extend Mean Time to Failure (MTTF).

Documentation & Compliance:

• Create and maintain detailed documentation, including runbooks, incident response guides, post-mortem reports, RCAs, and mitigation plans.

• Ensure all changes adhere to established procedures and documentation standards.

Business Alignment:

• Understand business workflows and map technology solutions to address problems effectively.

• Lead conversations and provide technical support to both internal and external customers.
