
HW/FW Automation Engineer (RARR Job 5422)
Job Description
We are seeking a Senior Infrastructure Automation Engineer to lead the design and implementation of a ZeroTouch Build, Upgrade, and Certification framework for our on-premises GPU cloud environment. This role requires deep technical expertise in bare-metal provisioning, configuration management, and full-stack automation, from hardware to Kubernetes, using GitOps principles. The ideal candidate is a hands-on engineer who can scale infrastructure automation systems while mentoring others.
Key Responsibilities
- Architect, lead, and implement fully automated ZeroTouch deployment pipelines for GPU cloud infrastructure spanning the hardware, OS, and Kubernetes platform layers.
- Build GitOps-based workflows to manage the end-to-end infrastructure lifecycle, from provisioning to continuous compliance.
- Design and maintain automation for:
  - Bare-metal control: power cycling, provisioning, remote installs
  - Firmware and configuration flashing: BIOS, NIC, RAID
  - Hardware inventory management
  - Configuration drift detection and remediation
- Develop and extend internal automation frameworks using Ansible, Python, and related tools.
- Act as a technical authority and mentor, collaborating closely with hardware, SRE, and platform engineering teams.
- Lead architectural and design reviews for infrastructure automation systems.
- Define and implement best practices for Infrastructure-as-Code, compliance, and operational resilience.
- Champion automation-driven operational models that reduce manual intervention to near zero.
Required Skills & Experience
- 10+ years of hands-on experience in infrastructure engineering, automation, and systems design, with proven delivery of scalable, maintainable solutions.
- Core expertise in Ansible, Python, ipmitool, firmware scripting, and Linux shell scripting.
- Deep knowledge of:
  - Ansible: automation and configuration management
  - Python: scripting, integration, and automation logic
  - ipmitool/Redfish: low-level hardware management
- Proven experience in bare-metal automation for data center environments, including:
  - Power control and PXE booting
  - BIOS/NIC/RAID firmware upgrades
  - Hardware and platform inventory systems
- Strong foundation in Linux systems, networking, and Kubernetes infrastructure.
- Proficiency in GitOps workflows and related tools.
- Experience with CI/CD systems and Git-based pipelines for infrastructure.
- Familiarity with infrastructure monitoring, logging, and drift detection.
- Strong collaboration and communication skills, especially across hardware, platform, and SRE teams.
Preferred / Bonus Skills
- Familiarity with Terraform, Chef, and cloud automation platforms.
- Prior leadership or mentorship experience.
- Contributions to or maintenance of open-source infrastructure projects.
- Exposure to GPU-based compute stacks and high-performance workloads.