Machine Learning Engineer (RARR Job 6294)

For India's Leading Diversified Group of Manufacturing and Services
3 - 6 Years
Full Time
Immediate
Up to 18 LPA
1 Position(s)
Gurugram/ Gurgaon
Posted 7 Days Ago

Job Description

About the Role

This is the core technical role. You will own the full ML pipeline - from raw inverter and weather time-series data to production-grade models that back solar generation guarantees, predict inverter failures 7-14 days in advance, and detect anomalies in real time. We focus heavily on traditional machine learning (XGBoost, LightGBM, Prophet, Isolation Forest), built cleanly and robustly to our requirements, for fast inference and strong accuracy. What matters is the right model for the job, shipped reliably.

What You'll Own

  • End-to-end ML pipeline: data ingestion from ClickHouse → feature engineering → training → evaluation → MLflow registry → FastAPI inference API
  • Generation model: compute and track Performance Ratio (PR) for each site, detect underperformance vs GHI-based expected yield with ±5% accuracy
  • Anomaly detection: Isolation Forest (phase 1) → LSTM autoencoder (phase 2) on MPPT power ratio, inverter temperature trend, fault frequency
  • Predictive maintenance: Remaining Useful Life (RUL) model on inverter temp + fault code history - 7-14 day failure prediction horizon
  • Yield forecast: LightGBM / XGBoost model using Solcast/Open-Meteo GHI forecasts + historical PR baseline to predict weekly kWh ± 8%
  • MLOps: weekly automated retraining pipeline, model versioning in MLflow, A/B model promotion logic, performance drift detection
  • Feature engineering from raw time-series: rolling averages, sin/cos time encoding, weather transposition (GHI → POA), lag features, string imbalance ratios
  • Monthly automated report generation: actual vs forecast, PR trend, maintenance log, CO₂ offset
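
To make the Performance Ratio item above concrete, here is a minimal sketch of a daily PR calculation and an underperformance check. It assumes the common simplified definition (actual energy divided by rated power times irradiation, normalised to 1 kW/m²). The function names, the 5% tolerance default, and the sample numbers are illustrative only and do not describe our actual pipeline.

```python
# Hypothetical helper names; a sketch, not the production implementation.
G_STC_KW_PER_M2 = 1.0  # reference irradiance at standard test conditions


def performance_ratio(energy_kwh, irradiation_kwh_m2, rated_kwp):
    """Return PR = actual energy / reference energy, or None if inputs are unusable."""
    if irradiation_kwh_m2 <= 0 or rated_kwp <= 0:
        return None  # no sun or bad metadata: PR is undefined for this day
    reference_kwh = rated_kwp * irradiation_kwh_m2 / G_STC_KW_PER_M2
    return energy_kwh / reference_kwh


def is_underperforming(pr, baseline_pr, tolerance=0.05):
    """Flag a day whose PR falls more than `tolerance` below the site baseline."""
    return pr is not None and pr < baseline_pr * (1 - tolerance)


# Example day: 100 kWp site, 5.0 kWh/m² irradiation, 400 kWh exported.
daily_pr = performance_ratio(400.0, 5.0, 100.0)
flagged = is_underperforming(daily_pr, baseline_pr=0.85)
```

In practice the irradiation input would be in-plane (POA) rather than raw GHI, which is where the transposition step in the feature engineering item comes in.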

Bonus Skills (Strong Plus)

  • Solar domain knowledge - Performance Ratio, specific yield, CUF, irradiance transposition models (Perez, Hay-Davies)
  • Survival analysis / RUL modelling - Weibull distribution, Cox proportional hazards, degradation models
  • LSTM autoencoder for anomaly detection - implementation experience, not just awareness
  • ClickHouse - time-series query patterns, MergeTree engines, materialised views for feature computation
  • Kafka consumer in Python - reading from Kafka topics for online feature computation
  • AWS SageMaker, Vertex AI, or any managed ML platform - training job orchestration
  • NILM (non-intrusive load monitoring) - useful for future load disaggregation features
  • Elixir, Go, or Rust - reading code from these languages for pipeline integration

What Good Looks Like in This Role

  • Day 30: PR pipeline computing daily for each site; baseline anomaly model live with <10% false positive rate
  • Day 90: MPPT imbalance detection live; inverter temp trend model with 7-day prediction horizon
  • Day 150: All 4 models in production; weekly retraining cron running; monthly PDF report auto-generating
  • Day 365: LSTM anomaly upgrade live; yield forecast MAE <8%; RUL model validated on 6+ months of fault history

You'll Thrive Here If You

  • Prefer simple, explainable models that work over complex models that sometimes work
  • Are rigorous about evaluation - you set up proper train/val/test splits on time-series data (no data leakage)
  • Understand that a model in a Jupyter notebook is not a model in production
  • Communicate uncertainty clearly - you know when your model is confident and when it isn't
  • Are self-directed - you can take 'we need to predict failures' and figure out the data, the model, and the metric
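
As an illustration of the leakage-free splitting mentioned in the list above, the sketch below splits records strictly by time, so validation and test data always come after the training data. The `ts` field name and the split fractions are assumptions made for this example.

```python
# Hypothetical record layout: each row is a dict with a sortable "ts" timestamp.
def chronological_split(rows, val_frac=0.15, test_frac=0.15):
    """Split rows into train/val/test by time order, with no shuffling."""
    ordered = sorted(rows, key=lambda r: r["ts"])
    n = len(ordered)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    n_train = n - n_val - n_test
    train = ordered[:n_train]
    val = ordered[n_train:n_train + n_val]
    test = ordered[n_train + n_val:]
    return train, val, test


# Example: 20 hourly readings arriving out of order.
readings = [{"ts": t, "kwh": 1.0} for t in reversed(range(20))]
train, val, test = chronological_split(readings, val_frac=0.25, test_frac=0.25)
```

A random shuffle would let the model see readings from the future of each validation point, which tends to overstate accuracy on time-series data.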

What You Get

  • Ownership of the entire ML stack - from data to production inference
  • Real, novel problem: modelling solar generation from time-series IoT data is not a solved problem
  • Work directly with the founding team - your architectural decisions shape the product