- Should have 4-7 years of total experience in Machine Learning Engineering
- Strong in programming languages such as Python and Java
- Must have hands-on experience with at least one cloud platform (GCP preferred)
- Must have: Experience working with Docker
- Must have: Experience managing Python environments (e.g. venv, pip, Poetry)
- Must have: Experience with orchestrators such as Vertex AI Pipelines, Airflow, etc. (see the pipeline sketch after this list)
- Must have: End-to-end understanding of the full ML lifecycle
- Must have: Data engineering and feature engineering techniques
- Must have: Experience with ML modelling and evaluation metrics
- Must have: Experience with TensorFlow, PyTorch, or another ML framework
- Must have: Experience with model monitoring
- Good to have: Hyperparameter tuning experience (see the tuning sketch after this list)
- Proficient in at least one of Apache Spark, Apache Beam, or Apache Flink
- Must have: Advanced SQL knowledge
- Must be aware of streaming concepts such as windowing, late arrival, and triggers (see the streaming sketch after this list)
- Should have hands-on experience with distributed computing
- Should have working experience in data architecture design
- Should be aware of storage and compute options and when to choose each
- Should have a good understanding of cluster optimisation and pipeline optimisation strategies
- Should have exposure to GCP tools for building end-to-end data pipelines for various scenarios, including ingesting data from traditional databases as well as integrating API-based data sources
- Should have a business mindset to understand the data and how it will be used for BI and analytics purposes
- Should have working experience with CI/CD pipelines, deployment methodologies, and Infrastructure as Code (e.g. Terraform)
- Good to have: Hands-on experience with Kubernetes
- Good to have: Experience with a vector database such as Qdrant
- Good to have: LLM experience (embedding generation, embedding indexing, RAG, agents, etc.) (see the retrieval sketch after this list)
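
Pipeline sketch: to illustrate the orchestrator requirement above, here is a minimal Kubeflow Pipelines (KFP v2) definition of the kind that Vertex AI Pipelines executes. The component bodies, bucket path, and output file name are hypothetical placeholders, not a prescribed implementation.

```python
from kfp import compiler, dsl

@dsl.component
def prepare_data() -> str:
    # Stand-in for a real data-prep step (e.g. an export from BigQuery);
    # the bucket path below is a hypothetical placeholder.
    return "gs://example-bucket/train.csv"

@dsl.component
def train_model(data_path: str) -> str:
    # Stand-in for real training; returns a (hypothetical) model reference.
    return f"model-trained-on:{data_path}"

@dsl.pipeline(name="minimal-ml-pipeline")
def minimal_pipeline():
    # Chain the two steps; KFP infers the dependency from the output usage.
    data = prepare_data()
    train_model(data_path=data.output)

# Compile to a spec that Vertex AI Pipelines (or any KFP backend) can run.
compiler.Compiler().compile(minimal_pipeline, "minimal_pipeline.json")
```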
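
Tuning sketch: a minimal example of the hyperparameter tuning mentioned above, using scikit-learn's RandomizedSearchCV on synthetic data; the model choice and search space are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for a real feature matrix and labels.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Randomized search over an illustrative hyperparameter space,
# scored with ROC AUC under 5-fold cross-validation.
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions={
        "n_estimators": [100, 200, 400],
        "max_depth": [None, 8, 16],
        "min_samples_leaf": [1, 5, 10],
    },
    n_iter=10,
    scoring="roc_auc",
    cv=5,
    random_state=42,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```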
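
Streaming sketch: the windowing, late-arrival, and trigger concepts above, shown as a minimal Apache Beam (Python SDK) pipeline; the project, topic, window size, and lateness bound are hypothetical.

```python
import apache_beam as beam
from apache_beam import window
from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions
from apache_beam.transforms.trigger import (
    AccumulationMode,
    AfterProcessingTime,
    AfterWatermark,
)

opts = PipelineOptions()
opts.view_as(StandardOptions).streaming = True

with beam.Pipeline(options=opts) as p:
    (
        p
        # Hypothetical Pub/Sub topic as the unbounded source.
        | "Read" >> beam.io.ReadFromPubSub(topic="projects/my-project/topics/events")
        | "Window" >> beam.WindowInto(
            window.FixedWindows(60),  # 60-second fixed windows
            trigger=AfterWatermark(
                early=AfterProcessingTime(10),  # speculative firings every 10 s
                late=AfterProcessingTime(0),    # re-fire as late data arrives
            ),
            allowed_lateness=300,  # accept data up to 5 min past the watermark
            accumulation_mode=AccumulationMode.ACCUMULATING,
        )
        # Count elements per window; a CombineGlobally default is only valid
        # in the global window, so without_defaults() is required here.
        | "Count" >> beam.CombineGlobally(
            beam.combiners.CountCombineFn()
        ).without_defaults()
        | "Print" >> beam.Map(print)
    )
```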
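
Retrieval sketch: a minimal Qdrant indexing-and-search flow of the kind used in RAG, via the qdrant-client package. The in-memory client, collection name, vector size, and the embed() placeholder are all hypothetical; a real system would call an actual embedding model.

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

def embed(text: str) -> list[float]:
    # Hypothetical placeholder producing a fixed-size vector;
    # a real pipeline would call an embedding model here.
    return [float(ord(c)) for c in text[:8].ljust(8)]

client = QdrantClient(":memory:")  # local in-memory instance for illustration

# Create a collection sized to match the placeholder 8-dim embeddings.
client.create_collection(
    collection_name="docs",
    vectors_config=VectorParams(size=8, distance=Distance.COSINE),
)

# Index documents: embed each one and upsert it as a point with its payload.
docs = ["GCP data pipelines", "Model monitoring basics"]
client.upsert(
    collection_name="docs",
    points=[
        PointStruct(id=i, vector=embed(d), payload={"text": d})
        for i, d in enumerate(docs)
    ],
)

# RAG-style retrieval: embed the query and fetch the nearest documents.
hits = client.search(collection_name="docs", query_vector=embed("monitoring"), limit=2)
for hit in hits:
    print(hit.payload["text"], hit.score)
```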
Experience in working with GCP tools such as the following (a query sketch follows this list):
- Storage: Cloud SQL, Cloud Storage, Cloud Bigtable, BigQuery, Cloud Spanner, Cloud Datastore, vector databases
- Ingest: Pub/Sub, Cloud Functions, App Engine, Kubernetes Engine, Kafka, microservices
- Schedule: Cloud Composer, Airflow
- Processing: Cloud Dataproc, Cloud Dataflow, Apache Spark, Apache Flink
- CI/CD: Bitbucket + Jenkins / GitLab
- Infrastructure as Code: Terraform
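
Query sketch: an example of the advanced-SQL level expected, run through the BigQuery Python client; the project, dataset, and table names are hypothetical.

```python
from google.cloud import bigquery

client = bigquery.Client()  # uses ambient GCP credentials

# Window-function example: keep only the latest event per user.
# `my-project.analytics.events` is a hypothetical table.
sql = """
SELECT user_id, event_ts
FROM (
  SELECT
    user_id,
    event_ts,
    ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY event_ts DESC) AS rn
  FROM `my-project.analytics.events`
)
WHERE rn = 1
"""

for row in client.query(sql).result():
    print(row.user_id, row.event_ts)
```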