GCP + PySpark Developer (RARR Job 5321)

For an International Trade and Development Company
5–16 Years
Full Time
Up to 30 Days
Up to 34 LPA
1 Position(s)
Hyderabad, Pune
Posted 16 Days Ago

Job Description

We are seeking a skilled and proactive GCP Big Data Engineer with mandatory hands-on experience in PySpark and Google Cloud Platform (GCP) services. The ideal candidate will have a strong background in building and maintaining scalable data pipelines and should be comfortable working with BigQuery, Python, and PySpark in large-scale environments. You will collaborate with analysts, architects, and business teams to deliver data solutions that support strategic business initiatives.

Key Responsibilities:

  • Design, develop, and maintain scalable ETL/ELT data pipelines on GCP using PySpark and other tools.

  • Leverage BigQuery, Cloud Storage, Cloud Composer, and related GCP services for data ingestion, transformation, and loading.

  • Develop efficient and modular PySpark scripts for batch processing and data transformation.

  • Optimize data pipelines for performance, scalability, and cost-effectiveness.

  • Ensure robust data validation, governance, and quality checks across all workflows.

  • Collaborate with data scientists, analysts, and product teams to define and implement end-to-end data solutions.

  • Monitor production pipelines and proactively resolve issues to ensure reliability and uptime.

  • Apply CI/CD practices for managing and deploying data engineering code.

Mandatory Skills:

  • 5–12 years of overall experience in data engineering.

  • Minimum 4 years of hands-on experience with GCP and its data services.

  • Strong proficiency in PySpark for large-scale data processing.

  • Expertise in BigQuery including performance tuning, partitioning, clustering, and cost optimization.

  • Strong skills in Python for scripting, automation, and orchestration.

  • Experience with Cloud Composer (Apache Airflow) for workflow scheduling.

  • Solid understanding of data modeling, warehouse architecture, and best practices.

  • Familiarity with version control systems (e.g., Git) and DevOps tools.

Preferred Qualifications:

  • GCP Professional Data Engineer certification.

  • Experience with additional GCP tools like Pub/Sub, Dataflow, and Dataproc.

  • Understanding of data security and compliance standards (e.g., GDPR, HIPAA).

  • Exposure to other programming languages such as Java or Scala is a plus.

Education:

  • Bachelor’s or Master’s degree in Computer Science, Information Technology, Engineering, or related field.