
GCP + PySpark Developer (RARR Job 5321)
Job Description
We are seeking a skilled and proactive Big Data Engineer with mandatory hands-on experience in PySpark and Google Cloud Platform (GCP) services. The ideal candidate has a strong background in building and maintaining scalable data pipelines and is comfortable working with BigQuery, Python, and PySpark in large-scale environments. You will collaborate with analysts, architects, and business teams to deliver data solutions that support strategic business initiatives.
Key Responsibilities:
- Design, develop, and maintain scalable ETL/ELT data pipelines on GCP using PySpark and other tools.
- Leverage BigQuery, Cloud Storage, Cloud Composer, and related GCP services for data ingestion, transformation, and loading.
- Develop efficient and modular PySpark scripts for batch processing and data transformation (a brief illustrative sketch follows this list).
- Optimize data pipelines for performance, scalability, and cost-effectiveness.
- Ensure robust data validation, governance, and quality checks across all workflows.
- Collaborate with data scientists, analysts, and product teams to define and implement end-to-end data solutions.
- Monitor production pipelines and proactively resolve issues to ensure reliability and uptime.
- Apply CI/CD practices for managing and deploying data engineering code.
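
To ground the PySpark expectations above, here is a minimal sketch of the kind of modular batch transformation this role involves: read raw files from Cloud Storage, clean them, and load the result into BigQuery. Every bucket, project, dataset, and column name is a hypothetical placeholder, and the final write assumes the spark-bigquery connector is available on the cluster.

```python
# Minimal, illustrative PySpark batch job; all names are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders_batch").getOrCreate()

# Read raw CSV files from Cloud Storage (hypothetical path).
orders = spark.read.option("header", True).csv("gs://example-bucket/raw/orders/")

# Typical cleanup: cast types, drop bad rows, derive a partition-friendly date.
clean = (
    orders
    .withColumn("amount", F.col("amount").cast("double"))
    .filter(F.col("amount").isNotNull())
    .withColumn("order_date", F.to_date("created_at"))
)

# Load into BigQuery; assumes the spark-bigquery connector is on the classpath.
(
    clean.write.format("bigquery")
    .option("table", "example-project.analytics.orders_clean")
    .option("temporaryGcsBucket", "example-temp-bucket")
    .mode("overwrite")
    .save()
)
```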
Mandatory Skills:
- 5–12 years of overall experience in data engineering.
- Minimum 4 years of hands-on experience with GCP and its data services.
- Strong proficiency in PySpark for large-scale data processing.
- Expertise in BigQuery, including performance tuning, partitioning, clustering, and cost optimization (illustrated in the BigQuery sketch after this list).
- Strong skills in Python for scripting, automation, and orchestration.
- Experience with Cloud Composer (Apache Airflow) for workflow scheduling (illustrated in the Airflow sketch after this list).
- Solid understanding of data modeling, warehouse architecture, and best practices.
- Familiarity with version control systems (e.g., Git) and DevOps tools.
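
The BigQuery sketch below shows one of the tuning levers named above: creating a date-partitioned, clustered table with the google-cloud-bigquery Python client, so that date filters prune partitions and common keys cluster together. The project, dataset, and columns are assumptions for illustration only.

```python
# Illustrative only: date-partitioned, clustered BigQuery table via DDL.
from google.cloud import bigquery

client = bigquery.Client(project="example-project")  # hypothetical project

ddl = """
CREATE TABLE IF NOT EXISTS analytics.orders_clean (
  order_id STRING,
  customer_id STRING,
  amount FLOAT64,
  order_date DATE
)
PARTITION BY order_date   -- partition pruning reduces scanned bytes and cost
CLUSTER BY customer_id    -- clustering speeds common filter and join keys
"""

client.query(ddl).result()  # run the DDL job and wait for it to finish
```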
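
The Airflow sketch below is a minimal Cloud Composer DAG that schedules a daily batch run. The DAG id, schedule, and task body are assumptions for demonstration, not a prescribed design.

```python
# Illustrative Cloud Composer (Airflow 2.x) DAG; names are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def run_batch_job(**context):
    # Placeholder: in practice this might submit the PySpark job to Dataproc.
    print("Submitting batch job for", context["ds"])


with DAG(
    dag_id="daily_orders_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",  # one run per day
    catchup=False,
) as dag:
    PythonOperator(task_id="run_batch_job", python_callable=run_batch_job)
```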
Preferred Qualifications:
- GCP Professional Data Engineer certification.
- Experience with additional GCP tools such as Pub/Sub, Dataflow, and Dataproc.
- Understanding of data security and compliance standards (e.g., GDPR, HIPAA).
- Exposure to other programming languages such as Java or Scala is a plus.
Education:
- Bachelor’s or Master’s degree in Computer Science, Information Technology, Engineering, or a related field.