
Data Engineer (Nuc Job/ 79)
Job Description
Job Title: Data Engineer
Exp.: 5+ Years
Location: India
About the Role:
We are seeking a proactive Data Engineer to support our data engineering initiatives and real estate market intelligence platform. You will collaborate closely with our London-based Data team to maintain robust ETL/data extraction pipelines, audit our growing data estate, and ensure the accuracy of our tracking systems, which cover millions of datapoints across the UK and Europe.
Key Responsibilities:
· Data Quality Roadmap: Deliver our data quality roadmap by implementing and maintaining standardised testing frameworks and auditing protocols that ensure data integrity and quality across our extraction and transformation layers.
· AI & LLM Data Quality Engineering:
Build and maintain rigorous testing suites for LLM-driven workflows. You will implement automated checks for prompt injection and output hallucination, along with graceful failure mechanisms, to ensure our AI-dependent ETL pipelines remain accurate and secure.
· Data Auditing & Analysis:
Continuously audit our real estate data, thoroughly test newly added data objects across BigQuery, Cloud Storage, SharePoint, etc., and proactively find and fix data quality issues and anomalies.
· Data Extraction & Scraping:
Support and maintain Python-based web scrapers (using libraries such as Playwright and Scrapy) that capture daily rent and availability metrics.
· Data Transformation: Write and optimize SQL workflows to transform raw scraper logs, enforcing data integrity at the analytical layer.
Required Qualifications & Experience:
· Education: Master's degree in computer science, data science or a closely related field.
· Quality Assurance/Testing: 6+ years of experience building automation systems and unit testing frameworks, with hands-on experience auditing, quality checking and maintaining large-scale datasets. Hands-on experience with Pytest/unittest and Dataplex/Great Expectations.
· Python: 6+ years of professional, hands-on experience. Must have strong proficiency in object-oriented (class-based) programming and in building ETL/ELT pipelines. Experience using LLM APIs and frameworks (e.g., OpenAI API, Gemini API, LangChain), the Places API and libraries like Pandas is required.
· SQL & NoSQL: 6+ years of professional experience writing advanced, optimized queries for data transformation and analytics. Experience using NoSQL databases is required.
· Google Cloud Platform (GCP): 4+ years of experience using services such as BigQuery, Firestore, Cloud Storage, Dataform, Cloud Run and Compute Engine.
· GitHub & DevOps: Experience with GitHub, Terraform and setting up CI/CD pipelines (e.g., GitHub Actions) is required.
· IDEs: Work experience using LLM-supported coding tools such as Cursor or Antigravity is required.
Technical Stack & Environment: While you may not use every tool daily, you will be operating within and auditing systems built on the following stack:
· Programming Languages & Libraries: Python, SQL, YAML, NoSQL.
· Cloud & Infrastructure (GCP): BigQuery, Cloud Storage, Compute Engine, Cloud Run Jobs, Cloud Run Functions, Cloud Build, Cloud Scheduler, Firestore, Dataform, Artifact Registry, Secret Manager & Cloud Dataflow.
· DevOps, QA & Tools: GitHub, GitHub Actions (CI/CD), Docker, Terraform (IaC), Linux, Pytest, Pydantic & Jira.