Aviation Platform Engineering Architect – Kubernetes (K8s) & Argo
Role Overview
Lead the design and evolution of a cloud‑native platform for airline operations (Crew Scheduling, Flight Ops, CMS/OPS) built on Kubernetes with GitOps using Argo CD and Argo Workflows. Own multi‑cluster architecture, security/compliance, reliability, observability, and progressive delivery for mission‑critical aviation workloads. (AA TOPS references already define K8s/DevOps foundations and GitOps (ArgoCD) in the skills stack.)
🎯 Key Responsibilities
1) Platform Architecture & Roadmap
- Define multi‑cluster, multi‑region AKS/EKS reference architectures (control/worker plane sizing, HA, DR, backup/restore) and platform roadmaps for Crew/OPS modules. (K8s/DevOps platform and FAA/BiTS compliance noted in AA TOPS materials.)
- Establish GitOps standards (Argo CD) for environment parity, change safety, and progressive delivery across flight and crew applications. (GitOps with ArgoCD listed in internal JD.)
2) Application Delivery & Orchestration
- Standardize Helm/Kustomize packaging; design Argo ApplicationSets for fleet‑wide promotion (dev → QA → UAT → prod) with automated policy checks and approvals.
- Build Argo Workflows pipelines for batch/stream processing (e.g., pairing optimizers, rules engines, schedule calculators), integrating Kafka/Pinot and domain services where applicable. (Rules/Kafka/Pinot are active in AA POC.)
3) Security, Compliance & Governance
- Implement workload identity, RBAC, NetworkPolicies, PodSecurity; enforce policies with OPA/Gatekeeper or Kyverno.
- Codify FAA/EASA evidence (change logs, traceability, approvals) and adhere to BiTS and hub‑and‑spoke security guidance referenced in TOPS implementations.
4) Reliability, Observability & SRE
- Define SLOs/SLIs for movement control, crew tracking, and rules services; automate canary and blue‑green releases through Argo CD with progressive rollouts (e.g., Argo Rollouts).
- Standardize Prometheus/Grafana/ELK/Loki/OpenTelemetry for metrics, logs, traces; create dashboards and runbooks for incident response. (Observability tools called out in AA Onsite JD.)
5) Infrastructure as Code & Automation
- Drive Terraform/Crossplane for cluster and cloud resource lifecycle; encode networking (CNI/Istio/Linkerd) and ingress (NGINX/Contour) as declarative assets. (IaC, service mesh, and CNI are part of internal skill sets.)
6) Cost, Performance & Resiliency Engineering
- Optimize cluster autoscaling, HPA/VPA, node pools, spot/on‑demand mixes; benchmark latency for real‑time ops (e.g., tail assignment, crew legality checks).
- Engineer DR (e.g., Velero) and far‑region support noted in TOPS, targeting 99%+ availability for core airline use‑cases.
7) Ways of Working & Leadership
- Lead architecture reviews, ADRs, reference implementations; mentor platform/SRE teams.
- Partner with Product/BA/Test to ensure non‑functional requirements (availability, throughput, RTO/RPO) are built in from the start; support RFP/solutioning for AA TOPS initiatives. (Solution Architect responsibilities and leadership in airline domain JD.)
🧩 Skills Required
Core Platform & Cloud
- Kubernetes (cluster lifecycle, multi‑cluster, node pools, scheduling, policies), Docker; AKS/EKS and Azure/AWS constructs (VNets/VPCs, IAM/Entra).
- GitOps with Argo CD (AppSets, sync waves, health checks, automated promotions), Argo Workflows for orchestration.
- Service Mesh (Istio/Linkerd), CNI (Calico/Cilium), Ingress (NGINX/Contour), API gateways, Secrets (Vault/Key Vault).
DevSecOps & IaC
- Terraform, ARM/Bicep or Crossplane; policy‑as‑code (OPA/Gatekeeper/Kyverno); supply‑chain security (Cosign, SBOMs).
- CI/CD (Jenkins/GitHub Actions) feeding GitOps; approvals & gated deployments per aviation compliance.
Observability & SRE
- Prometheus/Grafana, ELK/Loki, OpenTelemetry; alerting; SLO/SLA frameworks; chaos/resilience testing patterns.
Aviation Domain
- Airline operations: Crew Scheduling, Flight Ops, Movement Control, and constraints (legality, rest, duty limits). (Domains and modules documented in TOPS onboarding and AA JD.)
- Regulatory frameworks: FAA/EASA; familiarity with BiTS security guidance referenced in TOPS.
Leadership & Collaboration
- Stakeholder management, cross‑functional leadership in Agile/SAFe programs; architecture governance for large‑scale modernization. (Leadership and SAFe noted in AA JD and Solution Architect doc.)