SRE & Platform Engineering Training Curriculum

Global Market Snapshot (SRE • Platform • DevOps)

Live job counts fluctuate daily. Below are conservative **open-role ranges** observed on Sep 6, 2025 (combined SRE/Platform/DevOps), plus trend sources.

Region	Open roles (approx.)	Notes
US	~8k–15k+	High concentration in finance, SaaS, AI infra.
UK	~1k–2k+	London + remote-first orgs.
EU (ex-UK)	~5k–10k+	DE, NL, FR, IE strong platform teams.
Canada	~1k–2k+	Toronto, Vancouver, Montreal banks & product.
Australia	~700–1,200+	SRE in fintech & media.
New Zealand	~80–150+	Auckland/Wellington product orgs.
Asia	~10k–18k+	IN, SG, JP cloud-native hiring.
Africa	~300–700+	Cloud telco & fintech hubs: ZA, NG, KE.

Why these ranges?

Market trend reports show ongoing DevOps/SRE demand growth and strong Kubernetes-skilled roles (DevOps, Platform, SRE comprise a large share of K8s postings).
Hiring insights & market sizes indicate North America & Europe as largest DevOps markets; growth remains double-digit through 2028–2032.
Representative country job boards for SRE/DevOps confirm active requisitions in UK & Canada (counts vary by day).
Macro employment outlooks (WEF) support continued tech-role expansion through 2030, with AI & platform roles accelerating infra demand.

These are snapshot estimates; use them as directional guidance when pitching ROI to learners or partners.

What recruiters expect (from LinkedIn & job boards)

Kubernetes
Cloud (AWS/Azure/GCP)
Terraform / IaC
Linux / Bash
CI/CD (Actions, GitLab, Jenkins)
Observability (Prometheus, Grafana, ELK)
SRE: SLO/SLI, Error Budgets, Incident Mgmt
Networking & Security
Platform Engineering: IDP, Backstage, Golden Paths

Track 1 — Junior SRE & Platform Engineer (6 Months)

Target: roles labeled “Junior SRE / Platform / DevOps” or “SRE (0–1 YOE)”, including postings that list 1–3 years of experience as preferred rather than required.

Phase 1 (Weeks 1–8) Linux • Git • Python • Cloud Fundamentals • Containers

Outcomes

✔ Ship a hardened Linux VM, write shell utilities, and version in Git.
✔ Package & run services in Docker; publish to a registry.
✔ Deploy a basic app to AWS/GCP with IaC foundations.

Project A — “Prod-ish” Starter Stack

Compose a 3-service app (frontend, API, PostgreSQL) with healthchecks, structured logs, and Makefile automation.

Interview value: “Built a Dockerized 3-tier service with health probes, JSON logs, and CI smoke tests.”
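As a rough illustration of the "healthchecks" and "structured logs" pieces of Project A, the sketch below uses only the Python standard library. The names JsonFormatter, build_logger, HealthHandler, serve, the /healthz path, and port 8080 are assumptions made for this example; the curriculum does not prescribe them, and a real API service would more likely use a web framework.

```python
import json
import logging
from http.server import BaseHTTPRequestHandler, HTTPServer


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        })


def build_logger(name: str = "api") -> logging.Logger:
    """Return a logger that writes JSON lines to stderr (hypothetical helper)."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid attaching duplicate handlers
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


class HealthHandler(BaseHTTPRequestHandler):
    """Answer GET /healthz with 200 so Docker or Compose can probe the service."""

    def do_GET(self):
        if self.path == "/healthz":
            status, payload = 200, {"status": "ok"}
        else:
            status, payload = 404, {"error": "not found"}
        body = json.dumps(payload).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        # Route the default access log through the JSON logger.
        build_logger("access").info(fmt % args)


def serve(port: int = 8080) -> None:
    """Start the health endpoint; blocks until interrupted."""
    build_logger().info("listening on port %d", port)
    HTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

Calling serve() starts the endpoint; a Compose healthcheck could then poll /healthz, and every log line can be parsed by a log shipper without regex rules.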

Content

Linux admin (users, systemd, journalctl), secure SSH, backups
Git flows (PRs, reviews), GitHub Actions basics
Python for ops (click/argparse, requests, boto3/gcloud)
Dockerfiles (multi-stage), Compose, image scanning
Cloud 101 (IAM, VPC/VNet, compute, storage, LB)

Phase 2 (Weeks 9–16) Kubernetes • Terraform • CI/CD • Observability

Outcomes

✔ Run a secure K8s cluster (kind/k3s/EKS/GKE); deploy via Helm.
✔ Provision cloud infra with Terraform & remote state.
✔ CI/CD: build-test-scan-deploy; environment promotions; feature flags.
✔ Observability: metrics/logs/traces; SLI panels.

Project B — “Hello Reliability” on K8s

Terraform a VPC + EKS/GKE; deploy app with Helm; add HPA; set up Prometheus/Grafana + Loki or ELK.

Interview value: “IaC’d a production-like cluster with autoscaling & dashboards around SLIs.”
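To reason about what the HPA in Project B will do, it can help to reproduce its core scaling rule by hand. The sketch below follows the formula documented for the Kubernetes Horizontal Pod Autoscaler, desired = ceil(current replicas × current metric / target metric), with the default 10% tolerance. The function name and the min/max defaults are assumptions for illustration, and the real controller also applies stabilization windows and pod-readiness handling that are not modeled here.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     min_replicas: int = 1,
                     max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Approximate the HPA replica calculation for a single metric."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the bounds set on the autoscaler.
    return max(min_replicas, min(max_replicas, desired))
```

For example, 4 replicas averaging 90% CPU against a 60% target give a ratio of 1.5, so the rule asks for 6 replicas.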

Content

K8s: pods, services, ingresses, HPA, RBAC, secrets
Helm & kustomize; GitOps intro (Argo CD/Flux)
Terraform modules, workspaces, tfvars, backends
CI/CD patterns (Actions/GitLab/Jenkins), SBOM & image scans
Prometheus, Grafana, Alertmanager; ELK/OpenSearch

Phase 3 (Weeks 17–24) SRE Practices • Incidents • Cost/Perf • Platform Basics

Outcomes

✔ Define SLIs/SLOs, error budgets, runbooks, on-call rotations.
✔ Performance & cost tuning; capacity planning.
✔ Platform engineering 101: golden paths & Backstage intro.

Capstone — “Mini Platform, Real Incidents”

Build a tiny IDP: Backstage catalog + templates to provision a golden-path service (scaffold repo, CI, Helm chart, alerts). Run a chaos day and publish a post-mortem.

Interview value: “Owned SLOs & post-mortems; shipped an internal template that cut service bootstrap to 15 minutes.”

Content

SRE: SLIs/SLOs, error budgets, incident command, blameless RCA
Perf & cost: autoscaling, right-sizing, spot instances and commitment plans
Backstage basics; templating; IDP concepts & DX metrics
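The error-budget arithmetic behind the SRE items above is small enough to sketch directly. This is a minimal example assuming a request-based availability SLI; BudgetReport, error_budget, and allowed_downtime_minutes are hypothetical names, and the 30-day window is an assumption rather than something the curriculum fixes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BudgetReport:
    allowed_failures: float    # failures the SLO tolerates in the window
    consumed_fraction: float   # share of the budget already spent
    remaining_fraction: float  # negative once the budget is blown


def error_budget(slo_target: float, total_events: int, failed_events: int) -> BudgetReport:
    """Compute error-budget consumption for an event-based SLI."""
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be between 0 and 1, e.g. 0.999")
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    allowed = (1 - slo_target) * total_events
    consumed = failed_events / allowed
    return BudgetReport(allowed, consumed, 1 - consumed)


def allowed_downtime_minutes(slo_target: float, window_days: int = 30) -> float:
    """Translate an availability SLO into minutes of downtime per window."""
    return (1 - slo_target) * window_days * 24 * 60
```

With a 99.9% SLO over one million requests, about 1,000 failures fit in the budget, so 400 failed requests consume roughly 40% of it; the same SLO allows about 43.2 minutes of downtime in 30 days.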

Track 2 — Intermediate SRE & Platform Engineer (9 Months)

Target: roles titled “SRE”, “Platform Engineer”, “DevOps/SRE” with 1–3 YOE or “Intermediate”.

Phase A (Months 1–3) Advanced Cloud • Networking • Security (DevSecOps)

Projects

P1 — Multi-Region Active/Active

Design & build blue/green + failover across 2 regions (AWS or Azure). SLO impact model; DR runbook with RTO/RPO evidence.
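One way to produce the "RTO/RPO evidence" for the DR runbook is to record three timestamps during a failover drill and compare them with the targets. The sketch below assumes RPO is measured as the gap between the last replicated write and the outage start, and RTO as the gap between outage start and service restoration; drill_evidence and its parameter names are hypothetical.

```python
from datetime import datetime, timedelta


def drill_evidence(last_replicated_write: datetime,
                   outage_start: datetime,
                   service_restored: datetime,
                   rto_target: timedelta,
                   rpo_target: timedelta) -> dict:
    """Summarize one failover drill against its RTO/RPO targets."""
    achieved_rpo = outage_start - last_replicated_write  # data-loss window
    achieved_rto = service_restored - outage_start       # time to recover
    return {
        "achieved_rpo": achieved_rpo,
        "achieved_rto": achieved_rto,
        "rpo_met": achieved_rpo <= rpo_target,
        "rto_met": achieved_rto <= rto_target,
    }
```

A drill where replication lagged 40 seconds and traffic recovered after 12 minutes would pass a 1-minute RPO and a 15-minute RTO; the returned dict can be pasted into the runbook as the evidence record.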

P2 — Supply-Chain Security Pipeline

End-to-end CI with SAST, dependency scans, image attestations (Sigstore/Cosign), policy gates (OPA/Conftest) and SBOMs.

Content

Cloud networking: VPC/VNet design, PrivateLink/Peering
Ingress, service mesh (Istio/Linkerd) & mTLS
Secrets mgmt (Vault/AWS Secrets Manager), KMS, IAM
Policy-as-code (OPA), artifact signing (Cosign), SBOM

Phase B (Months 4–6) Platform Engineering • IDP • Golden Paths • FinOps

Projects

P3 — Internal Developer Platform (IDP)

Backstage + Terraform + Argo CD to generate a “service-in-a-box” (repo, CI, container, K8s chart, alerts, SLO dashboard) in < 10 minutes.

P4 — FinOps & Perf Tuning

Right-size workloads, adopt spot/savings plans, and show a 25–40% cost reduction with unchanged SLOs.

Content

IDP patterns, platform APIs, service catalogs, scorecards
Golden paths & scaffolding (Backstage templates)
Multi-tenant clusters, quotas, Pod Security Admission (the successor to PodSecurityPolicy, which was removed in Kubernetes 1.25)
Cost showback/chargeback; perf/load testing at scale

Phase C (Months 7–9) Reliability at Scale • Chaos • Observability 2.0 • Leadership

Capstone (choose one)

C1 — SRE at Scale

Introduce error-budget policies org-wide; create RCA templates; implement incident tooling (PagerDuty/VictorOps) & post-incident reviews.

C2 — Chaos & Resilience

Adopt a chaos program (Litmus/Gremlin); validate autoscaling, timeouts, retry/backoff, circuit breakers; publish resilience scorecard.
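The retry/backoff and circuit-breaker patterns that C2 validates can be sketched in a few lines, which also gives chaos experiments something concrete to target. This is a simplified, single-threaded illustration: CircuitBreaker, CircuitOpenError, retry_with_backoff, and all default thresholds are assumptions for the example, and production services would usually rely on a maintained resilience library or a service mesh instead.

```python
import random
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock          # injectable so tests can control time
        self.failures = 0
        self.opened_at = None       # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; call rejected")
            # Timeout elapsed: half-open, let one trial call through.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result


def retry_with_backoff(fn, attempts=4, base_delay=0.1, max_delay=2.0, sleep=time.sleep):
    """Retry fn with capped exponential backoff and full jitter."""
    for attempt in range(attempts):
        try:
            return fn()
        except CircuitOpenError:
            raise  # do not hammer a dependency whose circuit is open
        except Exception:
            if attempt == attempts - 1:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
```

A typical composition is retry_with_backoff(lambda: breaker.call(fetch)), where fetch is whatever downstream call is being protected; a chaos experiment can then inject failures into fetch and check that the breaker opens and retries stop.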

C3 — Observability 2.0

OpenTelemetry traces + exemplars + RED/USE dashboards; lower MTTD/MTTR by 30% quarter-over-quarter.
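To back a claim like "lower MTTD/MTTR by 30% quarter-over-quarter", the numbers have to come from incident records. The sketch below assumes MTTD is the mean time from incident start to detection and MTTR the mean time from start to resolution (some teams measure MTTR from detection instead); Incident, mttd_minutes, mttr_minutes, and pct_change are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass(frozen=True)
class Incident:
    started: datetime
    detected: datetime
    resolved: datetime


def mttd_minutes(incidents) -> float:
    """Mean time to detect, in minutes."""
    return mean((i.detected - i.started).total_seconds() for i in incidents) / 60


def mttr_minutes(incidents) -> float:
    """Mean time to resolve, measured from incident start, in minutes."""
    return mean((i.resolved - i.started).total_seconds() for i in incidents) / 60


def pct_change(previous: float, current: float) -> float:
    """Percentage change between two periods; negative means improvement."""
    if previous == 0:
        raise ValueError("previous must be non-zero")
    return (current - previous) / previous * 100
```

Feeding one list of Incident records per quarter into these helpers gives the quarter-over-quarter deltas that a capstone report would cite.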

Leadership & Comms

Run incident drills, PIRs, and executive briefings
Road-mapping with stakeholders; risk registers
Hiring screens & technical presentations

Portfolio & Interview Mapping

Project	You’ll Claim in Interviews	Maps to Market Skills
Project A — Prod-ish Starter Stack	“Hardened Linux + Docker multi-service, CI smoke tests, health checks.”	Linux, Docker, CI basics
Project B — K8s “Hello Reliability”	“Terraform + EKS/GKE, Helm, HPA, Prom/Grafana SLI dashboards.”	K8s, IaC, Observability
Capstone — Mini Platform	“Backstage IDP templates cut service bootstrapping to 15min; SLOs & RCAs.”	Platform Eng, SRE
P3 — Full IDP	“Self-service Golden Path: repo→CI→image→chart→deploy→alerts automated.”	Backstage, GitOps, DX
C2 — Chaos Program	“Resilience score +30%; MTTD/MTTR down 30%.”	Chaos, SRE metrics

Tip: keep repos public (sanitized), add architecture diagrams & runbooks; link dashboards with anonymized screenshots.

Suggested Certifications (Optional but Helpful)

Cloud

Foundational → Associate

AWS Cloud Practitioner / Azure Fundamentals; then AWS Developer/SysOps or Azure Admin.

Kubernetes

CKA/CKAD

Reinforces cluster operations & app delivery.

Security

Security+ / SCS / AZ-500

Backs up DevSecOps pipeline claims.

FAQ

Time commitment

~25–30 hrs/week (Junior), ~30–35 hrs/week (Intermediate).

Prerequisites

No prior IT experience is required. We start with Linux & Git and ramp quickly with hands-on labs.

Tooling stack

Linux, Git/GitHub, Docker, Kubernetes (kind/k3s/EKS/GKE), Terraform, Prometheus/Grafana, ELK/OpenSearch, Backstage, Argo CD/Flux, Jenkins/GitHub Actions/GitLab CI.
