SRE & Platform Engineer Training (6–9 Months)
Project-based tracks that map directly to LinkedIn & job-board skill demands for roles asking “1–3 years experience”.
Global Market Snapshot (SRE • Platform • DevOps)
Live job counts fluctuate daily. Below are conservative **open-role ranges** observed on Sep 6, 2025 (combined SRE/Platform/DevOps), plus trend sources.
- Market trend reports show ongoing DevOps/SRE demand growth and strong Kubernetes-skilled roles (DevOps, Platform, SRE comprise a large share of K8s postings).
- Hiring insights & market sizes indicate North America & Europe as largest DevOps markets; growth remains double-digit through 2028–2032.
- Representative country job boards for SRE/DevOps confirm active requisitions in UK & Canada (counts vary by day).
- Macro employment outlooks (WEF) support continued tech-role expansion through 2030, with AI & platform roles accelerating infra demand.
What recruiters expect (from LinkedIn & job boards)
Track 1 — Junior SRE & Platform Engineer (6 Months)
Target: roles labeled “Junior SRE / Platform / DevOps” or “SRE (0–1 YOE)”, including postings that list 1–3 years as preferred.
Phase 1 (Weeks 1–8)
Linux • Git • Python • Cloud Fundamentals • Containers
- ✔ Ship a hardened Linux VM, write shell utilities, and version in Git.
- ✔ Package & run services in Docker; publish to a registry.
- ✔ Deploy a basic app to AWS/GCP with IaC foundations.
Project A — “Prod-ish” Starter Stack
Compose a 3-service app (frontend, API, PostgreSQL) with healthchecks, structured logs, and Makefile automation.
Interview value: “Built a Dockerized 3-tier service with health probes, JSON logs, and CI smoke tests.”
- Linux admin (users, systemd, journalctl), secure SSH, backups
- Git flows (PRs, reviews), GitHub Actions basics
- Python for ops (click/argparse, requests, boto3/gcloud)
- Dockerfiles (multi-stage), Compose, image scanning
- Cloud 101 (IAM, VPC/VNet, compute, storage, LB)
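Project A asks for healthchecks and structured logs. A minimal sketch of both in standard-library Python is below; the names `JsonFormatter`, `build_logger`, `health_status`, and the `ctx` field are illustrative assumptions, not part of any course material or framework.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Hypothetical convention: extra context is passed as extra={"ctx": {...}}.
        ctx = getattr(record, "ctx", None)
        if ctx:
            payload.update(ctx)
        return json.dumps(payload, sort_keys=True)


def build_logger(name: str, stream=sys.stdout) -> logging.Logger:
    """Return a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(name)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def health_status(checks: dict[str, bool]) -> tuple[int, dict]:
    """Map dependency checks to an HTTP-style status code and response body."""
    healthy = all(checks.values())
    body = {"status": "ok" if healthy else "degraded", "checks": checks}
    return (200 if healthy else 503), body
```

A health endpoint in the API service could return `health_status({"db": db_ok})`, and a Compose healthcheck could then treat any non-200 response as unhealthy.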
Phase 2 (Weeks 9–16)
Kubernetes • Terraform • CI/CD • Observability
- ✔ Run a secure K8s cluster (kind/k3s/EKS/GKE); deploy via Helm.
- ✔ Provision cloud infra with Terraform & remote state.
- ✔ CI/CD: build-test-scan-deploy; env promos; feature flags.
- ✔ Observability: metrics/logs/traces; SLI panels.
Project B — “Hello Reliability” on K8s
Terraform a VPC + EKS/GKE; deploy app with Helm; add HPA; set up Prometheus/Grafana + Loki or ELK.
Interview value: “IaC’d a production-like cluster with autoscaling & dashboards around SLIs.”
- K8s: pods, services, ingresses, HPA, RBAC, secrets
- Helm & kustomize; GitOps intro (Argo CD/Flux)
- Terraform modules, workspaces, tfvars, backends
- CI/CD patterns (Actions/GitLab/Jenkins), SBOM & image scans
- Prometheus, Grafana, Alertmanager; ELK/OpenSearch
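Project B adds an HPA, so it helps to know the scaling rule behind it. The Kubernetes documentation gives the core formula as desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), with no change when the ratio is within a tolerance (0.1 by default). The sketch below is a simplified model only: it ignores pod readiness, missing metrics, and stabilization windows, and the function name and min/max defaults are illustrative.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
    tolerance: float = 0.1,
) -> int:
    """Simplified HPA rule: scale in proportion to metric / target."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    # Within tolerance of the target: leave the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds, as minReplicas/maxReplicas do.
    return max(min_replicas, min(max_replicas, desired))
```

For example, 4 replicas averaging 90% CPU against a 60% target scale to 6, while 63% against the same target stays at 4 because the ratio is inside the tolerance band.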
Phase 3 (Weeks 17–24)
SRE Practices • Incidents • Cost/Perf • Platform Basics
- ✔ Define SLIs/SLOs, error budgets, runbooks, on-call rotations.
- ✔ Performance & cost tuning; capacity planning.
- ✔ Platform engineering 101: golden paths & Backstage intro.
Capstone — “Mini Platform, Real Incidents”
Build a tiny IDP: Backstage catalog + templates to provision a golden-path service (scaffold repo, CI, Helm chart, alerts). Run a chaos day and publish a post-mortem.
Interview value: “Owned SLOs & post-mortems; shipped an internal template that cut service bootstrap to 15 minutes.”
- SRE: SLIs/SLOs, error budgets, incident command, blameless RCA
- Perf & cost: autoscaling, right-sizing, spot/commit plans
- Backstage basics; templating; IDP concepts & DX metrics
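Error budgets in Phase 3 come down to simple arithmetic: the budget is the share of the window (or of requests) that the SLO allows to fail. A minimal sketch, with hypothetical function names and a 30-day window assumed as the default:

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of allowed unavailability for a time-based SLO."""
    if not 0 < slo < 1:
        raise ValueError("slo must be between 0 and 1 (exclusive)")
    total_minutes = window_days * 24 * 60
    return total_minutes * (1 - slo)


def budget_remaining(slo: float, good_events: int, total_events: int) -> float:
    """Fraction of the error budget left for a request-based SLI.

    1.0 means untouched, 0.0 means exhausted, negative means overspent.
    """
    if not 0 < slo < 1:
        raise ValueError("slo must be between 0 and 1 (exclusive)")
    allowed_bad = total_events * (1 - slo)
    actual_bad = total_events - good_events
    return 1 - actual_bad / allowed_bad
```

A 99.9% SLO over 30 days allows about 43.2 minutes of downtime. With 1,000,000 requests and 500 failures against the same SLO, roughly half the budget remains.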
Track 2 — Intermediate SRE & Platform Engineer (9 Months)
Target: roles titled “SRE”, “Platform Engineer”, “DevOps/SRE” with 1–3 YOE or “Intermediate”.
Phase A (Months 1–3)
Advanced Cloud • Networking • Security (DevSecOps)
P1 — Multi-Region Active/Active
Design & build blue/green + failover across 2 regions (AWS or Azure). SLO impact model; DR runbook with RTO/RPO evidence.
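One way to start the SLO impact model for P1 is with composite availability. The sketch below assumes region failures are independent, which real deployments only approximate (shared control planes, DNS, and deploy pipelines correlate failures); the function names are illustrative.

```python
import math


def serial_availability(components: list[float]) -> float:
    """All components must be up: availabilities multiply."""
    return math.prod(components)


def parallel_availability(replicas: list[float]) -> float:
    """At least one replica must be up: multiply the failure probabilities."""
    return 1 - math.prod(1 - a for a in replicas)


def active_active(region_stack: list[float], regions: int = 2) -> float:
    """Availability of identical regional stacks behind one global entry point."""
    per_region = serial_availability(region_stack)
    return parallel_availability([per_region] * regions)
```

Under the independence assumption, two regions at 99% each give about 99.99% combined, which is the kind of figure to check against measured failover results in the DR runbook.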
P2 — Supply-Chain Security Pipeline
End-to-end CI with SAST, dependency scans, image attestations (Sigstore/Cosign), policy gates (OPA/Conftest) and SBOMs.
- Cloud networking: VPC/VNet design, PrivateLink/Peering
- Ingress, service mesh (Istio/Linkerd) & mTLS
- Secrets mgmt (Vault/AWS Secrets Manager), KMS, IAM
- Policy-as-code (OPA), artifact signing (Cosign), SBOM
Phase B (Months 4–6)
Platform Engineering • IDP • Golden Paths • FinOps
P3 — Internal Developer Platform (IDP)
Backstage + Terraform + Argo CD to generate a “service-in-a-box” (repo, CI, container, K8s chart, alerts, SLO dashboard) in < 10 minutes.
P4 — FinOps & Perf Tuning
Right-size workloads, adopt spot/savings plans, and show a 25–40% cost reduction with unchanged SLOs.
- IDP patterns, platform APIs, service catalogs, scorecards
- Golden paths & scaffolding (Backstage templates)
- Multi-tenant clusters, quotas, Pod Security Admission (replaces PodSecurityPolicy, which was removed in Kubernetes 1.25)
- Cost showback/chargeback; perf/load testing at scale
Phase C (Months 7–9)
Reliability at Scale • Chaos • Observability 2.0 • Leadership
C1 — SRE at Scale
Introduce error-budget policies org-wide; create RCA templates; implement incident tooling (PagerDuty, or Splunk On-Call, formerly VictorOps) & post-incident reviews.
C2 — Chaos & Resilience
Adopt a chaos program (Litmus/Gremlin); validate autoscaling, timeouts, retry/backoff, circuit breakers; publish resilience scorecard.
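The retry/backoff and circuit-breaker behaviors that C2 validates can be sketched in a few lines. This is an illustrative, single-threaded model under stated assumptions ("full jitter" backoff, a consecutive-failure threshold, a fixed cool-down); the names and defaults are hypothetical, and production services would normally rely on a tested library or the service mesh instead.

```python
import random
import time


def backoff_delays(retries, base=0.1, cap=5.0, rng=random.random):
    """Full-jitter delays: a random fraction of min(cap, base * 2**n)."""
    return [rng() * min(cap, base * (2 ** n)) for n in range(retries)]


def call_with_retry(fn, attempts=4, base=0.1, cap=5.0,
                    sleep=time.sleep, rng=random.random):
    """Call fn(); on failure wait a jittered delay, then retry up to `attempts`."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    delays = backoff_delays(attempts - 1, base, cap, rng)
    for i in range(attempts):
        try:
            return fn()
        except Exception:  # sketch only: real code should catch narrower errors
            if i == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(delays[i])


class CircuitBreaker:
    """Open after N consecutive failures; allow a trial call after a cool-down."""

    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True  # closed: traffic flows normally
        # Half-open: permit a trial call once the cool-down has elapsed.
        return self.clock() - self.opened_at >= self.reset_after

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()
```

The `sleep`, `rng`, and `clock` parameters are injected so a chaos experiment or unit test can run the logic deterministically without waiting in real time.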
C3 — Observability 2.0
OpenTelemetry traces + exemplars + RED/USE dashboards; lower MTTD/MTTR by 30% quarter-over-quarter.
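To back a claim like "MTTD/MTTR down 30%", the numbers need a consistent definition. The sketch below assumes MTTD is measured from incident start to detection and MTTR from incident start to resolution (teams define these differently, and some measure MTTR from detection); the record fields `started`, `detected`, and `resolved` are hypothetical.

```python
from datetime import datetime
from statistics import mean


def _minutes(start: str, end: str) -> float:
    """Minutes between two ISO-8601 timestamps."""
    delta = datetime.fromisoformat(end) - datetime.fromisoformat(start)
    return delta.total_seconds() / 60


def mttd_mttr(incidents: list[dict]) -> tuple[float, float]:
    """Return (MTTD, MTTR) in minutes for a non-empty list of incident records."""
    mttd = mean(_minutes(i["started"], i["detected"]) for i in incidents)
    mttr = mean(_minutes(i["started"], i["resolved"]) for i in incidents)
    return mttd, mttr


def percent_change(before: float, after: float) -> float:
    """Relative change in percent; negative values mean improvement here."""
    return (after - before) / before * 100
```

Comparing `mttd_mttr` for two consecutive quarters with `percent_change` gives the quarter-over-quarter figure, provided both quarters use the same incident definition.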
- Run incident drills, PIRs, and executive briefings
- Road-mapping with stakeholders; risk registers
- Hiring screens & technical presentations
Portfolio & Interview Mapping
| Project | You’ll Claim in Interviews | Maps to Market Skills |
|---|---|---|
| Project A — Prod-ish Starter Stack | “Hardened Linux + Docker multi-service, CI smoke tests, health checks.” | Linux, Docker, CI basics |
| Project B — K8s “Hello Reliability” | “Terraform + EKS/GKE, Helm, HPA, Prom/Grafana SLI dashboards.” | K8s, IaC, Observability |
| Capstone — Mini Platform | “Backstage IDP templates cut service bootstrapping to 15 min; SLOs & RCAs.” | Platform Eng, SRE |
| P3 — Full IDP | “Self-service Golden Path: repo→CI→image→chart→deploy→alerts automated.” | Backstage, GitOps, DX |
| C2 — Chaos Program | “Resilience score +30%; MTTD/MTTR down 30%.” | Chaos, SRE metrics |
Tip: keep repos public (sanitized), add architecture diagrams & runbooks; link dashboards with anonymized screenshots.
Suggested Certifications (Optional but Helpful)
FAQ
Time commitment
~25–30 hrs/week (Junior), ~30–35 hrs/week (Intermediate).
Prerequisites
No prior IT experience required. We start with Linux & Git and ramp quickly with hands-on labs.
Tooling stack
Linux, Git/GitHub, Docker, Kubernetes (kind/k3s/EKS/GKE), Terraform, Prometheus/Grafana, ELK/OpenSearch, Backstage, Argo CD/Flux, Jenkins/GitHub Actions/GitLab CI.
