
effieksa/platform-engineering


Platform Engineering — Internal Developer Platform

Built with: Backstage · ArgoCD · Tekton · Crossplane · Terraform · Kubernetes

A production-grade Internal Developer Platform (IDP) that cuts the time it takes a developer to get a new production-ready microservice from days to under 5 minutes. It is built on Backstage (service catalog + golden path templates), Crossplane (self-service infrastructure), ArgoCD (GitOps), and Tekton (Kubernetes-native CI). The platform codifies every engineering standard (security scanning, observability, GitOps workflows) into templates, so following best practices is the default, not an afterthought.


Table of Contents

  1. The Problem This Solves
  2. Architecture Overview
  3. Repository Structure
  4. Backstage — Internal Developer Portal
  5. Crossplane — Self-Service Infrastructure
  6. ArgoCD — GitOps Deployments
  7. Tekton — Kubernetes-Native CI
  8. Infrastructure
  9. Developer Experience — New Service in 5 Minutes
  10. Platform Maturity Scorecards
  11. Deployment Guide
  12. Runbook
  13. Troubleshooting

The Problem This Solves

Without an Internal Developer Platform, engineering organizations accumulate toil:

  • Onboarding a new service takes days — set up CI/CD, configure monitoring, write Dockerfiles, set up ArgoCD applications, create PagerDuty services, register in Backstage (if it exists), request a database via ticket, wait for ops approval
  • Platform standards are unevenly adopted — some services have security scanning, others don't. Some have runbooks, most don't. There's no visibility into which services are following best practices
  • Knowledge is tribal — how do I deploy to staging? Which secrets should I use? What's the on-call rotation? The answers live in Slack messages and individual developers' heads
  • Infrastructure provisioning is a bottleneck — developers open tickets to ops for RDS instances, S3 buckets, SQS queues. Tickets sit for days. Services get deployed without the infrastructure they need

This platform solves all four:

  1. A Backstage template creates a complete production-ready service in < 5 minutes
  2. Standards are baked into the templates — security scanning is not optional
  3. Backstage is the single source of truth — ownership, docs, dependencies, health status
  4. Crossplane lets developers self-serve infrastructure via a Kubernetes CRD

Architecture Overview

┌─────────────────────────────────────────────────────────────────────────────┐
│                    INTERNAL DEVELOPER PLATFORM                              │
│                                                                             │
│  DEVELOPER PORTAL                                                           │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │  Backstage                                                          │   │
│  │                                                                     │   │
│  │  Service Catalog    Software Templates    TechDocs                  │   │
│  │  ─────────────────  ──────────────────    ────────                  │   │
│  │  Every service,     Golden path for       Auto-generated           │   │
│  │  API, resource,     new services,         docs from markdown       │   │
│  │  and team in        libraries, and        in each repo             │   │
│  │  one place          infra requests                                 │   │
│  │                                                                     │   │
│  │  Plugins: ArgoCD · Kubernetes · GitHub Actions · PagerDuty · Cost  │   │
│  └──────────────────────────────────┬──────────────────────────────────┘   │
│                                     │ Scaffolder actions                   │
│                                     ▼                                       │
│  GOLDEN PATH (what a template creates automatically)                        │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐    │
│  │  GitHub  │  │  Tekton  │  │  ArgoCD  │  │Crossplane│  │PagerDuty │    │
│  │  Repo    │  │  Pipeline│  │  App     │  │  Claim   │  │  Service │    │
│  │+ branch  │  │ 10 stages│  │  GitOps  │  │ (RDS,    │  │+ escalat-│    │
│  │  protect │  │  CI      │  │  CD      │  │  S3, SQS)│  │  ion     │    │
│  └──────────┘  └──────────┘  └──────────┘  └──────────┘  └──────────┘    │
│                     │                │                                      │
│  CI LAYER           │                │  CD LAYER                           │
│  ┌──────────────────▼─┐     ┌────────▼────────────────────┐               │
│  │  Tekton Pipelines  │     │  ArgoCD + ApplicationSets   │               │
│  │                    │     │                             │               │
│  │  Clone → Test →    │     │  Matrix generator:          │               │
│  │  SAST → OWASP →    │     │  services × environments    │               │
│  │  Build → Trivy →   │     │  Auto-sync dev/staging      │               │
│  │  Push → Update     │     │  Manual approval for prod   │               │
│  │  GitOps repo       │     │  RBAC: devs sync, platform  │               │
│  └────────────────────┘     │  team approves prod         │               │
│                             └─────────────────────────────┘               │
│                                                                             │
│  SELF-SERVICE INFRASTRUCTURE                                                │
│  ┌──────────────────────────────────────────────────────────────────────┐  │
│  │  Crossplane                                                          │  │
│  │                                                                      │  │
│  │  Developer writes:          Platform provisions:                     │  │
│  │  PostgreSQLInstance claim → RDS (encrypted, private, Multi-AZ)      │  │
│  │  S3Bucket claim         → S3 (encrypted, versioned, lifecycle)      │  │
│  │  SQSQueue claim         → SQS (encrypted, DLQ configured)           │  │
│  │                                                                      │  │
│  │  Composition enforces all security defaults — devs can't opt out    │  │
│  └──────────────────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────────────────┘

Repository Structure

platform-engineering/
│
├── README.md
│
├── backstage/
│   ├── app-config/
│   │   └── app-config.production.yaml    # Full Backstage config: auth, catalog,
│   │                                      # plugins, TechDocs, Kubernetes, ArgoCD
│   ├── catalog/
│   │   └── all-components.yaml           # Catalog: Domains, Systems, Components,
│   │                                      # APIs, Resources, Groups
│   └── templates/
│       └── microservice/
│           └── template.yaml             # Golden path: Spring Boot microservice
│                                          # (9 scaffolder steps, creates everything)
│
├── crossplane/
│   ├── xrds/
│   │   └── postgresql-xrd.yaml           # CompositeResourceDefinition — the API
│   │                                      # developers see (4 simple fields)
│   ├── compositions/
│   │   └── postgresql-composition.yaml   # How to fulfill a claim: subnet group,
│   │                                      # security group, parameter group, RDS
│   └── claims/
│       └── order-service-db-claim.yaml   # Example developer claim
│
├── argocd/
│   └── applicationsets/
│       └── services-applicationset.yaml  # Matrix generator: services × environments
│                                          # + AppProject RBAC definitions
│
├── tekton/
│   ├── pipelines/
│   │   └── microservice-ci-pipeline.yaml # 10-stage pipeline: clone → test → SAST
│   │                                      # → OWASP → build → Docker → Trivy →
│   │                                      # push → GitOps update → notify
│   └── triggers/
│       └── github-trigger.yaml           # EventListener + TriggerBinding +
│                                          # TriggerTemplate for GitHub webhooks
│
└── terraform/
    └── environments/prod/
        ├── main.tf                        # VPC, EKS, Backstage (RDS+S3),
        │                                  # ArgoCD, Tekton, Crossplane,
        │                                  # External Secrets Operator
        └── variables.tf

Backstage — Internal Developer Portal

Service Catalog

The Backstage catalog is the source of truth for every software component in the engineering organization. Every service registers a catalog-info.yaml in its repository. The catalog tracks:

Entity types:

  • Domain — Business domain (Payments, Identity, Platform). Services belong to domains.
  • System — Group of related services (Order Management = order-service + payment-service + inventory-service). Provides architectural context.
  • Component — Individual deployable unit (microservice, library, data pipeline). The most common entity type.
  • API — External interface of a component. Allows consumers to browse available APIs before building integrations.
  • Resource — Infrastructure (RDS instances, S3 buckets, Kafka topics). Linked to Crossplane claims for provisioning traceability.
  • Group — Engineering team. Every component has an owner team.

Annotations enable plugin integrations:

annotations:
  argocd/app-name: order-service-production          # ArgoCD plugin shows deploy status
  backstage.io/kubernetes-id: order-service          # K8s plugin shows pod status
  pagerduty.com/service-id: P1234AB                  # PagerDuty plugin shows incidents
  grafana/dashboard-url: https://grafana.../d/...    # Direct link to dashboard
  security.company.com/pci-scope: "true"             # Scorecard: PCI compliance check

Auto-discovery: The github-discovery location type scans all GitHub repositories for catalog-info.yaml files. New services automatically appear in the catalog without manual registration.

Software Templates (Golden Paths)

The Spring Boot microservice template creates a production-ready service in under 5 minutes. Its automated scaffolder steps include:

Step  Action                               Result
1     fetch:template                       Clones skeleton with platform integrations
2     publish:github                       Creates repo with branch protection + CODEOWNERS
3     catalog:register                     Registers service in Backstage catalog
4     publish:github:file                  Creates Tekton pipeline in platform-gitops
5     argocd:create-resources              Creates ArgoCD application
6     publish:github:file (conditional)    Creates Crossplane RDS claim
7     pagerduty:service:create             Creates PagerDuty service with escalation policy

The developer inputs 4 fields:

  1. Service name
  2. Owning team
  3. Do you need a database? (yes/no)
  4. Is this PCI-scoped? (yes/no)

Everything else — Tekton pipeline, ArgoCD application, Grafana dashboard, PagerDuty service, security controls — is created automatically.
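The mapping from those four inputs to scaffolder actions can be modelled in a few lines of Python. This is a hypothetical sketch, not the template itself: `ScaffoldRequest`, `ScaffoldPlan`, and `plan_scaffold` are illustrative names; the action names come from the step table, and the PCI annotation key from the catalog annotations example.

```python
from dataclasses import dataclass, field


@dataclass
class ScaffoldRequest:
    """The four template inputs (field names are hypothetical)."""
    service_name: str
    owner: str
    needs_database: bool
    pci_scoped: bool


@dataclass
class ScaffoldPlan:
    actions: list[str] = field(default_factory=list)
    annotations: dict[str, str] = field(default_factory=dict)
    owner: str = ""


def plan_scaffold(req: ScaffoldRequest) -> ScaffoldPlan:
    """Return the ordered scaffolder actions and catalog metadata for a request."""
    plan = ScaffoldPlan(owner=f"group:{req.owner}")
    plan.actions += [
        "fetch:template",           # skeleton with platform integrations
        "publish:github",           # repo + branch protection + CODEOWNERS
        "catalog:register",         # Backstage catalog entry
        "publish:github:file",      # Tekton pipeline in platform-gitops
        "argocd:create-resources",  # ArgoCD application
    ]
    if req.needs_database:
        plan.actions.append("publish:github:file")  # conditional Crossplane RDS claim
    plan.actions.append("pagerduty:service:create")

    plan.annotations["backstage.io/kubernetes-id"] = req.service_name
    plan.annotations["argocd/app-name"] = f"{req.service_name}-production"
    if req.pci_scoped:
        plan.annotations["security.company.com/pci-scope"] = "true"
    return plan


plan = plan_scaffold(ScaffoldRequest("order-service", "backend-team", True, False))
print(len(plan.actions))  # 7 with a database, 6 without
```

The only branching in the sketch is the database question; the PCI flag changes metadata (which scorecards later read), not the set of resources created.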

Why this matters: Without the template, a developer might skip security scanning because they don't know how to configure Trivy. They might skip PagerDuty integration because it's tedious. By automating everything in the template, the secure, observable, well-operated service is the default — not the exception.

TechDocs

TechDocs converts markdown in each service repository into a searchable documentation portal. The backstage.io/techdocs-ref annotation tells Backstage where to find the docs. A CI job in each repo builds the docs with MkDocs and publishes them to the TechDocs S3 bucket, and Backstage serves them at backstage.internal.company.com/docs.

Result: engineers find documentation in the same place they find everything else — no more hunting through wikis, Confluence, and scattered Google Docs.

Plugins

Plugin          What it shows
Kubernetes      Pod status, resource usage, recent events for each catalog component
ArgoCD          Deployment sync status, last sync time, diff from desired state
GitHub Actions  CI pipeline history and status
PagerDuty       Current on-call, open incidents, recent alerts
Cost Insights   AWS cost per team, per service, with trendlines
SonarQube       Code quality metrics (coverage, bugs, vulnerabilities)

Crossplane — Self-Service Infrastructure

The XRD/Composition/Claim model

XRD (CompositeResourceDefinition): Defines the API that developers see. The PostgreSQL XRD exposes 4 fields: storageGB, instanceClass, multiAZ, backupRetentionDays. The developer never sees VPC IDs, KMS key ARNs, parameter groups, or subnet groups.

Composition: The platform team's implementation. Defines exactly which AWS resources to create and with what defaults. The PostgreSQL composition creates: a DB subnet group (private subnets only), a security group, a parameter group (with rds.force_ssl=1, connection logging, statement timeouts enforced), and the RDS instance (encrypted, private, automated backups, enhanced monitoring, KMS encryption). Developers cannot override these security defaults — they're enforced at the composition level.

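Claim: What developers write. A short YAML manifest that sets only the 4 XRD parameters plus the name of the Secret that receives the connection details: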

apiVersion: platform.company.com/v1alpha1
kind: PostgreSQLInstance
metadata:
  name: order-service-db
  namespace: production
spec:
  parameters:
    storageGB: 50
    instanceClass: db.t4g.large
    multiAZ: true
    backupRetentionDays: 14
  writeConnectionSecretToRef:
    name: order-service-db-secret

Crossplane provisions the full RDS stack (~5 minutes) and writes the connection details (endpoint, username, password) to the order-service-db-secret Secret in the claim's namespace (production). The application consumes that Secret as environment variables.
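A minimal Python model of why developers cannot opt out of the security defaults: claim parameters are validated against the four XRD fields, and the platform's enforced settings are applied after them, so they always win. The output field names (`storage_encrypted`, `publicly_accessible`, `db_parameters`, and so on) are hypothetical stand-ins, not the AWS provider's schema; a real Composition expresses this with patches, not Python.

```python
import copy

# The only fields the PostgreSQL XRD exposes to developers.
XRD_FIELDS = {"storageGB", "instanceClass", "multiAZ", "backupRetentionDays"}

# Settings the platform team enforces (hypothetical field names).
ENFORCED = {
    "storage_encrypted": True,
    "publicly_accessible": False,
    "db_parameters": {"rds.force_ssl": "1"},
}


def compose_rds(params: dict) -> dict:
    """Validate claim parameters and build the composed RDS spec."""
    unknown = set(params) - XRD_FIELDS
    if unknown:
        raise ValueError(f"not exposed by the XRD: {sorted(unknown)}")
    missing = XRD_FIELDS - set(params)
    if missing:
        raise ValueError(f"missing required fields: {sorted(missing)}")
    spec = {
        "allocated_storage_gb": params["storageGB"],
        "instance_class": params["instanceClass"],
        "multi_az": params["multiAZ"],
        "backup_retention_days": params["backupRetentionDays"],
    }
    # Applied last, so enforced values win over anything set above.
    spec.update(copy.deepcopy(ENFORCED))
    return spec


spec = compose_rds(
    {"storageGB": 50, "instanceClass": "db.t4g.large", "multiAZ": True, "backupRetentionDays": 14}
)
```

Rejecting unknown fields is what keeps the developer-facing API at four knobs; applying the enforced block last means even a future XRD field with a clashing name could not weaken it.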

Why Crossplane over Terraform for self-service:

  • Developers don't need Terraform, S3 backends, or AWS IAM permissions
  • Claims are committed to the service repo and deployed via ArgoCD — full audit trail
  • Kubernetes RBAC controls who can create which resource types
  • Crossplane continuously reconciles — if someone manually modifies the RDS instance, Crossplane detects drift and corrects it
  • Platform team controls the Composition centrally — one change enforces a security policy across every database in the organization

ArgoCD — GitOps Deployments

ApplicationSet matrix generator

Instead of hand-writing one ArgoCD Application per service per environment (which quickly grows to hundreds of Applications to maintain), the ApplicationSet generates them with a matrix generator:

services × environments = Applications

order-service × [dev, staging, production] = 3 Applications
payment-service × [dev, staging, production] = 3 Applications
...

Services are auto-discovered from the services/*/k8s directory structure in the gitops repo. When a new service is scaffolded, it gets ArgoCD applications in all three environments automatically.
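The matrix generator's behaviour is essentially a Cartesian product over discovered directories. A Python sketch, assuming each service lives at services/<name>/k8s, that every environment points at that same path, and that Applications are named <service>-<environment> (matching names such as order-service-production used elsewhere in this README); `discover_services` and `generate_applications` are hypothetical helpers:

```python
from itertools import product
from pathlib import PurePosixPath

ENVIRONMENTS = ["dev", "staging", "production"]


def discover_services(repo_dirs: list[str]) -> list[str]:
    """Treat every services/<name>/k8s directory as one service."""
    names = set()
    for d in repo_dirs:
        parts = PurePosixPath(d).parts
        if len(parts) == 3 and parts[0] == "services" and parts[2] == "k8s":
            names.add(parts[1])
    return sorted(names)


def generate_applications(repo_dirs: list[str]) -> list[dict]:
    """Matrix generator: every discovered service crossed with every environment."""
    return [
        {"name": f"{svc}-{env}", "path": f"services/{svc}/k8s", "namespace": env}
        for svc, env in product(discover_services(repo_dirs), ENVIRONMENTS)
    ]


apps = generate_applications(
    ["services/order-service/k8s", "services/payment-service/k8s", "docs"]
)
print(len(apps))  # 2 services x 3 environments = 6
```

Adding a directory is the only step needed to get three new Applications, which is why scaffolding a service is enough to make it deployable everywhere.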

Environment sync policies:

  • Dev/Staging: automated — every commit to the gitops repo syncs immediately
  • Production: manual, requiring a human to click Sync in the ArgoCD UI (or run argocd app sync). This is the gate between staging and production.

RBAC model

Two ArgoCD projects enforce team boundaries:

microservices project:

  • Developers (backend-team, payments-team) can get and sync applications but cannot delete
  • Release managers (platform-team) have full control
  • Applications can only deploy to dev/staging/production namespaces — cannot touch platform namespaces like argocd, monitoring, or crossplane-system
  • Cannot manage Kubernetes Secrets (External Secrets Operator manages these)

platform project:

  • Platform team only — full cluster access
  • Manages Backstage, Tekton, Crossplane, monitoring stack
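The microservices project rules amount to a default-deny policy check plus a destination allow-list. The sketch below is a hypothetical Python model loosely following the shape of Argo CD's RBAC policy rows (subject, action, project/application). It is not Argo CD's policy engine; only the group, project, and namespace names come from the rules above.

```python
from fnmatch import fnmatchcase

# (subject, action pattern, object pattern); object is "<project>/<application>".
# Hypothetical rows modelled on the rules above; anything not matched is denied.
POLICY = [
    ("backend-team", "get", "microservices/*"),
    ("backend-team", "sync", "microservices/*"),
    ("payments-team", "get", "microservices/*"),
    ("payments-team", "sync", "microservices/*"),
    ("platform-team", "*", "*"),
]

# Destination namespaces each project may deploy into (None = unrestricted).
DESTINATIONS = {
    "microservices": {"dev", "staging", "production"},
    "platform": None,
}


def can(group: str, action: str, project: str, app: str) -> bool:
    """Default-deny check: allowed only if some policy row matches."""
    obj = f"{project}/{app}"
    return any(
        subject == group and fnmatchcase(action, act) and fnmatchcase(obj, pattern)
        for subject, act, pattern in POLICY
    )


def destination_allowed(project: str, namespace: str) -> bool:
    """Unknown projects get an empty allow-list, so they can deploy nowhere."""
    allowed = DESTINATIONS.get(project, set())
    return allowed is None or namespace in allowed


print(can("backend-team", "delete", "microservices", "order-service-production"))  # False
```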

GitOps workflow

Developer pushes to main branch
  → Tekton CI runs (test, scan, build, push)
  → Tekton updates image tag in platform-gitops repo
  → ArgoCD detects change in staging Application
  → ArgoCD syncs staging cluster
  → Smoke tests run
  → Release manager reviews staging
  → Release manager syncs production Application
  → Production deploy completes
  → ArgoCD notifies Slack #deployments

Every cluster change traces back to a git commit. git log is the deployment history. Rolling back is git revert.


Tekton — Kubernetes-Native CI

Why Tekton over Jenkins

Aspect           Jenkins                      Tekton
Infrastructure   Dedicated server + agents    Runs as Pods in EKS
Scaling          Fixed agent pool             Scales to zero (no idle cost)
Environment      Shared, stateful agents      Ephemeral Pods (fresh every run)
Configuration    Groovy DSL in Jenkinsfile    Kubernetes-native YAML
Security         Jenkins credentials store    Kubernetes Secrets + IRSA
RBAC             Jenkins roles                Kubernetes RBAC

Pipeline stages

The 10-stage golden path CI pipeline:

Stage                    Tool                Gate
Clone                    git-clone Task      -
Unit Tests               Maven + JaCoCo      Fails if coverage < 80%
SAST                     SonarQube           Fails if quality gate fails
OWASP Dependency Check   dependency-check    Fails if CVSS ≥ 7.0
Build JAR                Maven               -
Docker Build             Kaniko (no daemon)  -
Trivy Scan               Trivy               Fails on CRITICAL CVE
Push to ECR              Kaniko              Main/develop branches only
Update GitOps Repo       yq + git            Triggers ArgoCD staging sync
Notify                   Slack webhook       Always (success or failure)
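The numeric gates in the table reduce to a few comparisons. A Python sketch, assuming line coverage is read from the report-level LINE counter of a JaCoCo XML report and that the OWASP and Trivy findings have already been extracted into lists; `line_coverage` and `failed_gates` are hypothetical helpers, and the real pipeline may enforce these thresholds through each tool's own options instead.

```python
import xml.etree.ElementTree as ET

COVERAGE_MIN = 0.80   # unit-test gate
CVSS_FAIL_AT = 7.0    # OWASP dependency-check gate


def line_coverage(jacoco_xml: str) -> float:
    """Line coverage from the report-level LINE counter of a JaCoCo XML report."""
    root = ET.fromstring(jacoco_xml)
    for counter in root.findall("counter"):  # direct children hold whole-report totals
        if counter.get("type") == "LINE":
            missed, covered = int(counter.get("missed")), int(counter.get("covered"))
            total = missed + covered
            return covered / total if total else 0.0
    raise ValueError("report has no LINE counter")


def failed_gates(coverage: float, cvss_scores: list[float], trivy_severities: list[str]) -> list[str]:
    """Return the gates that fail; an empty list means the build may continue."""
    failures = []
    if coverage < COVERAGE_MIN:
        failures.append(f"coverage {coverage:.0%} is below 80%")
    if any(score >= CVSS_FAIL_AT for score in cvss_scores):
        failures.append("dependency with CVSS >= 7.0")
    if "CRITICAL" in trivy_severities:
        failures.append("CRITICAL CVE in image")
    return failures
```

Note that the CVSS gate is inclusive (a score of exactly 7.0 fails), matching the ≥ in the table.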

Kaniko builds Docker images without a Docker daemon — no privileged pods needed. The Kaniko executor Pod reads the Dockerfile, builds the image layer by layer, and pushes directly to ECR.

Parallel execution: OWASP dependency check and unit tests run in parallel (both runAfter: [clone]), reducing total pipeline time.
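How runAfter turns into parallelism can be shown with a topological sort: every task whose dependencies have finished is runnable at the same time. The edges below are a hypothetical reconstruction of the pipeline (only the clone → unit tests / OWASP fan-out is stated in this README), and the Notify stage is left out on the assumption that it is a Tekton `finally` task.

```python
from graphlib import TopologicalSorter

# Hypothetical runAfter edges: task -> tasks it must wait for.
RUN_AFTER = {
    "clone": [],
    "unit-tests": ["clone"],
    "owasp-check": ["clone"],   # fans out from clone alongside unit-tests
    "sast": ["unit-tests"],
    "build-jar": ["sast", "owasp-check"],
    "docker-build": ["build-jar"],
    "trivy-scan": ["docker-build"],
    "push-ecr": ["trivy-scan"],
    "update-gitops": ["push-ecr"],
}


def execution_waves(run_after: dict[str, list[str]]) -> list[list[str]]:
    """Group tasks into waves; tasks in the same wave can run concurrently."""
    sorter = TopologicalSorter(run_after)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


for i, wave in enumerate(execution_waves(RUN_AFTER), start=1):
    print(i, wave)
```

With these edges, unit-tests and owasp-check share the second wave, so the slower of the two sets the pace instead of their sum.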

GitOps update step: After a successful build, Tekton updates the image tag in the platform-gitops repository. ArgoCD detects this change and syncs the staging cluster. This is the bridge between CI (Tekton) and CD (ArgoCD) — the GitOps repo is the contract between them.
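The GitOps update is a one-field edit. Assuming the service's values file in platform-gitops has a top-level `image:` block with a `tag:` key (a hypothetical layout), the yq step would be something like `yq -i '.image.tag = "<sha>"' values.yaml` (mikefarah yq v4 syntax). The line-oriented Python sketch below shows the same change; it only understands this simple layout, and a real implementation should use a YAML parser.

```python
def set_image_tag(values_yaml: str, new_tag: str) -> str:
    """Set image.tag in a simple values file (line-based; assumes 'image:' is top level)."""
    out, in_image, updated = [], False, False
    for line in values_yaml.splitlines():
        if line.strip() and not line[0].isspace():
            # A new top-level key starts; track whether it is the image block.
            in_image = line.rstrip() == "image:"
        elif in_image and not updated and line.lstrip().startswith("tag:"):
            indent = line[: len(line) - len(line.lstrip())]
            line = f'{indent}tag: "{new_tag}"'
            updated = True
        out.append(line)
    if not updated:
        raise ValueError("image.tag not found")
    return "\n".join(out) + "\n"
```

Committing the rewritten file is the whole CI-to-CD handoff: ArgoCD never talks to Tekton, it only sees a new commit.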

Webhook trigger

GitHub webhooks POST to the Tekton EventListener on every push. The EventListener:

  1. Validates the HMAC signature (prevents forgery)
  2. CEL-filters to only main and develop branches
  3. Extracts service name from repository name
  4. Creates a PipelineRun resource

Developers don't configure CI — pushing to main just works.
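Tekton performs these checks declaratively with its GitHub and CEL interceptors; the Python sketch below only makes the logic explicit. The signature format (`sha256=` followed by the hex HMAC-SHA256 of the raw request body, keyed with the webhook secret) is GitHub's `X-Hub-Signature-256` scheme. The returned parameter names mirror the `tkn pipeline start` parameters in the Runbook; `signature_for` and `accept_push` are hypothetical helpers.

```python
import hashlib
import hmac
import json
from typing import Optional

ALLOWED_REFS = {"refs/heads/main", "refs/heads/develop"}


def signature_for(secret: bytes, body: bytes) -> str:
    """X-Hub-Signature-256 value: 'sha256=' + hex HMAC-SHA256 of the raw body."""
    return "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()


def accept_push(secret: bytes, body: bytes, signature_header: str) -> Optional[dict]:
    """Return PipelineRun parameters for a valid push to main/develop, else None."""
    # 1. Reject forged payloads; compare_digest avoids timing side channels.
    expected = signature_for(secret, body).encode()
    if not hmac.compare_digest(expected, signature_header.encode()):
        return None
    event = json.loads(body)
    # 2. Branch filter (what a CEL expression on body.ref would do).
    ref = event.get("ref", "")
    if ref not in ALLOWED_REFS:
        return None
    # 3. The service name is the repository name.
    repo = event["repository"]
    return {
        "git-url": repo["clone_url"],
        "git-revision": ref.removeprefix("refs/heads/"),
        "service-name": repo["name"],
    }
```

`signature_for` can also produce a valid header value for the manual curl test in Troubleshooting, as long as it signs exactly the bytes curl sends.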


Infrastructure

EKS cluster: Platform tooling (Backstage, ArgoCD, Tekton, Crossplane) runs on a dedicated EKS cluster separate from application workloads. This prevents platform tooling from competing with production workloads for resources, and limits blast radius — a misbehaving application workload cannot affect the CI/CD platform.

Backstage persistence:

  • PostgreSQL on RDS (encrypted, Multi-AZ in production) — stores catalog entities, user sessions, scaffolder runs
  • Redis (ElastiCache) — session cache for Backstage's backend
  • S3 — TechDocs static assets (served via Backstage's TechDocs publisher)

External Secrets Operator: All application secrets (database credentials, API keys, OAuth tokens) are stored in AWS Secrets Manager and synced into Kubernetes Secrets by ESO. Secrets are never stored in git or in Kubernetes YAML files.

TechDocs S3 security: TechDocs assets are KMS-encrypted and the bucket blocks all public access. Only the Backstage service account's IRSA role can read it.


Developer Experience — New Service in 5 Minutes

This is what the developer actually does:

1. Open Backstage → Create → Spring Boot Microservice
2. Fill in 4 fields: name, owner, needs database?, PCI scoped?
3. Click "Create"
4. Wait ~2 minutes
5. Get output links:
   - GitHub repo (with branch protection, CODEOWNERS, skeleton code)
   - Backstage catalog page
   - ArgoCD application (already deployed to dev)
   - PagerDuty service
6. git clone the repo
7. ./mvnw spring-boot:run → service is running locally
8. git commit && git push → Tekton CI runs automatically

Compare to without the platform:

  • Create GitHub repo → 5 minutes
  • Set up Dockerfile → 30 minutes (if they know how to write a distroless multi-stage build)
  • Set up Tekton/Jenkins pipeline → 2 hours
  • Set up ArgoCD application → 30 minutes
  • Set up Grafana dashboard → 1 hour
  • Open ticket for RDS → 2 days wait
  • Set up PagerDuty → 30 minutes
  • Register in Backstage → 20 minutes
  • Write catalog-info.yaml → 20 minutes

Total without platform: ~3 days. With platform: 5 minutes.
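The ~3 day figure follows from the list above. Assuming an 8-hour working day and that the hands-on work is added on top of the 2-day RDS ticket wait:

```python
# Hands-on effort per task, in minutes, from the list above.
HANDS_ON_MINUTES = {
    "Create GitHub repo": 5,
    "Set up Dockerfile": 30,
    "Set up Tekton/Jenkins pipeline": 120,
    "Set up ArgoCD application": 30,
    "Set up Grafana dashboard": 60,
    "Set up PagerDuty": 30,
    "Register in Backstage": 20,
    "Write catalog-info.yaml": 20,
}
RDS_TICKET_WAIT_DAYS = 2
WORKDAY_MINUTES = 8 * 60  # assumption: 8-hour working day

hands_on = sum(HANDS_ON_MINUTES.values())
total_days = RDS_TICKET_WAIT_DAYS + hands_on / WORKDAY_MINUTES
print(f"hands-on: {hands_on} min, total: {total_days:.1f} working days")
# hands-on: 315 min, total: 2.7 working days
```

Most of the total is waiting, not working: the ticket queue alone accounts for 2 of the ~2.7 days.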


Platform Maturity Scorecards

Backstage scorecards (via the TechInsights plugin) track engineering maturity across every service. Each service gets a score across dimensions:

Dimension      Weight    Checks
Reliability    High      Has PagerDuty service, has runbook, has health check endpoint
Observability  High      Has Grafana dashboard, has Prometheus metrics, has distributed tracing
Security       Critical  Security scanning in CI, no critical CVEs, has DAST results
Documentation  Medium    TechDocs present, README exists, API contract documented
Ownership      High      Has owner team, owner team has on-call, has CODEOWNERS
Operations     Medium    Last deployed < 90 days, no failing ArgoCD sync

Scores are visible on every catalog page. Teams with low scores get automated Slack nudges. Platform team reviews the org-wide scorecard monthly to identify systemic gaps.
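The Weight column is qualitative. One way a single score could be derived from it, assuming hypothetical numeric weights (Medium = 1, High = 2, Critical = 3) and a per-dimension fraction of passed checks; this is an illustrative formula, not the TechInsights plugin's own scoring.

```python
# Hypothetical numeric weights for the qualitative Weight column.
WEIGHTS = {"Medium": 1, "High": 2, "Critical": 3}

DIMENSIONS = {
    "Reliability": "High",
    "Observability": "High",
    "Security": "Critical",
    "Documentation": "Medium",
    "Ownership": "High",
    "Operations": "Medium",
}


def maturity_score(results: dict[str, tuple[int, int]]) -> float:
    """results maps dimension -> (checks passed, checks total); returns 0-100."""
    earned = possible = 0.0
    for dimension, level in DIMENSIONS.items():
        passed, total = results.get(dimension, (0, 1))  # missing data counts as failing
        weight = WEIGHTS[level]
        earned += weight * passed / total
        possible += weight
    return 100 * earned / possible


all_pass = {d: (3, 3) for d in DIMENSIONS}
no_security = {**all_pass, "Security": (0, 3)}
print(round(maturity_score(no_security), 1))  # 72.7
```

Under these weights, failing every Security check costs 3/11 of the score, more than any other single dimension.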


Deployment Guide

# 1. Deploy platform infrastructure
cd terraform/environments/prod
terraform init
terraform plan -out=tfplan
terraform apply tfplan

# 2. Configure kubectl for the new cluster
aws eks update-kubeconfig --name platform-prod --region us-east-1

# 3. Deploy ArgoCD ApplicationSets
kubectl apply -f argocd/applicationsets/services-applicationset.yaml

# 4. Deploy Tekton resources
kubectl apply -f tekton/pipelines/
kubectl apply -f tekton/triggers/

# 5. Deploy Crossplane XRDs and Compositions
kubectl apply -f crossplane/xrds/
kubectl apply -f crossplane/compositions/

# 6. Create GitHub webhook for Tekton EventListener
#    (Tekton Triggers names the Service el-<EventListener name>; this assumes it is exposed as type LoadBalancer)
LISTENER_URL=$(kubectl get svc el-github-event-listener -n tekton-pipelines \
  -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
echo "Add GitHub webhook to: http://${LISTENER_URL}:8080"

# 7. Verify Backstage is running
kubectl get pods -n backstage
kubectl logs -n backstage deployment/backstage

# 8. Access the portal
kubectl port-forward -n backstage svc/backstage 3000:80
open http://localhost:3000

Runbook

Register a new service in the catalog manually

# Add catalog-info.yaml to the service repo (if not scaffolded via Backstage)
cat > catalog-info.yaml << 'EOF'
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: my-service
  description: "My service"
  annotations:
    github.com/project-slug: company/my-service
    argocd/app-name: my-service-production
    backstage.io/kubernetes-id: my-service
spec:
  type: service
  lifecycle: production
  owner: group:my-team
EOF
git add catalog-info.yaml && git commit -m "feat: add Backstage catalog entry" && git push

Promote a service to production (GitOps)

# 1. Verify staging is healthy
kubectl get pods -n staging -l app=order-service

# 2. Check ArgoCD staging status
argocd app get order-service-staging

# 3. Sync production in ArgoCD UI, or via CLI
argocd app sync order-service-production --revision HEAD
argocd app wait order-service-production --health --timeout 300

# 4. Verify production is healthy
kubectl rollout status deployment/order-service -n production

Debug a failed Tekton pipeline

# List recent pipeline runs
tkn pipelinerun list -n tekton-pipelines --limit 10

# Get logs from a failed run
tkn pipelinerun logs <run-name> -n tekton-pipelines -f

# Get logs from a specific failed task
tkn taskrun logs <taskrun-name> -n tekton-pipelines

# Re-run a failed pipeline
tkn pipeline start microservice-ci-pipeline \
  -p git-url=https://github.com/company/order-service \
  -p git-revision=main \
  -p service-name=order-service \
  -w name=source,claimName=order-service-source-pvc \
  -n tekton-pipelines

Check Crossplane claim status

# List all PostgreSQL claims
kubectl get postgresqlinstance -A

# Describe a specific claim (shows Crossplane events)
kubectl describe postgresqlinstance order-service-db -n production

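# Get the connection secret (decodes every key; piping the whole .data map to base64 -d does not work)
kubectl get secret order-service-db-secret -n production \
  -o go-template='{{range $k, $v := .data}}{{$k}}={{$v | base64decode}}{{"\n"}}{{end}}'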

# Check managed resources
kubectl get managed | grep order-service

Troubleshooting

Backstage catalog not showing new services:

  • Check the github-discovery location is configured in app-config.production.yaml
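  • Force a refresh: use the refresh icon on an existing entity's About card, or register the repository's catalog-info.yaml directly via Create → Register Existing Component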
  • Check Backstage logs: kubectl logs -n backstage deployment/backstage

ArgoCD Application stuck in Progressing:

# Check application events
argocd app get <app-name> --show-operation

# Check Kubernetes events
kubectl get events -n production --sort-by='.lastTimestamp' | tail -20

# Force a hard refresh
argocd app get <app-name> --hard-refresh

Crossplane claim stuck in Creating:

# Check Crossplane controller logs
kubectl logs -n crossplane-system deployment/crossplane

# Check the composite resource
kubectl describe xpostgresqlinstance -A

# Check managed resources that are not both Ready and Synced (columns: NAME READY SYNCED ...)
kubectl get managed | awk 'NR==1 || $2 != "True" || $3 != "True"'

Tekton EventListener not triggering:

# Check EventListener pod is running
kubectl get pods -n tekton-pipelines -l eventlistener=github-event-listener

# Test webhook manually
curl -X POST "http://${LISTENER_URL}:8080" \
  -H "Content-Type: application/json" \
  -H "X-GitHub-Event: push" \
  -H "X-Hub-Signature-256: sha256=..." \
  -d '{"ref":"refs/heads/main","repository":{"name":"order-service","clone_url":"..."}}'

# Check trigger interceptor logs
kubectl logs -n tekton-pipelines -l app.kubernetes.io/component=interceptors

License

MIT

About

Internal Developer Platform reducing new service onboarding from days to 5 minutes: Backstage service catalog with golden path scaffolder templates, Crossplane self-service infrastructure, ArgoCD ApplicationSet GitOps with matrix generated deployments across environments, and Tekton Kubernetes-native CI with 10-stage security gated pipeline.
