Continuous integration and delivery patterns, pipeline design, release strategies, and reliability guardrails.
| Stage | Purpose | Tools |
|---|---|---|
| Source | Trigger on code change | Git push, webhook, PR |
| Build | Compile, bundle, resolve deps | npm, Maven, Gradle, Docker |
| Test | Unit, integration, E2E tests | Jest, PyTest, Cypress |
| Security Scan | SAST, DAST, dependency audit | SonarQube, Trivy, Snyk |
| Artifact | Package and store build output | Docker image, JAR, ZIP |
| Deploy (Staging) | Deploy to staging environment | K8s, ECS, Cloud Run |
| Integration Test | End-to-end validation | Postman, k6, Playwright |
| Deploy (Prod) | Release to production | Blue-Green, Canary, Rolling |
| Monitor | Observe health and metrics | Prometheus, Datadog, PagerDuty |

| Concept | Stands For | Focus |
|---|---|---|
| CI | Continuous Integration | Merge code frequently; auto-build & test |
| CD (Delivery) | Continuous Delivery | Artifact always ready to deploy |
| CD (Deployment) | Continuous Deployment | Auto-deploy every passing build |
# ── Ideal CI/CD Pipeline Overview ──
stages:
- source # Git checkout, trigger on push/PR
- build # Compile, bundle, Docker build
- test # Unit + Integration tests
- security # SAST, dependency scan, license check
- artifact # Push image to registry, store JAR
- deploy-stg # Deploy to staging environment
- smoke-test # Basic health & sanity checks
- deploy-prod # Production deployment (gate)
- verify # Post-deploy validation + rollback check
# ── Key Principles ──
# 1. Fail fast — run cheap tests first
# 2. Parallelize independent stages
# 3. Cache dependencies between runs
# 4. Immutable artifacts — build once, deploy everywhere
# 5. Environment parity — prod-like staging
# 6. Automated rollback on failure

| Metric | Elite | High | Medium | Low |
|---|---|---|---|---|
| Deploy frequency | On demand | <1 day | 1 wk–1 mo | >1 mo |
| Lead time | <1 hr | <1 day | 1 day–1 wk | >1 mo |
| MTTR | <1 hr | <1 day | <1 wk | >1 wk |
| Change failure | 0–15% | 16–30% | 31–45% | >45% |
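The bands above can be applied mechanically to your own numbers. A minimal shell sketch classifying lead time, with thresholds taken from the table (the table leaves the 1 wk to 1 mo range unlabeled; this sketch folds it into Low):

```shell
#!/bin/sh
# Classify lead time (in hours) into the DORA bands from the table above.
lead_time_band() {
  hours=$1
  if   [ "$hours" -lt 1 ];   then echo "Elite"   # < 1 hr
  elif [ "$hours" -lt 24 ];  then echo "High"    # < 1 day
  elif [ "$hours" -le 168 ]; then echo "Medium"  # 1 day - 1 wk
  else                            echo "Low"     # longer
  fi
}
lead_time_band 12    # → High
```

The same pattern works for MTTR; deploy frequency and change-failure rate need counts over a window rather than a single duration.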
# ── GitHub Actions: Full CI/CD Workflow ──
name: CI/CD Pipeline
on:
push:
branches: [main, develop]
pull_request:
branches: [main]
env:
NODE_VERSION: '20'
REGISTRY: ghcr.io
IMAGE_NAME: ${{ github.repository }}
jobs:
# ── Job 1: Lint & Type Check ──
lint:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-node@v4
with:
node-version: ${{ env.NODE_VERSION }}
cache: 'npm'
- run: npm ci
- run: npm run lint
- run: npm run typecheck
# ── Job 2: Test ──
test:
needs: lint
runs-on: ubuntu-latest
strategy:
matrix:
node-version: [18, 20, 22]
steps:
- uses: actions/checkout@v4
- uses: actions/setup-node@v4
with:
node-version: ${{ matrix.node-version }}
cache: 'npm'
- run: npm ci
- run: npm test -- --coverage
- uses: actions/upload-artifact@v4
with:
name: coverage
path: coverage/
# ── Job 3: Build & Push Docker Image ──
build:
needs: test
runs-on: ubuntu-latest
permissions:
contents: read
packages: write
outputs:
image-tag: ${{ steps.meta.outputs.tags }}
steps:
- uses: actions/checkout@v4
- uses: docker/setup-buildx-action@v3
- uses: docker/login-action@v3
with:
registry: ${{ env.REGISTRY }}
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- uses: docker/metadata-action@v5
id: meta
with:
images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
tags: |
type=sha,prefix=
type=ref,event=branch
type=semver,pattern={{version}}
- uses: docker/build-push-action@v5
with:
context: .
push: ${{ github.ref == 'refs/heads/main' }}
tags: ${{ steps.meta.outputs.tags }}
cache-from: type=gha
cache-to: type=gha,mode=max
# ── Job 4: Deploy to Staging ──
deploy-staging:
needs: build
runs-on: ubuntu-latest
if: github.ref == 'refs/heads/develop'
environment: staging
steps:
- uses: actions/checkout@v4
- name: Deploy to staging
run: |
echo "Deploying ${{ needs.build.outputs.image-tag }} to staging"
# kubectl set image deployment/app app=${{ needs.build.outputs.image-tag }}
# ── Job 5: Deploy to Production ──
deploy-production:
needs: build
runs-on: ubuntu-latest
if: github.ref == 'refs/heads/main'
environment: production
steps:
- uses: actions/checkout@v4
- name: Deploy to production
run: |
echo "Deploying ${{ needs.build.outputs.image-tag }} to production"
# ── GitHub Actions: Security Scanning ──
name: Security Pipeline
on:
push:
branches: [main]
pull_request:
schedule:
- cron: '0 6 * * 1' # Weekly Monday 6 AM
permissions:
security-events: write
actions: read
contents: read
jobs:
dependency-review:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/dependency-review-action@v4
with:
fail-on-severity: moderate
deny-licenses: GPL-3.0, AGPL-3.0
codeql-sast:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: github/codeql-action/init@v3
with:
languages: javascript-typescript, python
- uses: github/codeql-action/analyze@v3
container-scan:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: aquasecurity/trivy-action@master
with:
image-ref: 'ghcr.io/org/app:latest'
severity: 'CRITICAL,HIGH'
format: 'sarif'
output: 'trivy-results.sarif'
- uses: github/codeql-action/upload-sarif@v3
with:
sarif_file: 'trivy-results.sarif'

| Concept | Description |
|---|---|
| Workflow | Automated process defined in .github/workflows/ |
| Event | Trigger: push, pull_request, schedule, workflow_dispatch |
| Job | Group of steps in one runner (or self-hosted) |
| Step | Single action or shell command |
| Action | Reusable unit (uses: action@version) |
| Runner | GitHub-hosted (linux/mac/win) or self-hosted |
| Artifact | Files persisted between jobs |
| Environment | Named deployment target with protection rules |
| Secret | Encrypted env vars (repository/org level) |
| Matrix | Run job with variable combinations |

| Action | Purpose |
|---|---|
| actions/checkout@v4 | Clone repository |
| actions/setup-node@v4 | Install Node.js (with cache) |
| actions/setup-python@v5 | Install Python (with cache) |
| actions/setup-java@v4 | Install JDK (with cache) |
| docker/build-push-action@v5 | Build & push Docker image |
| docker/login-action@v3 | Log in to container registry |
| actions/cache@v4 | Cache dependencies |
| actions/upload-artifact@v4 | Upload build artifacts |
| actions/download-artifact@v4 | Download artifacts |
| peaceiris/actions-gh-pages@v3 | Deploy to GitHub Pages |
# ── Managing GitHub Secrets ──
# CLI (GitHub CLI)
gh secret set DOCKER_TOKEN --body "dckr_pat_xxx"
gh secret set AWS_ACCESS_KEY_ID --body "AKIAxxx"
gh secret set KUBE_CONFIG --body "$(cat ~/.kube/config | base64)"
# List secrets
gh secret list
gh secret list --env production
# Delete a secret
gh secret delete OLD_SECRET
# Organization-level secrets
gh secret set REGISTRY_TOKEN --org my-org --body "xxx"
# Environment secrets (with protection rules)
gh secret set PROD_DB_URL --env production --body "postgres://..."
# ── secrets in workflows ──
# ${{ secrets.MY_SECRET }} # Basic usage
# ${{ secrets.GITHUB_TOKEN }} # Auto-provided, no setup needed
# ${{ vars.MY_VARIABLE }} # Non-secret config variable
# ${{ secrets.MY_SECRET }} # Masked in logs automatically

Prefer GITHUB_TOKEN over personal access tokens when possible. It's auto-provided, scoped to the repository, and expires with the workflow. Only use PATs for cross-repo or organization-level access.

// ── Jenkins Declarative Pipeline ──
pipeline {
agent any
environment {
DOCKER_CREDENTIALS = credentials('docker-hub-creds')
REGISTRY = 'registry.example.com'
IMAGE = "${REGISTRY}/myapp"  // double quotes so Groovy interpolates REGISTRY
GIT_REPO = 'git@github.com:org/myapp.git'
}
tools {
nodejs 'NodeJS-20'
dockerTool 'docker-latest'
}
options {
timeout(time: 30, unit: 'MINUTES')
retry(3)
buildDiscarder(logRotator(numToKeepStr: '20'))
disableConcurrentBuilds()
timestamps()
ansiColor('xterm')
}
triggers {
githubPush()
pollSCM('H/5 * * * *')
cron('H 2 * * *') // Nightly build at ~2 AM
}
stages {
stage('Checkout') {
steps {
git branch: 'main', url: env.GIT_REPO
sh 'git log --oneline -5'
}
}
stage('Install Dependencies') {
steps {
sh 'npm ci --prefer-offline'
}
}
stage('Lint & Format') {
steps {
sh 'npm run lint'
sh 'npm run format:check'
}
}
stage('Unit Tests') {
steps {
sh 'npm test -- --coverage --ci --forceExit'
}
post {
always {
junit 'reports/test-results.xml'
publishHTML(target: [
reportDir: 'coverage',
reportFiles: 'index.html',
reportName: 'Coverage Report',
keepAll: true
])
}
failure {
echo 'Tests failed! Notifying team...'
emailext subject: 'Build Failed: ${JOB_NAME}',
body: '${BUILD_URL}',
to: 'team@example.com'
}
}
}
stage('Build Docker Image') {
steps {
script {
docker.build("${IMAGE}:${BUILD_NUMBER}")
docker.build("${IMAGE}:latest")
}
}
}
stage('Push to Registry') {
steps {
script {
docker.withRegistry("https://${REGISTRY}", 'docker-hub-creds') {
docker.image("${IMAGE}:${BUILD_NUMBER}").push()
docker.image("${IMAGE}:latest").push()
}
}
}
}
stage('Deploy to Staging') {
steps {
sh '''
kubectl set image deployment/myapp \
myapp=${IMAGE}:${BUILD_NUMBER} \
--namespace staging
kubectl rollout status deployment/myapp --namespace staging
'''
}
}
stage('Smoke Test') {
steps {
sh '''
sleep 10
curl -sf http://staging.example.com/health || exit 1
curl -sf http://staging.example.com/api/status || exit 1
'''
}
}
stage('Deploy to Production') {
input {
message "Deploy to production?"
ok "Deploy!"
}
steps {
sh '''
kubectl set image deployment/myapp \
myapp=${IMAGE}:${BUILD_NUMBER} \
--namespace production
kubectl rollout status deployment/myapp --namespace production
'''
}
}
}
post {
success {
echo 'Pipeline completed successfully!'
}
failure {
echo 'Pipeline failed. Check logs above.'
}
always {
cleanWs()
}
}
}

| Feature | Declarative | Scripted |
|---|---|---|
| Syntax | Structured (stages, steps) | Groovy DSL (full flexibility) |
| Complexity | Simpler, opinionated | Full Groovy power |
| Shared libraries | Limited | Full access |
| Error handling | post { } blocks | try/catch/finally |
| When conditions | when { } directive | if/else Groovy |
| Parameters | parameters { } block | properties([...]) |
| Best for | Most projects | Complex logic, dynamic pipelines |

| Plugin | Purpose |
|---|---|
| Pipeline | Declarative & scripted pipelines |
| Git | Git SCM integration |
| GitHub Integration | GitHub org/repos, webhooks |
| Docker Pipeline | Build/push images in pipeline |
| Blue Ocean | Modern pipeline visualization |
| Credentials Binding | Use credentials securely |
| Configuration as Code | Jenkins config in YAML (JCasC) |
| Kubernetes | Dynamic agent pods on K8s |
| OWASP Dependency-Check | Dependency vulnerability scanning |
| SonarQube | Code quality & coverage reporting |
// ── Scripted Pipeline (more flexible) ──
node('docker') {
try {
stage('Build') {
checkout scm
def image = docker.build("myapp:${env.BUILD_ID}")
}
stage('Test Matrix') {
def browsers = ['chrome', 'firefox']
def parallelTests = [:]
browsers.each { browser ->
parallelTests["${browser}"] = {
stage("Test on ${browser}") {
sh "npm run test:e2e -- --browser=${browser}"
}
}
}
parallel parallelTests
}
stage('Deploy') {
if (env.BRANCH_NAME == 'main') {
sh 'kubectl apply -f k8s/production/'
} else if (env.BRANCH_NAME == 'develop') {
sh 'kubectl apply -f k8s/staging/'
}
}
} catch (err) {
currentBuild.result = 'FAILURE'
emailext subject: "FAILED: ${env.JOB_NAME} #${env.BUILD_NUMBER}",
body: "Error: ${err}\n\nURL: ${env.BUILD_URL}",
to: 'oncall@example.com'
} finally {
junit '**/test-results/*.xml'
archiveArtifacts artifacts: 'dist/**', fingerprint: true
}
}

With the Configuration as Code (JCasC) plugin, define Jenkins configuration in jenkins.yaml. This makes Jenkins itself reproducible and version-controlled — just like your application code.

# ── GitLab CI/CD Full Pipeline ──
image: node:20-alpine
default:
cache:
key:
files:
- package-lock.json
paths:
- node_modules/
policy: pull-push
stages:
- lint
- test
- build
- security
- deploy-staging
- deploy-production
variables:
DOCKER_REGISTRY: registry.gitlab.com
APP_IMAGE: $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
# ── Stage: Lint ──
lint:
stage: lint
script:
- npm ci
- npm run lint
- npm run typecheck
rules:
- if: $CI_PIPELINE_SOURCE == "merge_request_event"
- if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
# ── Stage: Test (with parallel matrix) ──
test:
stage: test
services:
- postgres:16-alpine
variables:
POSTGRES_DB: testdb
POSTGRES_USER: test
POSTGRES_PASSWORD: test
script:
- npm ci
- npm test -- --coverage --ci
coverage: '/All files[^|]*\|[^|]*\s+([\d.]+)/'
artifacts:
when: always
paths:
- coverage/
reports:
junit: reports/test-results.xml
parallel:
matrix:
- NODE_VERSION: [18, 20, 22]
DB: [postgres, mysql]
# ── Stage: Build Docker Image ──
build:
stage: build
image: docker:24
services:
- docker:24-dind
before_script:
- echo "$CI_REGISTRY_PASSWORD" | docker login
--username "$CI_REGISTRY_USER" --password-stdin $CI_REGISTRY
script:
- docker build --cache-from $CI_REGISTRY_IMAGE:latest
-t $APP_IMAGE -t $CI_REGISTRY_IMAGE:latest .
- docker push $APP_IMAGE
- docker push $CI_REGISTRY_IMAGE:latest
rules:
- if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
- if: $CI_COMMIT_TAG
# ── Stage: Security Scan ──
dependency_scanning:
stage: security
image: node:20-alpine
script:
- npm audit --audit-level=moderate --json > audit-report.json
artifacts:
reports:
dependency_scanning: audit-report.json
allow_failure: false
container_scanning:
stage: security
image: aquasec/trivy:latest
script:
- trivy image --exit-code 1 --severity CRITICAL,HIGH $APP_IMAGE
rules:
- if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
sast:
stage: security
script:
- echo "Running SAST analysis..."
artifacts:
reports:
sast: gl-sast-report.json
# ── Stage: Deploy to Staging ──
deploy:staging:
stage: deploy-staging
environment:
name: staging
url: https://staging.example.com
image: bitnami/kubectl:latest
script:
- kubectl config use-context staging
- kubectl set image deployment/myapp
myapp=$APP_IMAGE -n staging
- kubectl rollout status deployment/myapp -n staging
rules:
- if: $CI_COMMIT_BRANCH == "develop"
# ── Stage: Deploy to Production ──
deploy:production:
stage: deploy-production
environment:
name: production
url: https://example.com
image: bitnami/kubectl:latest
script:
- kubectl config use-context production
- kubectl set image deployment/myapp
myapp=$APP_IMAGE -n production
- kubectl rollout status deployment/myapp -n production
when: manual
rules:
- if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH

| Keyword | Description |
|---|---|
| stages | Ordered list of pipeline stages |
| image | Docker image for the job |
| services | Linked service containers (DB, Redis) |
| script | Shell commands to run |
| before_script | Run before every job |
| after_script | Run after every job (even on failure) |
| artifacts | Files to pass between jobs / keep |
| cache | Reuse files between pipeline runs |
| rules | Conditional job execution (replaces only/except) |
| trigger | Trigger a downstream pipeline |
| inherit | Control default variable inheritance |
| parallel | Run job multiple times (matrix) |

| Variable | Value |
|---|---|
| $CI_COMMIT_SHA | Full commit hash |
| $CI_COMMIT_BRANCH | Branch name |
| $CI_COMMIT_TAG | Tag name (if tagged) |
| $CI_PIPELINE_ID | Pipeline unique ID |
| $CI_JOB_ID | Job unique ID |
| $CI_REGISTRY_IMAGE | Image path (registry/project) |
| $CI_DEFAULT_BRANCH | Default branch (main/master) |
| $CI_PROJECT_DIR | Full clone path |
| $CI_RUNNER_TAGS | Runner tags |
| $CI_ENVIRONMENT_NAME | Environment name |
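These variables compose into values like the pipeline's APP_IMAGE ($CI_REGISTRY_IMAGE:$CI_COMMIT_SHA). A minimal shell sketch of building a short-SHA tag, with sample values assumed (GitLab also provides $CI_COMMIT_SHORT_SHA directly):

```shell
#!/bin/sh
# Compose an image tag from predefined-style variables (values are examples).
CI_REGISTRY_IMAGE="registry.gitlab.com/org/myapp"
CI_COMMIT_SHA="9f2b7c1d0e3a4b5c6d7e8f9a0b1c2d3e4f5a6b7c"
SHORT_SHA=$(printf '%.8s' "$CI_COMMIT_SHA")   # first 8 chars of the SHA
echo "${CI_REGISTRY_IMAGE}:${SHORT_SHA}"
```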
Key the cache on a lockfile: use cache.key with package-lock.json or pom.xml as the key file. Set policy: pull for jobs that only read the cache.

| Branch | Purpose | Lifetime |
|---|---|---|
| main / master | Production-ready code | Permanent |
| develop | Integration branch | Permanent |
| feature/* | New features | Temp (merge → develop) |
| release/* | Release prep | Temp (merge → main + develop) |
| hotfix/* | Production fixes | Temp (merge → main + develop) |
| support/* | Old version patches | Long-lived |

| Practice | Description |
|---|---|
| Single branch | All developers commit to main/trunk |
| Feature flags | Toggle incomplete features without branching |
| Short-lived branches | If used, live <1 day |
| Small batches | Small, frequent PRs (<400 LOC) |
| CI on every commit | All commits are buildable & testable |
| Dark launches | Ship to subset of users behind flag |
| A/B testing | Gradual rollout with metrics |
# ── GitFlow Branch Operations ──
# Start a feature
git checkout develop
git pull origin develop
git checkout -b feature/user-authentication
# Work, commit, push
git add . && git commit -m "feat: add JWT auth middleware"
git push -u origin feature/user-authentication
# Create PR/MR: feature → develop
# After review and approval, squash merge into develop
# ── Release Branch ──
git checkout develop
git checkout -b release/1.2.0
# Bump version, fix last bugs, update changelog
git checkout main
git merge --no-ff release/1.2.0
git tag -a v1.2.0 -m "Release 1.2.0"
git checkout develop
git merge --no-ff release/1.2.0
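The version bump on a release branch is easy to script. A minimal sketch computing the next minor tag, assuming tags of the form v&lt;major&gt;.&lt;minor&gt;.&lt;patch&gt;:

```shell
#!/bin/sh
# Compute the next minor release tag from the latest semver tag.
next_minor() {
  ver=${1#v}            # strip the leading "v"
  major=${ver%%.*}      # text before the first dot
  rest=${ver#*.}        # text after the first dot
  minor=${rest%%.*}     # text before the next dot
  echo "v${major}.$((minor + 1)).0"
}
next_minor v1.2.3    # → v1.3.0
```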
# ── Hotfix Branch ──
git checkout main
git checkout -b hotfix/security-patch
# Fix the issue
git checkout main
git merge --no-ff hotfix/security-patch
git tag -a v1.2.1 -m "Hotfix 1.2.1"
git checkout develop
git merge --no-ff hotfix/security-patch
# ── Trunk-Based: Feature Flag Example ──
# app.js
const NEW_DASHBOARD = process.env.FEATURE_DASHBOARD === 'true';
function renderDashboard() {
if (NEW_DASHBOARD) {
return <NewDashboard />; // New code on main
}
return <LegacyDashboard />; // Existing code
}

| Aspect | GitFlow | Trunk-Based | GitHub Flow |
|---|---|---|---|
| Complexity | High (many branches) | Low | Low |
| Release cadence | Scheduled releases | Continuous | Continuous |
| CI requirement | Moderate | Very strict | Strict |
| Feature flags | Optional | Required | Optional |
| Team size | Large teams | Small–medium | Any |
| Merge strategy | Merge / no-ff | Squash / rebase | Merge / squash |
| Best for | Enterprise, versioned products | SaaS, microservices | Open source, startups |
| Downsides | Merge conflicts, slow feedback | Requires feature flags | No explicit release mgmt |
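Where feature flags are required, dark launches typically hash a stable user ID into a bucket so each user gets a consistent on/off decision. A minimal shell sketch; the bucketing scheme is illustrative, not any particular flag library:

```shell
#!/bin/sh
# Deterministically bucket a user ID into 0-99; the flag is on below the rollout %.
in_rollout() {
  bucket=$(( $(printf '%s' "$1" | cksum | cut -d' ' -f1) % 100 ))
  [ "$bucket" -lt "$2" ]
}
in_rollout "user-42" 100 && echo "enabled"    # 100% rollout: always on
in_rollout "user-42" 0   || echo "disabled"   # 0% rollout: always off
```

Because the bucket depends only on the ID, ramping 5% to 25% to 100% keeps already-enabled users enabled.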
Treat main as the single source of truth — it's always deployable.

| Strategy | Downtime | Rollback | Complexity | Risk |
|---|---|---|---|---|
| Rolling | None (gradual) | Slow (wait for cycle) | Low | Medium |
| Blue-Green | None (instant switch) | Instant (switch back) | Medium | Low |
| Canary | None (partial) | Fast (stop traffic) | High | Very Low |
| A/B Testing | None (split) | Fast (adjust weights) | High | Low |
| Shadow | None (mirror only) | N/A (read-only) | High | Low |
| Recreate | Full downtime | Redeploy old version | Very Low | High |
# ── Rolling Update (default) ──
spec:
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1 # +1 pod above desired
maxUnavailable: 0 # no pod unavailable
# ── Blue-Green ──
# Use two Deployments + switch Service selector
# Deploy "green" → test → update Service selector
# ── Canary (with Istio / Nginx Ingress) ──
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
spec:
http:
- route:
- destination:
host: myapp
subset: stable
weight: 90
- destination:
host: myapp
subset: canary
weight: 10
#!/bin/bash
# ── Canary Deployment Script (Kubernetes) ──
set -euo pipefail
APP="myapp"
NAMESPACE="production"
NEW_IMAGE="registry.example.com/myapp:v2.0.0"
CANARY_STEPS=(5 10 25 50 100) # Traffic % progression
WAIT_SECONDS=60
# Capture the live stable image up front so a rollback has a concrete target
CURRENT_IMAGE=$(kubectl get deployment/$APP-stable -n $NAMESPACE \
  -o jsonpath='{.spec.template.spec.containers[0].image}')
# Step 1: Deploy canary (new version, 0% traffic)
kubectl set image deployment/$APP-canary \
$APP=$NEW_IMAGE -n $NAMESPACE
echo "✅ Canary deployed. Starting traffic shift..."
for STEP in "${CANARY_STEPS[@]}"; do
echo "🟡 Shifting $STEP% traffic to canary..."
# Update traffic weights (Istio VirtualService)
kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
name: $APP
namespace: $NAMESPACE
spec:
hosts: [$APP]
http:
- route:
- destination:
host: $APP
subset: stable
weight: $((100 - STEP))
- destination:
host: $APP
subset: canary
weight: $STEP
EOF
# Health check
sleep $WAIT_SECONDS
# Check error rate (Prometheus query)
ERROR_RATE=$(curl -s \
'http://prometheus:9090/api/v1/query?query=sum(rate(request_errors_total{deployment="canary"}[5m]))/sum(rate(request_total{deployment="canary"}[5m]))' \
| jq -r '.data.result[0].value[1]' 2>/dev/null || echo "0")  # -r: raw number for bc
if (( $(echo "$ERROR_RATE > 0.01" | bc -l) )); then
echo "🔴 Canary error rate too high: $ERROR_RATE. Rolling back!"
# Rollback: set canary weight to 0
kubectl set image deployment/$APP-canary \
$APP=$CURRENT_IMAGE -n $NAMESPACE
# Reset traffic
kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
name: $APP
namespace: $NAMESPACE
spec:
hosts: [$APP]
http:
- route:
- destination:
host: $APP
subset: stable
weight: 100
- destination:
host: $APP
subset: canary
weight: 0
EOF
exit 1
fi
echo "✅ Canary healthy at $STEP% traffic"
done
# Finalize: promote canary to stable
kubectl set image deployment/$APP-stable \
$APP=$NEW_IMAGE -n $NAMESPACE
kubectl scale deployment/$APP-canary --replicas=0 -n $NAMESPACE
echo "🟢 Canary promoted to stable!"
#!/bin/bash
# ── Blue-Green Deployment (Kubernetes) ──
set -euo pipefail
APP="myapp"
NAMESPACE="production"
NEW_IMAGE="registry.example.com/myapp:v2.0.0"
CURRENT_COLOR=$(kubectl get svc $APP -n $NAMESPACE \
-o jsonpath='{.spec.selector.color}')
# Determine next color
NEXT_COLOR=$([ "$CURRENT_COLOR" = "blue" ] && echo "green" || echo "blue")
echo "Current: $CURRENT_COLOR → Deploying: $NEXT_COLOR"
# Deploy new version
kubectl set image deployment/$APP-$NEXT_COLOR \
$APP=$NEW_IMAGE -n $NAMESPACE
# Wait for rollout
kubectl rollout status deployment/$APP-$NEXT_COLOR -n $NAMESPACE --timeout=300s
# Run smoke tests against new deployment
echo "🧪 Running smoke tests..."
curl -sf https://$APP-$NEXT_COLOR.internal.example.com/health || {
echo "❌ Health check failed! Aborting switch."
exit 1
}
# Switch traffic (update Service selector)
kubectl patch svc $APP -n $NAMESPACE -p \
'{"spec":{"selector":{"color":"'$NEXT_COLOR'"}}}'
echo "🟢 Traffic switched to $NEXT_COLOR"
echo " Blue-Green: $CURRENT_COLOR is now idle (rollback target)"
echo " To rollback: kubectl patch svc $APP -p '{\"spec\":{\"selector\":{\"color\":\"$CURRENT_COLOR\"}}}'"
// ── Express.js Health Check Endpoints ──
import express from 'express';
import { MongoClient } from 'mongodb';
import Redis from 'ioredis';
const app = express();
// Assumed connection setup (URLs via env) so the checks below have clients to ping:
const db = new MongoClient(process.env.MONGO_URL ?? 'mongodb://localhost:27017').db();
const redis = new Redis(process.env.REDIS_URL ?? 'redis://localhost:6379');
let isInitialized = false; // set to true once startup work completes
// Liveness: Is the process running?
app.get('/health/live', (_req, res) => {
res.json({ status: 'ok', timestamp: new Date().toISOString() });
});
// Readiness: Can it accept traffic?
app.get('/health/ready', async (_req, res) => {
const checks: Record<string, 'ok' | 'degraded' | 'down'> = {};
try {
await db.command({ ping: 1 });
checks.database = 'ok';
} catch {
checks.database = 'down';
}
try {
await redis.ping();
checks.redis = 'ok';
} catch {
checks.redis = 'degraded';
}
const isReady = Object.values(checks).every(s => s === 'ok');
res.status(isReady ? 200 : 503).json({
status: isReady ? 'ok' : 'not ready',
checks,
timestamp: new Date().toISOString(),
});
});
// Startup: Has it finished initialization?
app.get('/health/started', (_req, res) => {
res.json({ status: isInitialized ? 'ok' : 'initializing' });
});
# ── Kubernetes Health Probes ──
apiVersion: apps/v1
kind: Deployment
metadata:
name: myapp
spec:
template:
spec:
containers:
- name: myapp
image: myapp:latest
ports:
- containerPort: 3000
# ── Liveness Probe (restart if failing) ──
livenessProbe:
httpGet:
path: /health/live
port: 3000
initialDelaySeconds: 15
periodSeconds: 20
timeoutSeconds: 5
failureThreshold: 3 # 3 failures → restart
successThreshold: 1
# ── Readiness Probe (remove from service if failing) ──
readinessProbe:
httpGet:
path: /health/ready
port: 3000
initialDelaySeconds: 5
periodSeconds: 10
timeoutSeconds: 3
failureThreshold: 3
successThreshold: 1
# ── Startup Probe (for slow-starting apps) ──
startupProbe:
httpGet:
path: /health/started
port: 3000
initialDelaySeconds: 0
periodSeconds: 5
failureThreshold: 60 # 60 * 5s = 5 min max startup

| Probe | Purpose | On Failure |
|---|---|---|
| Liveness | Is the app alive? | Kills and restarts the container |
| Readiness | Can it serve traffic? | Removes from Service endpoints |
| Startup | Is it initialized? | Disables other probes until success |

| Pillar | Question | Tools |
|---|---|---|
| Metrics | How many? How fast? | Prometheus, Datadog, CloudWatch |
| Logs | What happened? | ELK, Loki, CloudWatch Logs |
| Traces | Where did time go? | Jaeger, Zipkin, AWS X-Ray |
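Logs answer "what happened" best when they're structured. A minimal sketch emitting a JSON log line that Loki/ELK-style pipelines can index; the field names are illustrative:

```shell
#!/bin/sh
# Emit one structured (JSON) log line: timestamp, level, message.
log_json() {
  printf '{"ts":"%s","level":"%s","msg":"%s"}\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$1" "$2"
}
log_json info "deployment started"
```

One JSON object per line keeps every field queryable without regex parsing at ingest time.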

| Type | When | What | Tools |
|---|---|---|---|
| SAST | Build time | Static source code analysis | SonarQube, CodeQL, Semgrep |
| DAST | Runtime | Dynamic app testing (running) | OWASP ZAP, Burp Suite |
| SCA | Build time | Dependency vulnerability scan | Snyk, Trivy, Dependabot |
| IAST | Runtime | Instrumented app testing | Contrast, Seeker |
| Secrets Scan | Pre-commit | Find leaked secrets in code | GitLeaks, TruffleHog |
| Container Scan | Build time | Image vulnerability scan | Trivy, Grype, Anchore |
| Fuzzing | CI | Random input testing | AFL, libFuzzer, Jazzer |
# ── Security Scanning Pipeline ──
name: DevSecOps Pipeline
on:
push:
branches: [main]
pull_request:
jobs:
secret-scan:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0 # Full history for secret scan
- name: Gitleaks
uses: gitleaks/gitleaks-action@v2
env:
GITLEAKS_LICENSE: ${{ secrets.GITLEAKS_LICENSE }}
sast-semgrep:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: returntocorp/semgrep-action@v1
with:
config: >-
p/owasp-top-ten
p/jwt
p/nodejs
p/secrets
dependency-scan:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-node@v4
with:
cache: 'npm'
- run: npm ci
- name: npm audit
run: npm audit --audit-level=high --json > audit.json || true
- name: Snyk test
uses: snyk/actions/node@master
with:
args: --severity-threshold=high
container-scan:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Build image
run: docker build -t myapp:latest .
- name: Trivy scan
uses: aquasecurity/trivy-action@master
with:
image-ref: 'myapp:latest'
severity: 'CRITICAL,HIGH'
exit-code: '1'
sbom-generation:
runs-on: ubuntu-latest
needs: [dependency-scan, container-scan]
steps:
- uses: actions/checkout@v4
- name: Generate SBOM
run: |
npm install -g @cyclonedx/cyclonedx-npm
cyclonedx-npm --output-format json --output-file sbom.json
- uses: actions/upload-artifact@v4
with:
name: sbom
path: sbom.json
# ── Pre-commit Hooks for Security ──
repos:
- repo: https://github.com/gitleaks/gitleaks
rev: v8.18.0
hooks:
- id: gitleaks
- repo: https://github.com/Yelp/detect-secrets
rev: v1.4.0
hooks:
- id: detect-secrets
args: ['--baseline', '.secrets.baseline']
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.5.0
hooks:
- id: check-yaml
- id: check-json
- id: end-of-file-fixer
- id: trailing-whitespace
- id: no-commit-to-branch
args: ['--branch', 'main']
- repo: https://github.com/antonbabenko/pre-commit-terraform
rev: v1.86.0
hooks:
- id: terraform_checkov
- id: terraform_tfsec