There is a gap between “I understand CI/CD in theory” and “I have built one and watched it work.” This post closes that gap. We will build a complete, production-patterned GitOps pipeline from scratch — running entirely on your local machine, at zero cost — and walk through every stage from raising a pull request to deploying to production.

The two repos for this tutorial are:

  • service-demo — the application code, Helm chart, and GitHub Actions workflows
  • gitops-demo — the GitOps source of truth: ArgoCD configuration, environment values, and infrastructure bootstrap

What We Are Building

Developer pushes feature branch
  → PR opened → CI: lint, SCA, unit tests, integration tests
  → Image built → pushed to GHCR
  → Ephemeral environment pr-{N} deployed by ArgoCD (dev cluster)
  → Smoke tests run against ephemeral env
  → CODEOWNER approves → PR merged

  → CD: image built from main → main-{sha}
  → ArgoCD syncs preprod namespace (preprod cluster)
  → Full test suite runs against preprod
  → Check run posted to merge commit (release gate)

  → [Manual] Release workflow triggered
  → Preflight: HEAD commit must have passing preprod gate
  → Image retagged: main-{sha} → v{X.Y.Z} (no rebuild)
  → ArgoCD syncs prod namespace  (prod cluster)
  → ArgoCD syncs dev namespace   (dev cluster) — same tag, same bits

For this tutorial, all four environments run as Kubernetes namespaces on a single local kind cluster — simple to bootstrap and zero infrastructure cost. In production the topology is different: ephemeral environments and dev share a single dev cluster, while preprod and prod each get their own. More on this in Production Considerations.

GitOps principle: The CI pipeline never touches the cluster directly. It writes to git (image tags, values files). ArgoCD reads from git and reconciles the cluster. The cluster is always a reflection of what is in the repo — not what a script did six months ago.


Prerequisites

You do not need a cloud account. Everything runs locally.

Docker or Podman? The Dockerfile in this tutorial uses standard OCI instructions and builds cleanly with Podman (podman build, podman push). If you are on Linux or prefer Podman for local builds, it works as a drop-in for image operations. However, kind uses Docker’s socket by default — Podman support for kind is experimental and requires additional configuration. For this tutorial, Docker is the path of least resistance. Commercial teams should note that Docker Desktop requires a paid subscription for organisations with more than 250 employees or more than US$10 million in annual revenue; Rancher Desktop provides a free alternative with both Docker and containerd support.


Part 1: Bootstrap the Infrastructure

Clone gitops-demo and use Terraform to spin up a kind cluster with ArgoCD pre-installed:

git clone https://github.com/teerakarna/gitops-demo
cd gitops-demo/terraform

terraform init
terraform apply

This does two things:

  1. Creates a kind cluster named gitops-demo with port 80 mapped to your localhost
  2. Installs ArgoCD into the argocd namespace via Helm with a NodePort service on port 30080
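
Under the hood, these two steps reduce to a kind cluster definition roughly like the following (a sketch for orientation only — the authoritative version is the Terraform in gitops-demo/terraform):

```yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
name: gitops-demo
nodes:
  - role: control-plane
    extraPortMappings:
      - containerPort: 80       # application ingress -> http://localhost
        hostPort: 80
      - containerPort: 30080    # ArgoCD NodePort    -> http://localhost:30080
        hostPort: 30080
```

The extraPortMappings are what make the cluster reachable from your browser without any port-forwarding.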

When it completes, verify ArgoCD is running:

kubectl get pods -n argocd

Then log in via the CLI:

# Get the initial admin password
kubectl -n argocd get secret argocd-initial-admin-secret \
  -o jsonpath="{.data.password}" | base64 -d

argocd login localhost:30080 --username admin --insecure

Tip: The ArgoCD UI is available at http://localhost:30080. It is worth having it open while you work through this tutorial — watching applications sync in real time makes the GitOps model click.

Now apply the root ArgoCD Applications so ArgoCD starts managing the stable environments:

kubectl apply -f argocd/apps/

This creates three ArgoCD Applications — service-demo-dev, service-demo-preprod, and service-demo-prod — each pointing at the corresponding values file in gitops-demo. ArgoCD will attempt to sync them; they will show as OutOfSync until an image tag is in place.
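
Each of the three Applications has the same shape. Here is a sketch of the preprod one, assuming ArgoCD's multi-source support is used to combine the chart from service-demo with the values file from gitops-demo (field values are illustrative — check the repo for the real manifest):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: service-demo-preprod
  namespace: argocd
spec:
  project: default
  sources:
    - repoURL: https://github.com/teerakarna/service-demo
      path: chart/service-demo
      targetRevision: main
      helm:
        valueFiles:
          - $values/values/preprod/service-demo.yaml
    - repoURL: https://github.com/teerakarna/gitops-demo
      targetRevision: main
      ref: values            # exposes this repo to the chart source as $values
  destination:
    server: https://kubernetes.default.svc
    namespace: preprod
  syncPolicy:
    automated:
      prune: true
    syncOptions:
      - CreateNamespace=true
```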

The GitHub Actions Runner

The CI jobs that lint, test, and build the image run on GitHub-hosted runners. The jobs that deploy and smoke-test against the cluster need access to your local kind cluster, which GitHub-hosted runners cannot reach.

Note: This is not a limitation of the tutorial — it is a real architectural constraint. In production you solve it with cloud-hosted clusters (GKE, EKS) or the Actions Runner Controller (ARC), which runs ephemeral self-hosted runners inside Kubernetes itself. For this tutorial, we register a local self-hosted runner.

Register a self-hosted runner in the service-demo repo under Settings → Actions → Runners → New self-hosted runner, then on your machine:

# Download and configure the runner (follow the GitHub instructions)
# Then start it with the label 'kind'
./run.sh --labels kind

The CI workflow uses runs-on: [self-hosted, kind] only for the deploy and smoke-test jobs. Everything else uses ubuntu-latest.
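
In workflow terms the split looks something like this (a sketch — job contents abbreviated):

```yaml
jobs:
  test:
    runs-on: ubuntu-latest          # GitHub-hosted; no cluster access needed
    # lint, SCA, unit and integration tests run here

  deploy-ephemeral:
    needs: build
    runs-on: [self-hosted, kind]    # must reach the local kind cluster

  smoke-test:
    needs: deploy-ephemeral
    runs-on: [self-hosted, kind]
```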


Part 2: The Application

service-demo is a simple Items CRUD API written in Go with Gin.

GET    /healthz              liveness probe
GET    /readyz               readiness probe
GET    /api/v1/items         list all items
POST   /api/v1/items         create an item
GET    /api/v1/items/:id     get one item
PUT    /api/v1/items/:id     update an item
DELETE /api/v1/items/:id     delete an item

The store is in-memory — no database, no external dependencies. The entire service is a single static Go binary running in a distroless container.

Why distroless? The gcr.io/distroless/static:nonroot base image contains no shell, no package manager, and no OS utilities. An attacker who achieves code execution inside the container has almost nothing to work with. The final image is ~12 MB. This is the right default for production-bound services — start here, not from ubuntu:latest.
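
The pattern is a standard two-stage build (a sketch — the Go version and build path are assumptions; see the repo's Dockerfile for the real one):

```dockerfile
FROM golang:1.22 AS build
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
# CGO disabled => a fully static binary, which is what distroless/static expects
RUN CGO_ENABLED=0 go build -o /service ./cmd/service-demo

FROM gcr.io/distroless/static:nonroot
COPY --from=build /service /service
USER nonroot:nonroot
ENTRYPOINT ["/service"]
```

Everything heavy (compiler, sources, module cache) stays in the build stage; only the binary reaches the final image.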

The Test Suite

The pipeline runs three levels of tests:

Type          Tool                         What it tests
Unit          go test + testify            Individual handlers with in-process mocks
Integration   go test + httptest.Server    Full HTTP stack end-to-end, in-process
Smoke         curl against deployed env    Liveness and readiness endpoints

The distinction matters. Unit tests are fast and catch logic bugs. Integration tests catch routing and middleware issues that unit tests miss. Smoke tests catch deployment problems — wrong image tag, broken config, failed pod start.

The Helm Chart

The chart lives at chart/service-demo/ inside the service-demo repo. All environment-specific values live in gitops-demo/values/{env}/service-demo.yaml.

# values/preprod/service-demo.yaml
image:
  tag: main-abc1234   # written by CD pipeline

env:
  ENV_NAME: preprod

replicaCount: 2

resources:
  limits:
    cpu: 250m
    memory: 128Mi

# values/prod/service-demo.yaml
image:
  tag: v1.0.0         # written by release pipeline

env:
  ENV_NAME: prod

replicaCount: 3

resources:
  limits:
    cpu: 500m
    memory: 256Mi

Parametrisation: This is the core pattern. One chart, four environments, every difference expressed as a values file. The chart itself never needs to know which environment it is deploying to. This makes it impossible for environment-specific hacks to creep into the chart — they have nowhere to live except the values file where they belong.
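
Inside the chart, those values are the only variation points. An illustrative fragment of the deployment template (field layout is assumed, not copied from the repo):

```yaml
# chart/service-demo/templates/deployment.yaml (fragment)
spec:
  replicas: {{ .Values.replicaCount }}
  template:
    spec:
      containers:
        - name: service-demo
          image: "ghcr.io/teerakarna/service-demo:{{ .Values.image.tag }}"
          env:
            - name: ENV_NAME
              value: {{ .Values.env.ENV_NAME | quote }}
          resources:
            {{- toYaml .Values.resources | nindent 12 }}
```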


Part 3: The CI Pipeline (Pull Request)

When a PR is opened or updated against main, ci.yml runs five jobs in dependency order:

lint-sca ──┐
           ├─→ build ──→ deploy-ephemeral ──→ smoke-test
test    ───┘

Lint and SCA

- name: govulncheck
  run: |
    go install golang.org/x/vuln/cmd/govulncheck@latest
    govulncheck ./...

- name: golangci-lint
  uses: golangci/golangci-lint-action@v6

govulncheck scans your Go module dependencies against the Go vulnerability database. golangci-lint runs a suite of static analysis checks. Both fail fast and report clearly.

Note on SCA: Software composition analysis — scanning dependencies for known CVEs — is the easiest security win in a pipeline. govulncheck is purpose-built for Go, low-noise, and free. There is no good reason not to run it on every PR.

Build and Image Scan

- name: Docker metadata
  id: meta
  uses: docker/metadata-action@v5
  with:
    images: ghcr.io/${{ github.repository }}
    tags: |
      type=raw,value=pr-${{ github.event.pull_request.number }}-${{ github.sha }}

- name: Build and push
  uses: docker/build-push-action@v6
  with:
    push: true
    tags: ${{ steps.meta.outputs.tags }}
    cache-from: type=gha
    cache-to: type=gha,mode=max

- name: Trivy image scan
  uses: aquasecurity/trivy-action@master
  with:
    image-ref: ghcr.io/${{ github.repository }}:pr-${{ github.event.pull_request.number }}-${{ github.sha }}
    exit-code: '1'
    severity: CRITICAL,HIGH

The image is tagged pr-{number}-{sha}. Trivy scans the built image for OS and language vulnerabilities and fails the build on CRITICAL or HIGH findings.

Why pr-{number}-{sha} and not just pr-{number}? Two pushes to the same PR produce two different commits. Using only the PR number would mean every push overwrites the previous image tag. Using the SHA as well means every push produces a unique, immutable tag. You can always trace exactly which commit produced which image.
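
The scheme is mechanical enough to capture in a helper (a hypothetical sketch, not code from the repos):

```shell
# Compose the immutable image tag for a PR build:
#   pr_image_tag 42 abc1234  =>  pr-42-abc1234
pr_image_tag() {
  printf 'pr-%s-%s' "$1" "$2"
}
```

Given the same PR number and SHA, the tag is always the same string — and a new push, with a new SHA, can never collide with it.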

Ephemeral Environment

- name: Write PR values to gitops-demo
  run: |
    cat > values/pr/pr-${{ github.event.pull_request.number }}.yaml <<EOF
    image:
      tag: pr-${{ github.event.pull_request.number }}-${{ github.sha }}
    env:
      ENV_NAME: pr-${{ github.event.pull_request.number }}
    EOF
    git add values/pr/
    git commit -m "ci: deploy pr-${{ github.event.pull_request.number }}"
    git push

The ArgoCD ApplicationSet in argocd/appsets/ephemeral.yaml uses the pullRequest generator. It polls the service-demo repo for open PRs and automatically creates an ArgoCD Application per PR, deploying into namespace pr-{N} with CreateNamespace=true. When the PR is closed, the Application — and its namespace — are pruned.
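
A sketch of that ApplicationSet, using ArgoCD's pullRequest generator (values beyond the repo names are assumptions — the real manifest lives in argocd/appsets/ephemeral.yaml):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: service-demo-pr
  namespace: argocd
spec:
  generators:
    - pullRequest:
        github:
          owner: teerakarna
          repo: service-demo
        requeueAfterSeconds: 60       # poll interval for open PRs
  template:
    metadata:
      name: "service-demo-pr-{{number}}"
    spec:
      project: default
      source:
        repoURL: https://github.com/teerakarna/service-demo
        path: chart/service-demo
        targetRevision: "{{head_sha}}"
      destination:
        server: https://kubernetes.default.svc
        namespace: "pr-{{number}}"
      syncPolicy:
        automated:
          prune: true
        syncOptions:
          - CreateNamespace=true
```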

Why not just helm upgrade from CI? Because then your cluster state is only known to whoever ran the last helm upgrade. ArgoCD maintains a continuous reconciliation loop — if someone manually edits a resource, ArgoCD detects the drift and reverts it. The desired state is always what is in git, and the history of every change is a git commit.


Part 4: The CD Pipeline (Merge to Main)

When a PR is merged to main, cd.yml runs:

- name: Build image (main-{sha})
  # tags: main-${{ github.sha }}

- name: Update preprod values
  run: |
    yq -i '.image.tag = "main-${{ github.sha }}"' \
      values/preprod/service-demo.yaml
    git commit -am "cd: promote main-${{ github.sha }} to preprod"
    git push
    # ArgoCD auto-syncs preprod namespace

- name: Wait for preprod sync
  run: argocd app wait service-demo-preprod --sync --timeout 120

- name: Run full test suite against preprod
  run: go test ./... -tags integration
  # Posts a check run to the merge commit SHA

dev is not promoted here. It tracks releases — the same v{X.Y.Z} tag as production — promoted atomically at release time. Main snapshots go to preprod only. This is covered in Part 5.

If the preprod tests fail, a failing check run is posted against the merge commit. The release workflow checks for this before allowing promotion to production.

The gate that matters: Requiring a PR approval before merge is a human gate — it catches logic, design, and intent problems. The automated preprod gate catches runtime problems that only appear against a deployed environment. Both are necessary; neither replaces the other.


Part 5: Release to Production

The release workflow is triggered manually via workflow_dispatch:

on:
  workflow_dispatch:
    inputs:
      version:
        description: 'Semver version (e.g. 1.2.3)'
        required: true
        type: string
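
workflow_dispatch accepts any string for version, so it is worth validating the input shape before anything else runs (a hypothetical helper — the repos may handle this differently):

```shell
# True only for plain X.Y.Z semver, e.g. 1.2.3 — no leading "v", no suffixes.
semver_ok() {
  printf '%s' "$1" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+$'
}
```

Rejecting "v1.2.3" here matters: the workflow prepends the v itself when retagging, so accepting both forms would produce tags like vv1.2.3.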

It runs a preflight check to confirm that the HEAD commit on main has a passing CD / Preprod Tests check. If it does not, the release is blocked — no exceptions.

- name: Preflight — verify preprod gate
  run: |
    STATUS=$(gh api repos/${{ github.repository }}/commits/main/check-runs \
      --jq '[.check_runs[] | select(.name=="CD / Preprod Tests")] | first | .conclusion')
    if [ "$STATUS" != "success" ]; then
      echo "Preprod tests are not passing. Release blocked."
      exit 1
    fi

If the preflight passes, the image is retagged (no rebuild — the same bits that passed preprod go to prod):

- name: Retag image
  run: |
    SHA=$(git rev-parse HEAD)
    crane tag ghcr.io/${{ github.repository }}:main-${SHA} v${{ inputs.version }}

- name: Promote to prod and dev
  run: |
    # Both values files updated in a single atomic commit.
    # ArgoCD picks up the change for both namespaces.
    yq -i '.image.tag = "v${{ inputs.version }}"' values/prod/service-demo.yaml
    yq -i '.image.tag = "v${{ inputs.version }}"' values/dev/service-demo.yaml
    git commit -am "release: v${{ inputs.version }}"
    git push

- name: Create GitHub Release
  run: gh release create v${{ inputs.version }} --generate-notes

prod and dev are promoted in a single atomic commit. Both namespaces receive the same v{X.Y.Z} tag simultaneously — they are always on the same release. Since dev has manual sync in ArgoCD (same as prod), the update is visible as a diff in the ArgoCD UI before it is applied.

No rebuild on release. The image in production is byte-for-byte identical to the image that ran against preprod. This is a critical property. If you rebuild from the same git tag, you get a different image (different build timestamp, potentially different base image layer). The retag pattern eliminates that class of “it worked in preprod” mystery entirely.
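
You can check the property yourself after a release: two tags name the same bits exactly when their registry digests are identical. A sketch, assuming crane is installed (the same tool the workflow uses):

```shell
# True iff two image digests are the same string.
same_bits() {
  [ "$1" = "$2" ]
}

# Against the registry (requires crane and network access):
#   same_bits "$(crane digest ghcr.io/<owner>/service-demo:main-$SHA)" \
#             "$(crane digest ghcr.io/<owner>/service-demo:v1.0.0)" \
#     && echo "retag verified: identical digests"
```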


Try It: End-to-End Walkthrough

  1. Fork both repos (service-demo and gitops-demo) to your GitHub account and update the ArgoCD Application repoURLs accordingly.

  2. Make a change — open internal/api/handler.go and add a new endpoint:

    r.GET("/api/v1/version", func(c *gin.Context) {
        c.JSON(http.StatusOK, gin.H{"version": os.Getenv("APP_VERSION")})
    })
    
  3. Push a feature branch and open a PR. Watch the CI jobs run. The ephemeral namespace pr-1 (or whichever number) should appear in the cluster within ~60 seconds of ArgoCD picking up the values commit:

    kubectl get ns | grep pr-
    kubectl get pods -n pr-1
    
  4. Port-forward to the ephemeral env and verify your new endpoint:

    kubectl port-forward -n pr-1 svc/service-demo 8080:80
    curl http://localhost:8080/api/v1/version
    
  5. Approve and merge the PR. Watch cd.yml trigger, the preprod values file update in gitops-demo, and ArgoCD sync preprod.

  6. Run the release workflow with version 1.0.0. Check the prod namespace:

    kubectl get pods -n prod
    kubectl port-forward -n prod svc/service-demo 8081:80
    curl http://localhost:8081/api/v1/version
    

Break It: Experiencing the Dev Loop

The pipeline is only useful if it fails loudly on bad code. Let’s prove it.

Break a unit test:

In internal/api/handler_test.go, change one assertion to expect the wrong status code:

// Change this:
assert.Equal(t, http.StatusCreated, w.Code)
// To this:
assert.Equal(t, http.StatusOK, w.Code)

Push to your feature branch. The test job fails within seconds. The PR is blocked from merging — the required status check never turns green. Fix it, push again, and watch the pipeline recover.

Break a smoke test:

Modify the health handler to return 500:

r.GET("/healthz", func(c *gin.Context) {
    c.JSON(http.StatusInternalServerError, gin.H{"status": "broken"})
})

The unit tests still pass (if a test asserts on /healthz, update it to expect the broken behaviour you just wrote), but the smoke test against the deployed ephemeral env fails — the probe endpoint now returns 500, so the pod never becomes ready and its health checks keep failing. This is the category of bug that unit tests structurally cannot catch, and exactly why you need a deployed smoke test in the loop.


Key Concepts

The SHA in the image tag is not metadata decoration. It is the thread of trust that links a running container back to an exact commit, and forward to a release tag. pr-42-abc1234 → main-abc1234 → v2.1.0 are three names for the same set of bits. At any point you can git show abc1234 and know exactly what is running in any environment.

The pipeline never touches the cluster. It commits to git. ArgoCD reconciles the cluster from git. This means your cluster’s desired state has an audit log (git history), is diffable, is reviewable as a PR, and is self-healing. If someone kubectl edits a resource in preprod at 2am, ArgoCD silently reverts it within 3 minutes.

Ephemeral environments are not a nice-to-have. They catch the class of bug that only manifests in a real Kubernetes environment: broken health checks, missing environment variables, wrong service port, failed init containers. Running these checks against every PR, before merge, is the difference between catching them in dev and catching them in prod.

The promotion gate is binary. Preprod tests either pass or they do not. There is no “mostly passing” that justifies a release. This is intentional friction — the cost of maintaining a passing test suite is always lower than the cost of a broken prod deployment.

dev is production-parity, not a development sandbox. The dev namespace always runs the latest release — the same v{X.Y.Z} image as production, promoted atomically. This gives ephemeral PR environments something stable and production-equivalent to lean on: when an ephemeral env needs to call a downstream service, it resolves it at http://{service}.dev.svc.cluster.local via the SERVICES_NAMESPACE=dev environment variable. It gets the production-equivalent version, not a stale snapshot. The consequence of this design is also worth stating clearly: dev is not where you experiment with unreleased code. Preprod is. dev is a stable anchor that PR environments and integration tests can depend on.
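
The resolution rule is small enough to show concretely (a hypothetical helper mirroring what the service does internally — the actual variable name SERVICES_NAMESPACE is from the tutorial, the function is not):

```shell
# Build an in-cluster URL for a downstream service, honouring
# SERVICES_NAMESPACE (defaults to dev when unset).
downstream_url() {
  printf 'http://%s.%s.svc.cluster.local' "$1" "${SERVICES_NAMESPACE:-dev}"
}
```

An ephemeral env ships with SERVICES_NAMESPACE=dev, so its downstream calls land on the stable, release-parity dev namespace rather than on anything inside the PR's own namespace.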


Production Considerations

This tutorial is deliberately simplified for learning. Here is what changes when this pattern goes to production.

Cluster topology matters. In this tutorial all four environments share a single kind cluster. The intended production topology is deliberately different, and each split has a reason:

  • dev and ephemeral pr-{N} namespaces share a dev cluster. This is intentional. Ephemeral environments reference shared services in the dev namespace via in-cluster DNS. Colocation means low latency and no cross-cluster auth. They are designed to coexist.
  • preprod gets its own cluster. Load and performance tests run against preprod. You do not want that traffic competing with dev cluster workloads, and you especially do not want it visible to prod nodes.
  • prod gets its own cluster, in a separate account. The security boundary is non-negotiable: stricter IAM, no dev tooling, no path from CI directly into the cluster.

Namespaces share the Kubernetes control plane, network, and node pool. A runaway load test in preprod can starve prod of CPU. Cluster boundaries prevent that class of problem entirely.

Network policies. Without NetworkPolicy resources, every pod in the cluster can reach every other pod. Add default-deny policies and explicit allow rules per namespace before anything sensitive touches the cluster.
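
The default-deny baseline is a short manifest; everything after it is explicit allow rules (standard Kubernetes pattern, shown for prod as an example):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: prod
spec:
  podSelector: {}       # selects every pod in the namespace
  policyTypes:
    - Ingress
    - Egress
```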

RBAC. The bootstrap in this tutorial grants broad permissions for convenience. In production every service account should have the minimum permissions it needs — no more. Use namespaced roles, not cluster roles, wherever possible.

Secrets management. Kubernetes Secrets are base64-encoded, not encrypted; in etcd they are stored in plaintext unless encryption at rest is configured. Use External Secrets Operator backed by AWS Secrets Manager, GCP Secret Manager, or HashiCorp Vault. Never commit a secret to git, even encrypted.
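
With External Secrets Operator, what lands in git is only a pointer to the secret, never its value — a sketch with an assumed store name and key path:

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: service-demo-secrets
  namespace: prod
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager     # a ClusterSecretStore configured separately
    kind: ClusterSecretStore
  target:
    name: service-demo-secrets    # the Kubernetes Secret the operator creates
  data:
    - secretKey: DB_PASSWORD
      remoteRef:
        key: prod/service-demo/db-password
```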

Image signing. Use cosign to sign images at build time and verify signatures in an admission controller (Kyverno or OPA Gatekeeper). An unsigned image should not be deployable to preprod or prod — full stop.
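
Enforcement looks roughly like this Kyverno policy (a sketch — image scope and key material are placeholders):

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-signed-images
spec:
  validationFailureAction: Enforce
  rules:
    - name: verify-cosign-signature
      match:
        any:
          - resources:
              kinds: [Pod]
      verifyImages:
        - imageReferences:
            - "ghcr.io/teerakarna/service-demo*"
          attestors:
            - entries:
                - keys:
                    publicKeys: |-
                      -----BEGIN PUBLIC KEY-----
                      <cosign public key>
                      -----END PUBLIC KEY-----
```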

Self-hosted runners in production. Static self-hosted runners are a security liability — a compromised runner persists. Use the Actions Runner Controller to run ephemeral runners inside Kubernetes, authenticated via OIDC (not static tokens). The runner exists only for the duration of a single job.

ArgoCD multi-tenancy. A single ArgoCD instance can manage multiple teams’ applications. Use AppProjects to give each team a scoped view — their own source repos, destination namespaces, and permitted resources. Without AppProjects, every team can see and sync every application.

Observability. The pipeline described here has no metrics, no tracing, and no alerting. Before anything goes to production, add Prometheus + Grafana for metrics, structured logging with a log aggregator, and alerting on deployment failures and error rate spikes.

GitOps for the platform itself. This tutorial bootstraps ArgoCD via Terraform, then applies Application YAMLs manually. In production, ArgoCD should manage itself via the app-of-apps pattern: a root Application that manages all other Applications. No manual kubectl apply after the initial bootstrap.


What’s Next

The service-demo and gitops-demo repos are the complete working implementation of everything in this post. Clone them, fork them, break them.

A few natural extensions from here:

  • Add a database — replace the in-memory store with PostgreSQL, managed via a Helm dependency. Introduces migration jobs, persistent volumes, and secrets to the pipeline.
  • Canary deployments — use Argo Rollouts to progressively shift traffic to a new version, with automatic rollback on metric degradation.
  • Multi-service — add a second service to the pipeline and see how the ApplicationSet and values structure scales without duplication.
  • Policy gates — add Kyverno policies that prevent deployment of images without a Trivy clean scan or a cosign signature.