There is a gap between “I understand CI/CD in theory” and “I have built one and watched it work.” This post closes that gap. We will build a complete, production-patterned GitOps pipeline from scratch — running entirely on your local machine, at zero cost — and walk through every stage from raising a pull request to deploying to production.
The two repos for this tutorial are:
- service-demo — the application code, Helm chart, and GitHub Actions workflows
- gitops-demo — the GitOps source of truth: ArgoCD configuration, environment values, and infrastructure bootstrap
What We Are Building
Developer pushes feature branch
→ PR opened → CI: lint, SCA, unit tests, integration tests
→ Image built → pushed to GHCR
→ Ephemeral environment pr-{N} deployed by ArgoCD (dev cluster)
→ Smoke tests run against ephemeral env
→ CODEOWNER approves → PR merged
→ CD: image built from main → main-{sha}
→ ArgoCD syncs preprod namespace (preprod cluster)
→ Full test suite runs against preprod
→ Check run posted to merge commit (release gate)
→ [Manual] Release workflow triggered
→ Preflight: HEAD commit must have passing preprod gate
→ Image retagged: main-{sha} → v{X.Y.Z} (no rebuild)
→ ArgoCD syncs prod namespace (prod cluster)
→ ArgoCD syncs dev namespace (dev cluster) — same tag, same bits
For this tutorial, all four environments run as Kubernetes namespaces on a single local kind cluster — simple to bootstrap and zero infrastructure cost. In production the topology is different: ephemeral environments and dev share a single dev cluster, while preprod and prod each get their own. More on this in Production Considerations.
GitOps principle: The CI pipeline never touches the cluster directly. It writes to git (image tags, values files). ArgoCD reads from git and reconciles the cluster. The cluster is always a reflection of what is in the repo — not what a script did six months ago.
Prerequisites
- Docker Desktop (or Docker Engine on Linux)
- Terraform >= 1.9
- kubectl
- Helm >= 3.14
- ArgoCD CLI
- A GitHub account
You do not need a cloud account. Everything runs locally.
Docker or Podman? The Dockerfile in this tutorial uses standard OCI instructions and builds cleanly with Podman (podman build, podman push). If you are on Linux or prefer Podman for local builds, it works as a drop-in for image operations. However, kind uses Docker’s socket by default — Podman support for kind is experimental and requires additional configuration. For this tutorial, Docker is the path of least resistance. Commercial teams should note that Docker Desktop requires a paid subscription for organisations over 250 employees; Rancher Desktop provides a free alternative with both Docker and containerd support.
Part 1: Bootstrap the Infrastructure
Clone gitops-demo and use Terraform to spin up a kind cluster with ArgoCD pre-installed:
git clone https://github.com/teerakarna/gitops-demo
cd gitops-demo/terraform
terraform init
terraform apply
This does two things:
- Creates a kind cluster named gitops-demo with port 80 mapped to your localhost
- Installs ArgoCD into the argocd namespace via Helm with a NodePort service on port 30080
When it completes, verify ArgoCD is running:
kubectl get pods -n argocd
Then log in via the CLI:
# Get the initial admin password
kubectl -n argocd get secret argocd-initial-admin-secret \
-o jsonpath="{.data.password}" | base64 -d
argocd login localhost:30080 --username admin --insecure
Tip: The ArgoCD UI is available at http://localhost:30080. It is worth having it open while you work through this tutorial — watching applications sync in real time makes the GitOps model click.
Now apply the root ArgoCD Applications so ArgoCD starts managing the stable environments:
kubectl apply -f argocd/apps/
This creates three ArgoCD Applications — service-demo-dev, service-demo-preprod, and service-demo-prod — each pointing at the corresponding values file in gitops-demo. ArgoCD will attempt to sync them; they will show as OutOfSync until an image tag is in place.
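For orientation, an individual Application in argocd/apps/ looks roughly like the following sketch: a multi-source Application that takes the chart from service-demo and the values from gitops-demo. Field values here are illustrative; the files in the repo are authoritative.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: service-demo-preprod
  namespace: argocd
spec:
  project: default
  sources:
    # The chart comes from the application repo...
    - repoURL: https://github.com/teerakarna/service-demo
      targetRevision: main
      path: chart/service-demo
      helm:
        valueFiles:
          - $values/values/preprod/service-demo.yaml
    # ...the environment values come from the GitOps repo,
    # referenced as $values above.
    - repoURL: https://github.com/teerakarna/gitops-demo
      targetRevision: main
      ref: values
  destination:
    server: https://kubernetes.default.svc
    namespace: preprod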
The GitHub Actions Runner
The CI jobs that lint, test, and build the image run on GitHub-hosted runners. The jobs that deploy and smoke-test against the cluster need access to your local kind cluster, which GitHub-hosted runners cannot reach.
Note: This is not a limitation of the tutorial — it is a real architectural constraint. In production you solve it with cloud-hosted clusters (GKE, EKS) or the Actions Runner Controller (ARC), which runs ephemeral self-hosted runners inside Kubernetes itself. For this tutorial, we register a local self-hosted runner.
Register a self-hosted runner in the service-demo repo under Settings → Actions → Runners → New self-hosted runner, then on your machine:
# Download and configure the runner (follow the GitHub instructions)
# Then start it with the label 'kind'
./run.sh --labels kind
The CI workflow uses runs-on: [self-hosted, kind] only for the deploy and smoke-test jobs. Everything else uses ubuntu-latest.
Part 2: The Application
service-demo is a simple Items CRUD API written in Go with Gin.
GET /healthz liveness probe
GET /readyz readiness probe
GET /api/v1/items list all items
POST /api/v1/items create an item
GET /api/v1/items/:id get one item
PUT /api/v1/items/:id update an item
DELETE /api/v1/items/:id delete an item
The store is in-memory — no database, no external dependencies. The entire service is a single static Go binary running in a distroless container.
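The store behind those endpoints can be as small as a mutex-guarded map. A minimal sketch of the pattern follows; the types and method names are illustrative, not taken verbatim from the service-demo source:

```go
package main

import (
	"fmt"
	"sync"
)

// Item mirrors the kind of resource the API serves.
type Item struct {
	ID   string
	Name string
}

// Store is a thread-safe in-memory store: the pattern a
// no-database service like this one typically uses.
type Store struct {
	mu    sync.RWMutex
	items map[string]Item
	next  int
}

func NewStore() *Store {
	return &Store{items: make(map[string]Item)}
}

// Create assigns a new sequential ID under the write lock.
func (s *Store) Create(name string) Item {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.next++
	it := Item{ID: fmt.Sprintf("%d", s.next), Name: name}
	s.items[it.ID] = it
	return it
}

// Get reads under the shared lock, so concurrent reads never block.
func (s *Store) Get(id string) (Item, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	it, ok := s.items[id]
	return it, ok
}

func main() {
	s := NewStore()
	it := s.Create("widget")
	got, ok := s.Get(it.ID)
	fmt.Println(ok, got.Name) // true widget
}
```

Because the store is process-local, every pod has its own data — which is exactly why the tutorial can run with zero external dependencies, and why a real service would swap this for a database.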
Why distroless? The gcr.io/distroless/static:nonroot base image contains no shell, no package manager, and no OS utilities. An attacker who achieves code execution inside the container has almost nothing to work with. The final image is ~12 MB. This is the right default for production-bound services — start here, not from ubuntu:latest.
The Test Suite
The pipeline runs three levels of tests:
| Type | Tool | What it tests |
|---|---|---|
| Unit | go test + testify | Individual handlers with in-process mocks |
| Integration | go test + httptest.Server | Full HTTP stack end-to-end, in-process |
| Smoke | curl against deployed env | Liveness and readiness endpoints |
The distinction matters. Unit tests are fast and catch logic bugs. Integration tests catch routing and middleware issues that unit tests miss. Smoke tests catch deployment problems — wrong image tag, broken config, failed pod start.
The Helm Chart
The chart lives at chart/service-demo/ inside the service-demo repo. All environment-specific values live in gitops-demo/values/{env}/service-demo.yaml.
# values/preprod/service-demo.yaml
image:
tag: main-abc1234 # written by CD pipeline
env:
ENV_NAME: preprod
replicaCount: 2
resources:
limits:
cpu: 250m
memory: 128Mi
# values/prod/service-demo.yaml
image:
tag: v1.0.0 # written by release pipeline
env:
ENV_NAME: prod
replicaCount: 3
resources:
limits:
cpu: 500m
memory: 256Mi
Parametrisation: This is the core pattern. One chart, four environments, every difference expressed as a values file. The chart itself never needs to know which environment it is deploying to. This makes it impossible for environment-specific hacks to creep into the chart — they have nowhere to live except the values file where they belong.
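The chart consumes those values generically. A deployment template fragment might look like the following sketch; it is not the actual chart source, and the image repository and exact field paths are assumptions:

```yaml
# chart/service-demo/templates/deployment.yaml (fragment, illustrative)
spec:
  replicas: {{ .Values.replicaCount }}
  template:
    spec:
      containers:
        - name: service-demo
          image: "ghcr.io/teerakarna/service-demo:{{ .Values.image.tag }}"
          env:
            - name: ENV_NAME
              value: {{ .Values.env.ENV_NAME | quote }}
          resources:
            {{- toYaml .Values.resources | nindent 12 }}
```

Note there is no `if eq .Values.env "prod"` branching anywhere: every environment difference arrives as data, never as template logic.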
Part 3: The CI Pipeline (Pull Request)
When a PR is opened or updated against main, ci.yml runs five jobs in dependency order:
lint-sca ──┐
├─→ build ──→ deploy-ephemeral ──→ smoke-test
test ───┘
Lint and SCA
- name: govulncheck
run: |
go install golang.org/x/vuln/cmd/govulncheck@latest
govulncheck ./...
- name: golangci-lint
uses: golangci/golangci-lint-action@v6
govulncheck scans your Go module dependencies against the Go vulnerability database. golangci-lint runs a suite of static analysis checks. Both fail fast and report clearly.
Note on SCA: Software composition analysis — scanning dependencies for known CVEs — is the easiest security win in a pipeline. govulncheck is purpose-built for Go, low-noise, and free. There is no good reason not to run it on every PR.
Build and Image Scan
- name: Docker metadata
id: meta
uses: docker/metadata-action@v5
with:
images: ghcr.io/${{ github.repository }}
tags: |
type=raw,value=pr-${{ github.event.pull_request.number }}-${{ github.sha }}
- name: Build and push
uses: docker/build-push-action@v6
with:
push: true
tags: ${{ steps.meta.outputs.tags }}
cache-from: type=gha
cache-to: type=gha,mode=max
- name: Trivy image scan
uses: aquasecurity/trivy-action@master
with:
image-ref: ghcr.io/${{ github.repository }}:pr-${{ github.event.pull_request.number }}-${{ github.sha }}
exit-code: '1'
severity: CRITICAL,HIGH
The image is tagged pr-{number}-{sha}. Trivy scans the built image for OS and language vulnerabilities and fails the build on CRITICAL or HIGH findings.
Why pr-{number}-{sha} and not just pr-{number}? Two pushes to the same PR produce two different commits. Using only the PR number would mean every push overwrites the previous image tag. Using the SHA as well means every push produces a unique, immutable tag. You can always trace exactly which commit produced which image.
Ephemeral Environment
- name: Write PR values to gitops-demo
run: |
cat > values/pr/pr-${{ github.event.pull_request.number }}.yaml <<EOF
image:
tag: pr-${{ github.event.pull_request.number }}-${{ github.sha }}
env:
ENV_NAME: pr-${{ github.event.pull_request.number }}
EOF
git add values/pr/
git commit -m "ci: deploy pr-${{ github.event.pull_request.number }}"
git push
The ArgoCD ApplicationSet in argocd/appsets/ephemeral.yaml uses the pullRequest generator. It polls the service-demo repo for open PRs and automatically creates an ArgoCD Application per PR, deploying into namespace pr-{N} with CreateNamespace=true. When the PR is closed, the Application — and its namespace — are pruned.
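Abridged, that ApplicationSet looks something like the sketch below; the fields in argocd/appsets/ephemeral.yaml are authoritative:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: service-demo-ephemeral
  namespace: argocd
spec:
  generators:
    - pullRequest:
        github:
          owner: teerakarna
          repo: service-demo
        requeueAfterSeconds: 60   # poll interval for open PRs
  template:
    metadata:
      name: "service-demo-pr-{{number}}"
    spec:
      project: default
      source:
        repoURL: https://github.com/teerakarna/service-demo
        targetRevision: "{{head_sha}}"
        path: chart/service-demo
      destination:
        server: https://kubernetes.default.svc
        namespace: "pr-{{number}}"
      syncPolicy:
        automated:
          prune: true             # closing the PR prunes the app
        syncOptions:
          - CreateNamespace=true
```

The `{{number}}` and `{{head_sha}}` placeholders are filled in by the pullRequest generator for each open PR, which is what makes one template fan out into one Application per pull request.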
Why not just helm upgrade from CI? Because then your cluster state is only known to whoever ran the last helm upgrade. ArgoCD maintains a continuous reconciliation loop — if someone manually edits a resource, ArgoCD detects the drift and reverts it. The desired state is always what is in git, and the history of every change is a git commit.
Part 4: The CD Pipeline (Merge to Main)
When a PR is merged to main, cd.yml runs:
- name: Build image (main-{sha})
# tags: main-${{ github.sha }}
- name: Update preprod values
run: |
yq -i '.image.tag = "main-${{ github.sha }}"' \
values/preprod/service-demo.yaml
git commit -am "cd: promote main-${{ github.sha }} to preprod"
git push
# ArgoCD auto-syncs preprod namespace
- name: Wait for preprod sync
run: argocd app wait service-demo-preprod --sync --timeout 120
- name: Run full test suite against preprod
run: go test ./... -tags integration
# Posts a check run to the merge commit SHA
dev is not promoted here. It tracks releases — the same v{X.Y.Z} tag as production — promoted atomically at release time. Main snapshots go to preprod only. This is covered in Part 5.
If the preprod tests fail, a failing check run is posted against the merge commit. The release workflow checks for this before allowing promotion to production.
The gate that matters: Requiring a PR approval before merge is a human gate — it catches logic, design, and intent problems. The automated preprod gate catches runtime problems that only appear against a deployed environment. Both are necessary; neither replaces the other.
Part 5: Release to Production
The release workflow is triggered manually via workflow_dispatch:
on:
workflow_dispatch:
inputs:
version:
description: 'Semver version (e.g. 1.2.3)'
required: true
type: string
It runs a preflight check to confirm that the HEAD commit on main has a passing CD / Preprod Tests check. If it does not, the release is blocked — no exceptions.
- name: Preflight — verify preprod gate
run: |
STATUS=$(gh api repos/${{ github.repository }}/commits/main/check-runs \
--jq '[.check_runs[] | select(.name=="CD / Preprod Tests")] | first | .conclusion')
if [ "$STATUS" != "success" ]; then
echo "Preprod tests are not passing. Release blocked."
exit 1
fi
If the preflight passes, the image is retagged (no rebuild — the same bits that passed preprod go to prod):
- name: Retag image
run: |
SHA=$(git rev-parse HEAD)
crane tag ghcr.io/${{ github.repository }}:main-${SHA} v${{ inputs.version }}
- name: Promote to prod and dev
run: |
# Both values files updated in a single atomic commit.
# ArgoCD detects the change and syncs both namespaces.
yq -i '.image.tag = "v${{ inputs.version }}"' values/prod/service-demo.yaml
yq -i '.image.tag = "v${{ inputs.version }}"' values/dev/service-demo.yaml
git commit -am "release: v${{ inputs.version }}"
git push
- name: Create GitHub Release
run: gh release create v${{ inputs.version }} --generate-notes
prod and dev are promoted in a single atomic commit. Both namespaces receive the same v{X.Y.Z} tag simultaneously — they are always on the same release. Since dev has manual sync in ArgoCD (same as prod), the update is visible as a diff in the ArgoCD UI before it is applied.
No rebuild on release. The image in production is byte-for-byte identical to the image that ran against preprod. This is a critical property. If you rebuild from the same git tag, you get a different image (different build timestamp, potentially different base image layer). The retag pattern eliminates that class of “it worked in preprod” mystery entirely.
Try It: End-to-End Walkthrough
1. Fork both repos (service-demo and gitops-demo) to your GitHub account and update the ArgoCD Application repoURLs accordingly.

2. Make a change — open internal/api/handler.go and add a new endpoint:

r.GET("/api/v1/version", func(c *gin.Context) {
    c.JSON(http.StatusOK, gin.H{"version": os.Getenv("APP_VERSION")})
})

3. Push a feature branch and open a PR. Watch the CI jobs run. The ephemeral namespace pr-1 (or whichever number) should appear in the cluster within ~60 seconds of ArgoCD picking up the values commit:

kubectl get ns | grep pr-
kubectl get pods -n pr-1

4. Port-forward to the ephemeral env and verify your new endpoint:

kubectl port-forward -n pr-1 svc/service-demo 8080:80
curl http://localhost:8080/api/v1/version

5. Approve and merge the PR. Watch cd.yml trigger, the preprod values file update in gitops-demo, and ArgoCD sync preprod.

6. Run the release workflow with version 1.0.0. Check the prod namespace:

kubectl get pods -n prod
kubectl port-forward -n prod svc/service-demo 8081:80
curl http://localhost:8081/api/v1/version
Break It: Experiencing the Dev Loop
The pipeline is only useful if it fails loudly on bad code. Let’s prove it.
Break a unit test:
In internal/api/handler_test.go, change one assertion to expect the wrong status code:
// Change this:
assert.Equal(t, http.StatusCreated, w.Code)
// To this:
assert.Equal(t, http.StatusOK, w.Code)
Push to your feature branch. The test job fails within seconds. The PR is blocked from merging — the required status check never turns green. Fix it, push again, and watch the pipeline recover.
Break a smoke test:
Modify the health handler to return 500:
r.GET("/healthz", func(c *gin.Context) {
c.JSON(http.StatusInternalServerError, gin.H{"status": "broken"})
})
The unit tests pass (because the test expects the broken behaviour you wrote), but the smoke test against the deployed ephemeral env fails — the pod never becomes ready, and the readiness probe keeps failing. This is the category of bug that unit tests structurally cannot catch, and exactly why you need a deployed smoke test in the loop.
Key Concepts
The SHA in the image tag is not metadata decoration. It is the thread of trust that links a running container back to an exact commit, and forward to a release tag. pr-42-abc1234 → main-abc1234 → v2.1.0 are three names for the same set of bits. At any point you can git show abc1234 and know exactly what is running in any environment.
The pipeline never touches the cluster. It commits to git. ArgoCD reconciles the cluster from git. This means your cluster’s desired state has an audit log (git history), is diffable, is reviewable as a PR, and is self-healing. If someone kubectl edits a resource in preprod at 2am, ArgoCD silently reverts it within 3 minutes.
Ephemeral environments are not a nice-to-have. They catch the class of bug that only manifests in a real Kubernetes environment: broken health checks, missing environment variables, wrong service port, failed init containers. Running these checks against every PR, before merge, is the difference between catching them in dev and catching them in prod.
The promotion gate is binary. Preprod tests either pass or they do not. There is no “mostly passing” that justifies a release. This is intentional friction — the cost of maintaining a passing test suite is always lower than the cost of a broken prod deployment.
dev is production-parity, not a development sandbox. The dev namespace always runs the latest release — the same v{X.Y.Z} image as production, promoted atomically. This gives ephemeral PR environments something stable and production-equivalent to lean on: when an ephemeral env needs to call a downstream service, it resolves it at http://{service}.dev.svc.cluster.local via the SERVICES_NAMESPACE=dev environment variable. It gets the production-equivalent version, not a stale snapshot. The consequence of this design is also worth stating clearly: dev is not where you experiment with unreleased code. Preprod is. dev is a stable anchor that PR environments and integration tests can depend on.
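In code, that resolution can be a one-liner. A sketch, using a helper function of our own invention around the SERVICES_NAMESPACE variable the design already defines:

```go
package main

import (
	"fmt"
	"os"
)

// downstreamURL builds the in-cluster URL for a shared service.
// SERVICES_NAMESPACE is the variable named in the design above
// (set to "dev" for ephemeral PR environments); the helper itself
// is illustrative, not taken from the service-demo source.
func downstreamURL(service string) string {
	ns := os.Getenv("SERVICES_NAMESPACE")
	if ns == "" {
		ns = "dev" // fall back to the stable anchor namespace
	}
	return fmt.Sprintf("http://%s.%s.svc.cluster.local", service, ns)
}

func main() {
	os.Setenv("SERVICES_NAMESPACE", "dev")
	fmt.Println(downstreamURL("billing"))
	// http://billing.dev.svc.cluster.local
}
```

Because the namespace is injected rather than hard-coded, the same binary works unchanged in an ephemeral pr-{N} namespace, in dev, or in prod — only the values file differs.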
Production Considerations
This tutorial is deliberately simplified for learning. Here is what changes when this pattern goes to production.
Cluster topology matters. In this tutorial all four environments share a single kind cluster. The intended production topology is deliberately different, and each split has a reason:
- dev and ephemeral pr-{N} namespaces share a dev cluster. This is intentional. Ephemeral environments reference shared services in the dev namespace via in-cluster DNS. Colocation means low latency and no cross-cluster auth. They are designed to coexist.
- preprod gets its own cluster. Load and performance tests run against preprod. You do not want that traffic competing with dev cluster workloads, and you especially do not want it visible to prod nodes.
- prod gets its own cluster, in a separate account. The security boundary is non-negotiable: stricter IAM, no dev tooling, no path from CI directly into the cluster.
Namespaces share the Kubernetes control plane, network, and node pool. A runaway load test in preprod can starve prod of CPU. Cluster boundaries prevent that class of problem entirely.
Network policies. Without NetworkPolicy resources, every pod in the cluster can reach every other pod. Add default-deny policies and explicit allow rules per namespace before anything sensitive touches the cluster.
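A default-deny starting point is a standard Kubernetes NetworkPolicy, nothing tutorial-specific:

```yaml
# Blocks all ingress and egress for every pod in the namespace
# until explicit allow rules are added alongside it.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: preprod
spec:
  podSelector: {}        # empty selector matches every pod
  policyTypes:
    - Ingress
    - Egress
```

One of these per namespace, plus narrow allow rules for legitimate traffic, turns the flat cluster network into an explicit allowlist.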
RBAC. The bootstrap in this tutorial grants broad permissions for convenience. In production every service account should have the minimum permissions it needs — no more. Use namespaced roles, not cluster roles, wherever possible.
Secrets management. Kubernetes Secrets are base64-encoded, not encrypted, and by default they are stored in plaintext in etcd. Use External Secrets Operator backed by AWS Secrets Manager, GCP Secret Manager, or HashiCorp Vault. Never commit a secret to git, even encrypted.
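With External Secrets Operator, for example, what lives in git is only a reference; the operator fetches the value from the backing store and materialises a native Kubernetes Secret at runtime. Names and paths in this sketch are illustrative:

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: service-demo-db
  namespace: prod
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager    # a (Cluster)SecretStore configured separately
    kind: ClusterSecretStore
  target:
    name: service-demo-db        # the Kubernetes Secret that gets created
  data:
    - secretKey: DB_PASSWORD
      remoteRef:
        key: prod/service-demo/db-password
```

This keeps the GitOps property intact — the manifest is in git, auditable and reviewable — without the secret value ever touching the repo.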
Image signing. Use cosign to sign images at build time and verify signatures in an admission controller (Kyverno or OPA Gatekeeper). An unsigned image should not be deployable to preprod or prod — full stop.
Self-hosted runners in production. Static self-hosted runners are a security liability — a compromised runner persists. Use the Actions Runner Controller to run ephemeral runners inside Kubernetes, authenticated via OIDC (not static tokens). The runner exists only for the duration of a single job.
ArgoCD multi-tenancy. A single ArgoCD instance can manage multiple teams’ applications. Use AppProjects to give each team a scoped view — their own source repos, destination namespaces, and permitted resources. Without AppProjects, every team can see and sync every application.
Observability. The pipeline described here has no metrics, no tracing, and no alerting. Before anything goes to production, add Prometheus + Grafana for metrics, structured logging with a log aggregator, and alerting on deployment failures and error rate spikes.
GitOps for the platform itself. This tutorial bootstraps ArgoCD via Terraform, then applies Application YAMLs manually. In production, ArgoCD should manage itself via the app-of-apps pattern: a root Application that manages all other Applications. No manual kubectl apply after the initial bootstrap.
What’s Next
The service-demo and gitops-demo repos are the complete working implementation of everything in this post. Clone them, fork them, break them.
A few natural extensions from here:
- Add a database — replace the in-memory store with PostgreSQL, managed via a Helm dependency. Introduces migration jobs, persistent volumes, and secrets to the pipeline.
- Canary deployments — use Argo Rollouts to progressively shift traffic to a new version, with automatic rollback on metric degradation.
- Multi-service — add a second service to the pipeline and see how the ApplicationSet and values structure scales without duplication.
- Policy gates — add Kyverno policies that prevent deployment of images without a Trivy clean scan or a cosign signature.