I've been using Claude Code for our internal GitLab pipelines for a while now and wanted to write up what the workflow actually looks like in practice. The service below is one of our APIs, and the pipeline is a fairly standard build-scan-push setup. Nothing exotic, but enough surface area to show where the tool earns its keep and where it doesn't.
Pipeline rewrites on our older services tend to start from roughly the same place: a Dockerfile that works on someone's laptop, a Jenkinsfile or half-broken .gitlab-ci.yml that nobody really trusts, and the perennial desire to stop babysitting the thing through every run. For a container pipeline of this shape our tooling choices don't move around much anymore. Rootless BuildKit for the build (no daemon, no privileged runner, plays well with our fleet), Trivy for scanning, Skopeo when we want to retag without pulling the image back through a runner, and GitLab's rules: and needs: to keep the DAG honest.
The API has been around for a while. Its current build is a Jenkinsfile with a single stage: docker build then docker push, and that's the whole file. Nothing scans the image. Nothing caches layers. Nothing runs in parallel. We're moving the repo onto our self-managed GitLab instance and this is the moment to finally add the things we've been meaning to add for a year: scanning with a hard fail on HIGH and CRITICAL, proper build caching, and a retag step so :latest only ever points at something that has actually passed scan.
After cloning, I start Claude Code in the repo root:
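(Nothing special about the invocation itself; the repo directory name is a placeholder.)

```shell
cd service-api   # placeholder for wherever the repo was cloned
claude
```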
The trick at this point is to not immediately ask it to generate anything. You want it to read the repo first. Pipelines written blind hit something awkward on the first real run, usually one of the small things that live in a repo for a reason nobody remembers. A .dockerignore that excludes vendor/ so the build context ends up without dependencies. A Makefile injecting flags that the CI build needs to match. A cache mount in the Dockerfile that the runner's default setup doesn't know what to do with. When you write the pipeline yourself, those details are invisible because they're already in your head. Claude Code isn't in your head, so it needs a minute to go look.
My first prompt is pretty loose: build the image with rootless BuildKit, scan it with Trivy, and fail the pipeline on anything HIGH or CRITICAL.
It reads both files and comes back with something like this:
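(What follows is a reconstruction from the description rather than the verbatim output; job names and registry paths are illustrative. Both problems discussed below are visible in it: the push happens inside the build job, and Trivy has no database cache.)

```yaml
stages:
  - build
  - scan

variables:
  IMAGE: $CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA

build:
  stage: build
  image:
    name: moby/buildkit:rootless
    entrypoint: [""]
  variables:
    BUILDKITD_FLAGS: --oci-worker-no-process-sandbox
  script:
    # registry auth (~/.docker/config.json) omitted for brevity
    - buildctl-daemonless.sh build
        --frontend dockerfile.v0
        --local context=. --local dockerfile=.
        --output type=image,name=$IMAGE,push=true

scan:
  stage: scan
  image:
    name: aquasec/trivy:latest
    entrypoint: [""]
  variables:
    TRIVY_USERNAME: $CI_REGISTRY_USER
    TRIVY_PASSWORD: $CI_REGISTRY_PASSWORD
  script:
    - trivy image --exit-code 1 --severity HIGH,CRITICAL $IMAGE
```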
This runs and uses the right tools. There are a couple of things wrong with it though, neither of them disasters.
The first is that the push happens before the scan, so the image is already sitting in the registry under its real tag by the time Trivy gets around to looking at it. If the scan fails, you've got a vulnerable image reachable in the registry for the minute or so it takes for the pipeline to fail out and somebody to go clean up. The fix is a staging tag and a separate promote stage, which Claude Code is perfectly happy to write if you ask for it but doesn't volunteer.
The second is no Trivy DB cache. Every pipeline pulls the full vulnerability database from scratch, which adds a minute or two to every run and sometimes runs into rate limits on the DB mirror at the worst possible time. Also easy to fix.
Neither of these is surprising. The first pass Claude Code gives me on a pipeline almost always looks like this: usable, but with a couple of obvious things to tighten up. The fixes are about shape, not design, which is honestly the situation you want to be in.
My follow-up asks for two changes: push to a staging tag with a separate promote stage, and cache the Trivy database between runs.
What comes back:
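(Again a reconstruction, not verbatim output; the image pins and the auth variable, $DOCKER_AUTH_JSON as a file-type CI/CD variable, are illustrative.)

```yaml
stages: [build, scan, promote]

variables:
  STAGING_IMAGE: $CI_REGISTRY_IMAGE/staging:$CI_COMMIT_SHORT_SHA
  RELEASE_IMAGE: $CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA
  CACHE_REF: $CI_REGISTRY_IMAGE/cache:buildkit

build:
  stage: build
  image:
    name: moby/buildkit:rootless
    entrypoint: [""]
  variables:
    BUILDKITD_FLAGS: --oci-worker-no-process-sandbox
  before_script:
    - mkdir -p ~/.docker
    - cp "$DOCKER_AUTH_JSON" ~/.docker/config.json
  script:
    - buildctl-daemonless.sh build
        --frontend dockerfile.v0
        --local context=. --local dockerfile=.
        --import-cache type=registry,ref=$CACHE_REF
        --export-cache type=registry,ref=$CACHE_REF,mode=max
        --output type=image,name=$STAGING_IMAGE,push=true

.trivy:
  stage: scan
  image:
    name: aquasec/trivy:0.50.1   # pinned; the exact version is illustrative
    entrypoint: [""]
  needs: ["build"]
  variables:
    TRIVY_CACHE_DIR: .trivycache
    TRIVY_USERNAME: $CI_REGISTRY_USER
    TRIVY_PASSWORD: $CI_REGISTRY_PASSWORD
  cache:
    key: trivy-db
    paths: [.trivycache]

scan:report:      # informational pass, never blocks
  extends: .trivy
  script:
    - trivy image --exit-code 0 --severity UNKNOWN,LOW,MEDIUM $STAGING_IMAGE

scan:gate:        # hard gate
  extends: .trivy
  script:
    - trivy image --exit-code 1 --severity HIGH,CRITICAL $STAGING_IMAGE

promote:
  stage: promote
  image:
    name: quay.io/skopeo/stable:latest
    entrypoint: [""]
  needs: ["scan:gate"]
  script:
    - skopeo login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" "$CI_REGISTRY"
    - skopeo copy docker://$STAGING_IMAGE docker://$RELEASE_IMAGE
    - skopeo copy docker://$STAGING_IMAGE docker://$CI_REGISTRY_IMAGE:latest
  rules:
    - if: '$CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH'
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
      when: manual
```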
A couple of things about this revision are worth noting.
The --import-cache and --export-cache flags weren't in my prompt. They tell BuildKit to push its intermediate cache layers up to a dedicated cache ref under the project namespace, and on this service it saves roughly ninety seconds on a warm build. Scoping the cache under the project is the convention I'd have picked anyway, because it gets cleaned up when the project does.
Claude Code also added needs: between the stages, turning what would have been a strict sequence into a proper DAG. I didn't ask for that either, but I'm glad it's there, because the minute you start playing with rules: on individual jobs the DAG behavior starts to matter.
The scan stage got split into two passes, one that reports MEDIUM and below without failing the job and one that hard-fails on HIGH and CRITICAL. This is a pattern I like a lot, because it lets developers see what's coming at them without blocking their work today, and Claude Code reached for it on its own.
There is one thing I'd push back on. The promote job is set to when: manual on MR pipelines, which means any developer can manually promote an unscanned image to the real tag on their branch. On some of our services we actually want that as an integration-testing escape hatch, so that a colleague can pull a known-good image for cross-service testing. On others it's just a policy hole. It's a one-line change either way.
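On the services where it is a policy hole, the change is just dropping the manual MR rule so that only default-branch pipelines can promote (a sketch, assuming a rules:-based promote job):

```yaml
promote:
  rules:
    - if: '$CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH'
    # the `when: manual` rule for merge_request_event is removed,
    # so branch pipelines can never promote an unscanned image
```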
Here is the Jenkinsfile we're replacing:
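(A representative sketch rather than the verbatim file; the hostname, image path, and credentials ID are placeholders. The shape, a single build-and-push stage and nothing else, is the point.)

```groovy
pipeline {
  agent any
  stages {
    stage('Build and Push') {
      steps {
        withCredentials([usernamePassword(credentialsId: 'nexus-creds',
                                          usernameVariable: 'NEXUS_USER',
                                          passwordVariable: 'NEXUS_PASS')]) {
          sh '''
            docker login -u $NEXUS_USER -p $NEXUS_PASS nexus.example.com
            docker build -t nexus.example.com/platform/service-api:$BUILD_NUMBER .
            docker push nexus.example.com/platform/service-api:$BUILD_NUMBER
          '''
        }
      }
    }
  }
}
```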
My prompt is a straight translation request: take this Jenkinsfile and give me the equivalent .gitlab-ci.yml.
The shape of what comes back is right. BUILD_NUMBER becomes CI_PIPELINE_IID, which is GitLab's closest equivalent. The Jenkins withCredentials block becomes a pair of masked CI/CD variables. The Nexus destination gets hoisted into a REGISTRY_HOST variable so tag construction stays readable.
Where I usually have to step in on these translations is at the spots where Jenkins and GitLab don't share assumptions. Jenkins agents are long-lived and almost always have a Docker daemon available. Our self-managed GitLab runners don't, which is the whole reason rootless BuildKit is in the pipeline to begin with. If you don't tell Claude Code this, you'll sometimes see it translate docker build into a docker:dind service block. That works on GitLab.com shared runners. It falls over on our fleet. Easiest fix is to put the runner constraint in a CLAUDE.md so you aren't repeating it every session.
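The CLAUDE.md entry doing that work is only a few lines of prose; the wording here is illustrative:

```markdown
## CI constraints
- Runners are self-managed: no Docker daemon, no privileged mode, no docker:dind.
- All image builds go through rootless BuildKit (moby/buildkit:rootless
  with buildctl-daemonless.sh), never `docker build`.
- Pin every tool image to a version tag, never :latest.
```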
Jenkins post blocks are the other one that catches people. post { failure { ... } } runs at pipeline scope and can read the outcome of any earlier stage. GitLab CI doesn't have anything that does exactly that. The right way to translate it is usually a dedicated notification job at the end of the DAG using when: on_failure. Claude Code will sometimes go a different route and put the notification in an after_script:, which only sees the status of its own job and misses everything upstream. Worth a second look on any Jenkins migration.
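The working translation is a terminal job that fires when anything upstream failed; the webhook variable is a placeholder for whatever notification endpoint a team uses:

```yaml
notify:failure:
  stage: .post              # built-in final stage, runs after the rest of the DAG
  image: curlimages/curl:latest
  when: on_failure          # runs only if an earlier job in the pipeline failed
  script:
    - curl -sf -X POST --data "pipeline $CI_PIPELINE_URL failed" "$NOTIFY_WEBHOOK_URL"
```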
One of the engineers on the service had never written a GitLab pipeline before this one. Strong developer, fine with Docker, but CI config had always been somebody else's problem. The usual pattern on a team like ours is that one person writes the pipeline, everyone else copies bits of it into their own services, and nobody's totally sure later which parts actually mattered and which ones were just along for the ride.
With Claude Code she wrote the pipeline herself, pausing to ask what needs: actually does, why BuildKit wants both --import-cache and --export-cache when it feels like one should be enough, what happens if you don't pin the Trivy image. The file she shipped was basically what one of the platform engineers would have written, with the difference that she understood every line of it. Three weeks later, when the Trivy tag moved and the cache key needed a version bump, she just handled it, without paging the platform team.
Review hasn't gone away, but it's doing a different job now. Catching syntax mistakes and spotting obviously-wrong shapes happens faster because there aren't as many of them. What takes longer is the judgment stuff: cache key scope, how permissive the rules: should be, whether retag-on-promote actually matches what this particular service needs. That's the part Claude Code is worst at, because it turns on context that isn't in the repo.
The workflow I've settled into for a pipeline this size is: open Claude Code in the repo, let it read what's there, describe what I want loosely, then review the first pass for structural stuff (push-before-scan, missing caches, old syntax) and iterate from there rather than starting over. Pin image versions before I commit. Run the thing in a real branch rather than in my head. Once it works, move any runner constraints or team conventions into a CLAUDE.md so the next person on the service doesn't have to rediscover them from scratch.
For a service of this size, going from an empty .gitlab-ci.yml to something I'd merge takes me under an hour. A platform engineer writing it from scratch takes about the same. The difference isn't really speed. It's who ends up writing and owning the file. The service team does, instead of the platform team. And the small things that usually take a second or third pass to catch, like cache key scope or DAG structure or how the scan is split, tend to be there in the first draft.