Three Lessons I've Learned Building Continuous Delivery Pipelines

May 15, 2023

Delivering software is not a simple process, and therefore automating it via a Continuous Delivery (CD) pipeline - so you can turn your velocity up to 11 - has its pitfalls. I’ve been working with application and DevOps teams on CD pipelines for the best part of a decade, and here are three lessons I’ve learned, each from a different project I’ve worked on.

Keep It Simple

I was working on a project developing Spring Boot microservices for a cloud-hosted version of Hoverfly - a tool for mocking service responses. This was part of an early-stage startup, so we built a simple pipeline to do continuous deployment.

We designed our pipeline to be similar to how the open-source Hoverfly gets delivered, using CircleCI. We would merge to master, and the applications would then be deployed to GKE from a set of Kubernetes manifests. We had a good test suite that could catch regressions in our services and block the deployment.
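For illustration, here’s a minimal sketch of what that kind of pipeline can look like as a CircleCI config - the job names, build image, cluster variables, and manifest path are hypothetical stand-ins rather than the actual project’s setup:

```yaml
# .circleci/config.yml - a minimal sketch; names, images, and paths are hypothetical
version: 2.1

jobs:
  build-and-test:
    docker:
      - image: cimg/openjdk:17.0   # assumed build image for a Spring Boot service
    steps:
      - checkout
      - run: ./mvnw verify         # run the test suite; a failure blocks the deployment
  deploy:
    docker:
      - image: google/cloud-sdk    # assumed deploy image with gcloud and kubectl available
    steps:
      - checkout
      - run:
          name: Deploy manifests to GKE
          command: |
            # (authentication to GCP is omitted for brevity)
            gcloud container clusters get-credentials "$CLUSTER_NAME" --zone "$CLUSTER_ZONE"
            kubectl apply -f k8s/  # plain Kubernetes manifests kept alongside the code

workflows:
  deliver:
    jobs:
      - build-and-test
      - deploy:
          requires:
            - build-and-test
          filters:
            branches:
              only: master         # continuous deployment from master only
```

The whole thing is one file in the repo: the tests gate the deployment, and the only “deployment tool” is kubectl applying the manifests that live next to the code.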

We were trying to find product-market fit, and this kind of simple pipeline did the job, allowing us to push out changes reliably and quickly. We didn’t spend a massive amount of effort on it, and we got a lot of benefit from it.

Things have rarely (if ever) been this easy when I’ve worked with big companies. Larger companies struggle much more with rapid delivery, in part because they carry more risk: they have more customers using their services, and even new services can rapidly attract customers on the strength of existing relationships. So even services representing a completely new offering can damage the brand if they have poor reliability.

It is frustrating to hear such things as a developer, but these concerns about risk genuinely have some basis in reality - they are not solely an excuse large companies use to defend the status quo. However, large companies also have more history, and in most cases their processes become more complex over time. This happens because we’re not always focused on keeping accidental complexity out of our processes, or on removing steps that no longer add value.

So, even in a high-compliance, high-security, conservative enterprise environment, it can be worth asking, “What is the minimum we have to do to be able to deploy to production?” and approaching things as if you were a startup, with complete freedom to define your process. As part of this, it can be useful to build a good understanding of the current processes, to learn how the software is currently proven to be safe and reliable before deployment. But those processes should always be questioned - in my experience, they can always be improved and streamlined.

Rely on your platform, and on established tools and patterns

This lesson comes from a larger, more enterprisey project - a large credit agency was transforming some of their apps to microservices and running them on Kubernetes. I’d done some initial work on a POC (proof-of-concept) that led to our work on the project, but I joined the main project some time after it had been kicked off.

The deployment process was managed by a Spring Boot microservice that kicked off deployments on Kubernetes based on a JSON payload it received. This gave total flexibility in defining the deployment procedure, imposing pre-conditions, and so forth. However, it also made it much harder to change the application definitions than it would have been with the conventional approach of rendering templates and sending them to the Kubernetes API. It also shared some of the problems Helm’s Tiller used to have - it needed its own authorization layer rather than being able to make use of Kubernetes’ RBAC.
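For contrast, here’s a sketch of the conventional approach I mean, using Kustomize as one common way of rendering application definitions before they are sent to the Kubernetes API (the overlay layout, namespace, and image names are hypothetical):

```yaml
# overlays/test/kustomization.yaml - hypothetical per-environment overlay
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - ../../base                 # the shared Deployment/Service definitions

namespace: credit-app-test     # one namespace per environment

images:
  - name: registry.example.com/credit-app  # hypothetical image name
    newTag: "1.4.2"                         # promoting a new version is a one-line change

patches:
  - path: replica-count.yaml               # small per-environment tweaks
```

Applying this with kubectl apply -k overlays/test means that changing an app’s definition is just an edit to plain files in version control, and the request is authorized by whatever Kubernetes RBAC the CI job’s credentials carry - no bespoke authorization layer required.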

The resulting complexity somewhat overshadowed the regular work of defining a CI and CD flow. We used Jenkins as the CI tool, with the pipeline implemented in a fairly conventional way: different stages performed deployments to dev, test, and so on. The program had not yet reached production by the time I rolled off - however, we did win some hard-fought battles to allow us to deploy automatically to staging/mock-production environments. I do believe that the program eventually rolled out to production successfully.

Since then, I’ve seen a few customers who have also written their own programs to orchestrate or organize deployments in some way. These systems do work, but in almost all cases they have turned out to be a source of much suffering. The reasons for this are not obvious, but I have come to think that it is because the teams wrote the software without first fully understanding (i) the process and needs of the delivery system, (ii) how those might change, or (iii) how much an additional service really adds value versus duplicating features of the deployment platform. Because of this, those programs often don’t have the necessary flexibility in the right places, and the frequent changes they need as apps get updated become burdensome.

It’s not necessarily a problem that teams write ‘proper’ programs to solve their problems; it’s just that code also requires ongoing investment, and this problem space is complex - it’s difficult to make the right design choices up front, so, inevitably, lots of updates to the code will be needed.

This is why the default approach to CI/CD that I advocate is based on the use of well-established tools and patterns. We start by writing simple scripts to link those tools together into a working pipeline. The emphasis is on capturing all the configuration needed to reproducibly deploy to each environment - and capturing it in version control. CI tools (which are, for the most part, just general-purpose workflow engines) play a crucial part in sequencing the various actions needed to deliver the software. Once we have that, we may be able to identify specific areas where existing tools are lacking, or where a tool could replace more complex and brittle scripts.
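As a sketch of what “capturing the configuration in version control” can look like in practice (the file paths and values below are hypothetical), the per-environment differences live as small, reviewable files alongside the shared definitions:

```yaml
# deploy/environments/dev.yaml - hypothetical per-environment values, kept in Git
replicas: 1
image:
  tag: latest-main       # dev tracks the latest build from the main branch
resources:
  requests:
    cpu: 100m
    memory: 256Mi
logLevel: DEBUG
---
# deploy/environments/prod.yaml - the same shape, different values
replicas: 4
image:
  tag: "1.7.3"           # prod is pinned to an explicit, tested version
resources:
  requests:
    cpu: 500m
    memory: 1Gi
logLevel: INFO
```

The CI tool then just sequences the established steps - build, test, render the manifests with the right values file, apply - and because everything needed to reproduce an environment is in the repo, any single step can be re-run or swapped out without rebuilding the whole pipeline.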

Capture the desired state of the application for each environment you deploy to (use some form of GitOps)

On the project where I first came into contact with “GitOps”, our team was something like a combined ‘DevOps’ and platform team, providing support to around a dozen development teams, each running a few “microservices”. The company was a startup B2B bank, so control over environments was important. It was in the midst of a migration from Docker Swarm to Kubernetes (EKS), and so was running a couple of different deploy processes. Our team was responsible for the deployments, and so created a ‘GitOps’ repo made up of submodules, one for each microservice. The repo had a number of branches, each representing one environment and acting as a snapshot of what should be running there. Pipelines would run against that snapshot - first updating the submodules in the GitOps repo, then deploying from the relevant branch to the relevant environment.
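Roughly, the flow looked like the sketch below, written in GitLab-CI-style YAML purely for concreteness - the repo URL, service names, and commands are illustrative rather than the client’s actual pipeline:

```yaml
# Hypothetical sketch of the submodule-and-branch GitOps flow described above
stages:
  - promote
  - deploy

promote-to-test:
  stage: promote
  script:
    # bump the microservice submodules on the branch for the target environment
    # (the service names are examples)
    - git clone --branch test git@git.example.com:platform/gitops.git
    - cd gitops
    - git submodule update --init --remote payments-service accounts-service
    - git commit -am "Promote latest service versions to test"
    - git push origin test

deploy-test:
  stage: deploy
  script:
    # deploy whatever the test branch says should be running in the test environment
    - git clone --branch test git@git.example.com:platform/gitops.git
    - cd gitops
    - git submodule update --init      # check out the pinned commit of each service
    - kubectl apply --recursive -f .   # apply the manifests checked out in the submodules
```

Promotion to another environment was then essentially the same pair of jobs pointed at a different branch.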

This was my first time using a GitOps repo, and I was very impressed by how powerful such a repo is for managing the running of many applications, while also giving the flexibility to change the promotion process as environments are gradually commissioned and decommissioned. Moreover, at any point in time it was simple to see what was running where. Compared to the more recent GitOps implementations I’ve seen, though, the use of Git submodules and branches is a bit more fiddly and less intuitive.

However, not using a dedicated CD tool and/or the continuous-sync model is still a valid choice today - ‘configuration drift’ wasn’t a big problem for that particular client, even in the absence of such a tool. GitOps is a pattern, not a toolkit, despite the term being coined as a buzzword when Weaveworks started to evangelize their Flux project. Specific tools like Flux, Argo CD, and kapp-controller can help in some scenarios by syncing more often and giving better visibility into the results of a deployment.

Regardless, in any scenario where you want greater visibility of your deployment and need flexibility to change your pipeline, consider this pattern - whereby deployments are driven by updating declarative, desired state information stored in Git.
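To make the pattern concrete, here’s roughly what the continuously-synced variant looks like with Argo CD, one of the tools mentioned above (the repo URL, paths, and names are hypothetical):

```yaml
# Hypothetical Argo CD Application: the desired state for one app in one environment
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: payments-service-test
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.com/platform/gitops.git  # the repo holding desired state
    targetRevision: main
    path: environments/test/payments-service              # one directory per app per environment
  destination:
    server: https://kubernetes.default.svc
    namespace: payments-test
  syncPolicy:
    automated:
      prune: true      # remove resources that have been deleted from Git
      selfHeal: true   # revert manual changes (configuration drift) in the cluster
```

Promotion then becomes a Git change - updating the manifests under the environment’s path - and the controller keeps the cluster in line with whatever the repo says: the same “desired state per environment” idea, just with more frequent syncing and better feedback than our pipeline-driven version had.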

This lesson, along with the previous, really points to the fact that we, as DevOps practitioners, should be aware of the broad principles of computing. In prioritising the layout of our data in a repo, we are focussing on data structure design, and drawing on the advice of venerated programmers:

“Data dominates. If you’ve chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming.” - Rob Pike

“Bad programmers worry about the code. Good programmers worry about data structures and their relationships.” - Linus Torvalds

“Show me your flowchart and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won’t usually need your flowchart; it’ll be obvious.” - Fred Brooks, The Mythical Man-Month (1975)

Final Thoughts

Pipelines are often approached in a practical, no-fuss sort of way. They can be seen as little more than the glue logic that fills the gap between our code over here and the program we want to be running over there. However, while they don’t have to be complicated, there are design choices - patterns and anti-patterns - that make a big difference to their effectiveness. In this post, I’ve shared some reflections on the high-level design choices I’ve seen being made, but that barely scratches the surface of what there is to know about Continuous Delivery pipelines.

Interested readers might next want to check out my blog post on measuring pipeline performance, which describes how to gauge the success of your pipeline design choices.