Thinking
When you should not do DFD
Let's be clear: DFD is not mandatory. It can be skipped entirely. Whether you adopt it is your decision, but it makes more and more sense as a project grows in size. That does not mean it is meaningless for small projects; if you have enough energy and time, do it whenever you can.
Sample scenario
For now, let's assume that your project is big enough: say 300 microservices/deployables, five environments (Dev, QA, Performance, UAT, and Production), and 10+ scrum teams. This is what we will consider not small, and a very good candidate for DFD.
I repeat, DFD can be implemented on a DevOps setup of any size. But doing it on a small one might not give that good an ROI. So you can skip it in the beginning and implement it once you have enough time and resources.
Start thinking
Let me remind you: in DFD you see everything as data. Keep this in mind, because from now on you need to look at everything as data.
First, find out the items we need to work on:
Git repo creation is one item. In case you think you can create one Git repo for more than one microservice, remember that microservices are independently releasable and deployable. Keeping them together in one repo goes against those fundamental concepts.
Creating a pipeline which can build the microservice and deploy it. So consider build and deployment as two more things to look at as data.
What about environments? Each environment would have a different cluster. Your setup might differ, but for the purpose of this exercise let's say every environment has its own K8S cluster.
Here we are considering Continuous Delivery rather than Continuous Deployment. Not that DFD is inapplicable to Continuous Deployment; it is just that for this study we will stick to Continuous Delivery. DFD can very well be extended to Continuous Deployment: the same data will be used and the same data will be generated, though Continuous Deployment brings some interesting additional data usage. Let's continue with the Continuous Delivery approach.
We have already listed a few of the activities we will perform in the DevOps process. Now let's see how to look at them from a Data First perspective: what the data points are for every element.
Git Repo
- Microservice name
- Git URL
- Repo type (you won't have only microservices in your project)
- Who created
- When it got created
Build
- Who built
- When
- Which branch
- Which commit id
- Which build items are created
- Build status
- Build duration
Deployment
- Who deployed
- When
- What got deployed
- Which version
- In which environment
Environment
- Env name
- K8S cluster credential id
- K8S cluster name
- Created by
- Created On
Can you think of anything else?
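One way to make these data points concrete is to model each element as a record type. Below is a minimal sketch in Python; all class names, field names, and example values are illustrative, not a prescribed schema.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class GitRepo:
    microservice_name: str
    git_url: str
    repo_type: str              # e.g. "microservice", "library", "infra"
    created_by: str
    created_on: datetime

@dataclass
class Build:
    built_by: str
    built_on: datetime
    branch: str
    commit_id: str
    artifacts: list[str]        # build items created, e.g. image tags
    status: str                 # e.g. "SUCCESS", "FAILED"
    duration_seconds: float

@dataclass
class Deployment:
    deployed_by: str
    deployed_on: datetime
    microservice_name: str      # what got deployed
    version: str                # which version
    env_name: str               # in which environment

@dataclass
class Environment:
    env_name: str               # e.g. "QA"
    cluster_name: str
    cluster_credential_id: str  # a reference to the credential, never the secret itself
    created_by: str
    created_on: datetime
```

Note that the Environment record stores only a credential *id*, not the credential: even in a data-first view, secrets stay in your secret store.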
Approach
First, you need to think about the data you want to, or can, capture about a property or process, as input or as output.
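To illustrate the input/output framing: a deployment step takes data in (which build, which environment, who triggered it) and gives data out (the deployment record). A hypothetical sketch using plain dictionaries; the field names and the actual deploy mechanics are assumptions, not a real pipeline:

```python
from datetime import datetime, timezone

def deploy(build: dict, env: dict, deployed_by: str) -> dict:
    """Consume build and environment data as input; produce
    deployment data as output. The actual rollout call
    (kubectl/Helm/etc.) is elided here on purpose."""
    return {
        "deployed_by": deployed_by,
        "deployed_on": datetime.now(timezone.utc).isoformat(),
        "what": build["microservice"],
        "version": build["version"],
        "environment": env["name"],
    }

record = deploy(
    build={"microservice": "payment-service", "version": "1.4.2"},
    env={"name": "QA", "cluster": "qa-cluster"},
    deployed_by="alice",
)
```

The point is that every process becomes a producer and consumer of data, so its history can be queried later.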
The Glitch
If you are tempted to duplicate data that is already stored in other systems like Git, SonarQube, or Artifactory, hold yourself back. If you need that data as input, use the REST API provided by that system to pull it, instead of storing a copy on your side. Keep the data closest to its source of truth.
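For example, instead of copying code-coverage numbers into your own store, you can pull them on demand. A sketch against SonarQube's measures endpoint; the host name is hypothetical, and the response parsing below is demonstrated on a sample payload rather than a live call:

```python
import json
import urllib.request

SONAR_URL = "https://sonarqube.example.com"  # hypothetical host

def fetch_measures(component_key: str, metrics: list[str]) -> dict:
    """Pull metrics from SonarQube on demand (its api/measures/component
    endpoint) instead of duplicating them on our side."""
    url = (f"{SONAR_URL}/api/measures/component"
           f"?component={component_key}&metricKeys={','.join(metrics)}")
    with urllib.request.urlopen(url) as resp:  # add an auth token header in real use
        return json.load(resp)

def coverage_from(payload: dict) -> float:
    """Extract the coverage value from the measures response."""
    measures = payload["component"]["measures"]
    return next(float(m["value"]) for m in measures
                if m["metric"] == "coverage")

# Parsing demonstrated on a sample response shape:
sample = {"component": {"key": "payment-service",
                        "measures": [{"metric": "coverage",
                                      "value": "83.4"}]}}
print(coverage_from(sample))  # 83.4
```

Because the data never leaves SonarQube, there is no second copy to drift out of date; your side only keeps the key needed to ask for it.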