Thinking
When you should not do DFD
Let's be clear: DFD is not mandatory. It can be skipped entirely. Whether you adopt it is your decision, but it makes more and more sense as a project grows in size. That does not mean it is meaningless for small projects; if you have enough energy and time, do it whenever you can.
Sample scenario
For now, let's assume that your project is big enough: say 300 microservices/deployables, five environments (Dev, QA, Performance, UAT, and Production), and 10+ scrum teams. This is what we will consider not small, and a very good candidate for DFD.
I repeat, DFD can be implemented on a DevOps setup of any size. But doing it on a small one might not give that good an ROI. So you can skip it in the beginning and implement it once you have enough time and resources.
Start thinking
Let me remind you: in DFD you see everything as data. Keep this in mind, because from now on you need to look at everything as data.
First, find out the items we need to work on:
Git repo creation is one item. In case you think you can create one Git repo for more than one microservice, remember that microservices are independently releasable and deployable. Keeping them together in one repo goes against those fundamental concepts.
Creating a pipeline which can build the microservice and deploy it. So consider build and deployment as two more things to look at as data.
What about environments? Each environment would have a different cluster. Your setup might differ, but for the purpose of this exercise let's say every environment has its own K8S cluster.
Here we are considering Continuous Delivery rather than Continuous Deployment. Not that DFD is inapplicable to Continuous Deployment; it is just that for this study we will stick to Continuous Delivery. DFD can very well be extended to Continuous Deployment: the same data will be used and the same data will be generated, though Continuous Deployment brings some interesting additional data usage. Let's continue with the Continuous Delivery approach.
We have already listed a few of the activities we will perform in the DevOps process. Now let's see how to look at them from a Data First perspective: what the data points are for every element.
Git Repo
- Microservice name
- Git URL
- Repo type (you won't have only microservices in your project)
- Who created
- When it got created
Build
- Who built
- When
- Which branch
- Which commit id
- Which build items are created
- Build status
- Build duration
Deployment
- Who deployed
- When
- What got deployed
- Which version
- In which environment
Environment
- Env name
- K8S cluster credential id
- K8S cluster name
- Created by
- Created On
Can you think of anything else?
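One way to make these data points concrete is to model each element as a record type. Below is a minimal sketch in Python; all class names, field names, and example values are illustrative, not a prescribed schema.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class GitRepo:
    microservice_name: str
    git_url: str
    repo_type: str              # e.g. "microservice", "library", "infra"
    created_by: str
    created_on: datetime

@dataclass
class Build:
    built_by: str
    built_on: datetime
    branch: str
    commit_id: str
    artifacts: list[str]        # build items created, e.g. image tags
    status: str                 # e.g. "SUCCESS", "FAILED"
    duration_seconds: float

@dataclass
class Deployment:
    deployed_by: str
    deployed_on: datetime
    microservice_name: str      # what got deployed
    version: str                # which version
    env_name: str               # in which environment

@dataclass
class Environment:
    env_name: str               # e.g. "QA"
    cluster_name: str
    cluster_credential_id: str  # a reference to the credential, never the secret itself
    created_by: str
    created_on: datetime
```

Note that the Environment record stores only a credential *id*, not the credential: even in a data-first view, secrets stay in your secret store.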
Approach
First, you need to think about the data you want to, or can, capture about a property or process, as input or as output.
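To illustrate the input/output framing: a deployment step takes data in (which build, which environment, who triggered it) and gives data out (the deployment record). A hypothetical sketch using plain dictionaries; the field names and the actual deploy mechanics are assumptions, not a real pipeline:

```python
from datetime import datetime, timezone

def deploy(build: dict, env: dict, deployed_by: str) -> dict:
    """Consume build and environment data as input; produce
    deployment data as output. The actual rollout call
    (kubectl/Helm/etc.) is elided here on purpose."""
    return {
        "deployed_by": deployed_by,
        "deployed_on": datetime.now(timezone.utc).isoformat(),
        "what": build["microservice"],
        "version": build["version"],
        "environment": env["name"],
    }

record = deploy(
    build={"microservice": "payment-service", "version": "1.4.2"},
    env={"name": "QA", "cluster": "qa-cluster"},
    deployed_by="alice",
)
```

The point is that every process becomes a producer and consumer of data, so its history can be queried later.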
The Glitch
If you are tempted to duplicate data that is already stored in other systems like Git, SonarQube, or Artifactory, hold yourself back. If you need that data as input, use the REST API provided by that system to pull it, instead of storing a copy on your side. Keep the data closest to its source of truth.
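For example, instead of copying code-coverage numbers into your own store, you can pull them on demand. A sketch against SonarQube's measures endpoint; the host name is hypothetical, and the response parsing below is demonstrated on a sample payload rather than a live call:

```python
import json
import urllib.request

SONAR_URL = "https://sonarqube.example.com"  # hypothetical host

def fetch_measures(component_key: str, metrics: list[str]) -> dict:
    """Pull metrics from SonarQube on demand (its api/measures/component
    endpoint) instead of duplicating them on our side."""
    url = (f"{SONAR_URL}/api/measures/component"
           f"?component={component_key}&metricKeys={','.join(metrics)}")
    with urllib.request.urlopen(url) as resp:  # add an auth token header in real use
        return json.load(resp)

def coverage_from(payload: dict) -> float:
    """Extract the coverage value from the measures response."""
    measures = payload["component"]["measures"]
    return next(float(m["value"]) for m in measures
                if m["metric"] == "coverage")

# Parsing demonstrated on a sample response shape:
sample = {"component": {"key": "payment-service",
                        "measures": [{"metric": "coverage",
                                      "value": "83.4"}]}}
print(coverage_from(sample))  # 83.4
```

Because the data never leaves SonarQube, there is no second copy to drift out of date; your side only keeps the key needed to ask for it.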