Motivation
Birth of DFD
DFD is not a hypothesis; it is a solution that emerged while dealing with major issues and challenges of doing DevOps in a mid-scale project. In each scenario, a particular solution made the best sense, and we eventually found that everything worked smoothly because, instead of doing DevOps in the traditional way, it was done based on data, and dealing with data proved to be a more flexible and workable model.
It is also worth looking at the Limitations section.
Challenges
Here are a few of the challenges we faced:
- A lot of Git repos had been created, and each repo needed to be in a deployable state from the moment of its creation, as soon as code was added.
- We had multiple projects to build in one stack and hundreds of projects to build in another stack.
- Different projects, for various reasons, had different quality gate requirements.
- There were cases where we were asked to move/migrate/propagate a bunch of project deployments from one environment to another. Even though deploying a single project to an environment was an automated task, running that task, say, 200 times becomes a huge manual effort.
- Every project was being deployed multiple times a day in the Dev environment by many developers, and tracking those deployments was difficult.
- As developers had access to the Git repos, they could create branches as they wanted, and we needed the flexibility to build and deploy from any branch in a few environments.
- QA and UAT environments, however, had a strict rule that nothing could be deployed from any branch other than the development branch. Identifying a deployable artifact and tracking it back to its Git branch was another challenge.
- Related to the point above about multiple Dev deployments, multiple deployments were also happening in the QA environment. Even in UAT, a project gets deployed multiple times, i.e. multiple versions, i.e. multiple builds, i.e. multiple points on the development branch, while development is still active.
- Following a release process where the code version changed on every release was difficult and slow, so it was decided that whatever is accepted at UAT will be taken for release, and a release will happen only for the Production environment. Doing a release for every environment would have caused a lot of delay and overhead and slowed down the development team.
- With that release process in place, doing a release for Production was a challenge. Creating a branch/tag on every release could unleash branch/tag hell for the CM team.
- The number of environments was bound to increase, i.e. PERF and Prod would eventually come up. Creating a separate pipeline per environment did not look like a smart approach.
- The number of projects was obviously bound to increase over time. That does not look like a problem for parameterized pipelines, but imagine that the parameter is a dropdown of project names rather than a Git URL.
Factors
Looking at these requirements, it was clear to us that we could not write dumb, hardcoded CI/CD pipelines. We were sure we needed to make the pipelines more flexible. We did have a solution to auto-create pipelines whenever a GitHub repo was created, but we realized that it would leave us managing hundreds of pipelines within a few months, which did not look like a good idea. So we focused on the problem and tried to understand what we were going to deal with:
- Too many inter-dependent activities, i.e. if X happens, we need to do Y as well.
- Too many conditional executions, i.e. if X is like this, then do A, else do Y.
- Too many dynamic parts, e.g. new environments will keep coming, new quality gates might be introduced, etc.
- Too many hidden desires of various team members, e.g. the dev lead wants to see who did the last deployment and from which branch, the QA team wants to track tested/approved/rejected builds, etc.
Lack of Context - An Unnoticed Problem
In traditional DevOps implementations, we get almost everything automated without maintaining the context or storing the metadata of the automation. This is the biggest reason our DevOps is fully functional but non-contextual. If you need to find out anything, you have to go through multiple tools.
Just as an example, if you need to trace a given microservice deployment (I am using a microservice as the example because it is the most familiar term these days) back to its source code, you have to start from Kubernetes to see which tag of the Docker image is deployed, then go to the Jenkins build logs, assuming they are still available, to find out which build created that tag, and from there learn which branch and which commit it was built from.
It is not that we never had this context while building or deploying the microservice. It is just that we do not maintain it, and we do not realize what we miss because of this ignorance.
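
To make the missing context concrete, here is a minimal sketch of the kind of metadata that could be captured at build and deploy time. It is only an illustration, not the DFD implementation; the field names, project name, and repo URL are assumptions made up for the example.

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class BuildRecord:
    """Context captured at build time (illustrative field names)."""
    project: str
    git_repo: str
    git_branch: str
    git_commit: str
    build_number: int
    image_tag: str

@dataclass
class DeploymentRecord:
    """Context captured at deploy time, pointing back to the build via image_tag."""
    project: str
    environment: str
    image_tag: str
    deployed_by: str
    deployed_at: str  # ISO-8601 timestamp

# Hypothetical records for one build and one deployment of it.
build = BuildRecord("payments-svc", "https://github.com/acme/payments-svc",
                    "development", "3f9c2ab", 118, "payments-svc:118")
deploy = DeploymentRecord("payments-svc", "QA", "payments-svc:118",
                          "alice", "2024-05-10T09:32:00Z")
print(json.dumps({"build": asdict(build), "deployment": asdict(deploy)}, indent=2))
```

If records like these are stored alongside the automation, "which branch and commit is running in QA?" becomes a simple lookup on the image tag instead of a tool-by-tool hunt through Kubernetes, Jenkins logs, and Git.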
Design Approach
So, when we started looking at all these factors collectively, we came up with a solution which, in short, we could call Data Driven, Modular, and Well-orchestrated CI/CD pipelines. That translated into:
- Creating data for everything and using that data for many subsequent purposes.
- Creating modular pipelines, i.e. one pipeline does only one job, as far as possible. In some cases this is an anti-pattern, so it was a careful selection.
- Writing orchestration jobs over modular pipelines (see the sketch after this list).
- In a few cases, we wrote pipelines of modular jobs for reusability purposes.
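
As referenced above, here is a minimal sketch of what an orchestration job over modular, data-driven pipelines could look like. It is an illustration only, not the actual DFD code; trigger_pipeline(), the pipeline names, and the project records are hypothetical.

```python
# Illustrative orchestration sketch: the orchestrator only reads data records
# and delegates to single-purpose (modular) pipelines. All names are assumptions.

def trigger_pipeline(name: str, params: dict) -> None:
    """Placeholder for whatever triggers a CI/CD job (Jenkins, GitHub Actions, ...)."""
    print(f"triggering {name} with {params}")

# One data record per project, created when the repo/project is registered.
projects = [
    {"name": "payments-svc", "branch": "development", "quality_gate": "sonar-strict"},
    {"name": "orders-svc",   "branch": "development", "quality_gate": "sonar-basic"},
]

def promote(environment: str) -> None:
    """Orchestration job: promote every project, one modular pipeline per step."""
    for project in projects:
        trigger_pipeline("build",        {"project": project["name"], "branch": project["branch"]})
        trigger_pipeline("quality-gate", {"project": project["name"], "gate": project["quality_gate"]})
        trigger_pipeline("deploy",       {"project": project["name"], "environment": environment})

# Bulk promotion of all projects to QA becomes one call instead of N manual runs.
promote("QA")
```

The point of the sketch is that the orchestrator holds no project-specific logic; adding a project or an environment means adding data, not adding pipelines.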
Eventually, we also realized that, quietly, Data Driven DevOps had become the need of the day. Note that, by the market definition, Data Driven DevOps is not the same as Data First DevOps; we will talk about the relationship later.