The Answers
Why?
The short answer to Why is - to make DevOps more capable and contextual, compared to the usual, standards-based approach.
- Capable: to do more than what we can do in traditional DevOps.
- Contextual: to do things that are relevant to the context of your needs. Netflix, Amazon, Microsoft and Facebook did not start out as what they are today; they matured into their current state. Their context and your context are not the same, and hence their approach and your approach can't be the same either.
The details
If you look closely, all the CI/CD we do is hard coded. This is how it typically goes (a minimal sketch of such a pipeline follows the list):
- Get the code.
- Do the build and create the deployable.
- Check static code quality.
- Run a security scan.
- Store the deployable.
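Expressed as a Jenkinsfile, such a hard-coded pipeline might look roughly like the sketch below; the stage names and shell commands are placeholders, not a prescription.

```groovy
// A typical hard-coded declarative pipeline: every step runs the same way,
// for every branch and every repository. Commands are placeholders.
pipeline {
    agent any
    stages {
        stage('Checkout')        { steps { checkout scm } }
        stage('Build')           { steps { sh 'mvn -B clean package' } }
        stage('Static Analysis') { steps { sh 'mvn sonar:sonar' } }
        stage('Security Scan')   { steps { sh './security-scan.sh' } }
        stage('Store Artifact')  { steps { archiveArtifacts artifacts: 'target/*.jar' } }
    }
}
```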
But real-life requirements might not be so straightforward, and not so easy to hard code. There can be various complex scenarios (one of them is sketched after this list):
- Don't deploy to QA or higher environments if the build is not from the development branch.
- Don't deploy to the Prod environment if the build is not from a release branch.
- Use relaxed quality gates for a few code repositories.
- Use strict quality gates for the other repositories.
- Don't trigger the deployment if the Spark job is not a streaming job.
- Keep track of the deployment tag which will go to production.
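The first two scenarios can still be squeezed into a declarative pipeline as branch conditions, as in the hypothetical sketch below, but the remaining ones depend on facts about the repository or the job that the pipeline text simply does not contain.

```groovy
// Scenario #2 as a branch condition (the deploy script is a placeholder).
// Scenarios #3 to #6 have no such hook: which quality profile a repository
// uses, or whether a Spark job is streaming, is not visible to the build itself.
pipeline {
    agent any
    stages {
        stage('Deploy to Prod') {
            when { branch 'release/*' }    // only builds from a release branch reach Prod
            steps { sh './deploy.sh prod' }
        }
    }
}
```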
These problems can't be handled by classical hard-coded pipelines. Even when we use a DSL (Domain Specific Language) to code our pipelines, the way we usually write them makes them little more than configuration, i.e. they use none of the power of coding. This is exactly why we must switch to DFD (Data First DevOps).
What?
The short answer to What is - it's a way to utilize the data and context around every process or entity in DevOps.
The details
Data First DevOps is an architectural pattern which encourages us not to see DevOps as just a set of tools for automation, but to look at it as a whole system: built with those tools, backed by some datastore to keep the data, and using event-driven processing or similar patterns if the need calls for it.
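As a minimal illustration of what that datastore could hold, assuming purely hypothetical field names, a context record per deployable might look like this:

```groovy
// Hypothetical context record kept in the datastore for one deployable.
def context = [
    repository    : 'payments-service',   // which code repository produced it
    qualityProfile: 'strict',             // drives which quality gate to apply
    sparkJobType  : 'streaming',          // batch vs. streaming, used by scenario #5
    releaseTag    : '1.4.2'               // the deployment tag that will go to production
]
```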
You can find the details of the What on the definition page.
How?
The answer to How can't be that short.
Utilization of context
We need to see each automation as a contextual operation, and that context is to be seen in terms of data: we must be able to capture the context of an automation and use it. Once we have the data/context in place, we can handle all the real-life complex project scenarios we just discussed. Example: if we store somewhere whether a given Spark job is streaming or batch (because that is the context about the Spark job), it becomes possible to implement scenario #5 from the list above.
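A minimal scripted-pipeline sketch of that idea, assuming a hypothetical metadata service and the Jenkins HTTP Request and Pipeline Utility Steps plugins; the URL, the jobType field and the deploy script are illustrative, not part of DFD itself:

```groovy
// Sketch only: the metadata endpoint, jobType field and deploy-spark-job.sh
// are hypothetical. The point is that the decision comes from stored context.
node {
    // Fetch the stored context for this Spark job from the datastore.
    def resp = httpRequest url: "https://devops-metadata.example.com/spark-jobs/${params.JOB_ID}"
    def context = readJSON text: resp.content

    if (context.jobType == 'streaming') {
        stage('Deploy') {
            sh "./deploy-spark-job.sh ${params.JOB_ID}"
        }
    } else {
        echo "Skipping deployment: ${params.JOB_ID} is a batch job (scenario #5)"
    }
}
```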
Utilization of the power of coding
As in any application development, there are two parts: data and business logic. In exactly the same way, DFD emphasizes data (context) and business logic (coding). All the integration engines have moved from configuration to coding, and generally each of them has some kind of DSL in place; Jenkins, for example, provides a very powerful Groovy-based DSL for coding pipelines. Once we have both the power of coding and the stored context about the automation, combining the two gives us enormous control.
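As a rough sketch of what that combination can look like, here is a quality gate whose strictness is driven by stored context rather than hard coded; the context file, its qualityProfile field, the thresholds and coverage.sh are all assumptions for illustration (readJSON again comes from the Pipeline Utility Steps plugin):

```groovy
// Sketch: the quality gate threshold is data, not pipeline text.
// repo-context.json, qualityProfile and coverage.sh are hypothetical names.
node {
    def context = readJSON file: 'repo-context.json'   // context fetched from the datastore earlier
    def minCoverage = (context.qualityProfile == 'relaxed') ? 50 : 80

    stage('Quality Gate') {
        def coverage = sh(script: './coverage.sh', returnStdout: true).trim() as Integer
        if (coverage < minCoverage) {
            error "Coverage ${coverage}% is below the ${minCoverage}% gate for this repository"
        }
    }
}
```

Two repositories can then run exactly the same pipeline code; only their stored context differs.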
In the later chapters, we will discuss this in detail.