If data is king, then why...?
We have often heard the phrases "Data is king!!" or "Data is the new gold!!". Most businesses today run on data. It is no rocket science that as long as we have data and know how to use it, we can do wonders. But the moment it comes to DevOps, we all put on blinkers.
All we think about is how good the automation can be, so that with minimal intervention the code reaches the production environment: I will commit the code, the pipeline will get triggered, it will run the build, execute the unit tests, blah blah blah, and the code will get deployed. Suddenly our vision gets blinkers and we can no longer see data.
Designing an application
If we are developing an application, we take care of a few things no matter what:
- The application data should be stored in some kind of data store.
- Application logs should be collected for debugging and trend analysis.
- In some cases, we also create audit logs, which are again data stored in the same or a different data store.
- Transactional data is stored in maximum detail for analytics and ML purposes.
- In some cases, we even collect clickstream data to understand user behavior.
Why do we do all this? Because we know data is the new gold!! And we do whatever we can to collect and process it.
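To make that habit concrete, here is a minimal sketch (in Python, with hypothetical event names and fields) of emitting structured audit events alongside regular application logs, so that every meaningful action becomes queryable data:

```python
import json
import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("app")

def audit(event: str, **details):
    # Emit a structured audit record; in practice it would land in a data store.
    record = {"event": event, "at": datetime.now(timezone.utc).isoformat(), **details}
    log.info("AUDIT %s", json.dumps(record))

# Hypothetical usage: each business action is captured as data.
audit("order.placed", user_id=42, order_id="A-1001", amount=19.99)
audit("order.shipped", order_id="A-1001", carrier="ups")
```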
What next
Now we have all types of terminology in the data space. In terms of storage:
- Datastore
- Data warehouse
- Data stream
- Data lake, and what not.
In terms of processing:
- ETL
- MapReduce
- Hadoop
- Spark
- Sling
- Flink
- Lambda architecture and what not.
Big Data is a big thing in itself.
Blinkers ON
Now let's talk about DevOps. Think about data in DevOps. In most cases, you won't be able to visualize the data. You would be thinking:
- What data?
- Where is data in DevOps processes?
- Do you think we should start doing log aggregation of DevOps tools?
- Oh, c'mon!! Don't overcomplicate the process. Data is not the purpose/intent of DevOps.
Even automation is not the intent of DevOps. The intent is speed, quality, and minimal overhead; automation is just a tool we use to get there. My proposal: then why can't we use data as a tool too? Yet suddenly, data is no longer important. Is it because the people who implement DevOps are not from an application background and hence can't think of it? Or is it because you can't visualize the data in DevOps, so you never feel the need and never bother?
What advancement comes to your mind? How can you make things more automated, and more automated, and more automated? In general, the thought starts with automation and ends with automation, maybe with some variation on "how to" and "which tool".
Blinkers OFF
Data
Start seeing things in DevOps as data, states and events. Practice it.
- A codebase as an entity.
- A commit as an event.
- A build as an event and a process.
- A Docker image as an entity.
- A deployment as an event.
- A code promotion as an event and a state.
Once you change the perspective, you will start seeing data. For example: the code has been committed by X, and a build got triggered automatically or on demand. Who did it? How much time did it take? Which branch is being built? What's the commit ID? What Docker tag got created? What's the relationship between the Docker tag, the build tag, and the branch? Where is it deployed? What's the deployment history of that Docker tag? Which ones reached production? How many of them could not make it to production? What's your code's efficiency at reaching production? Does that indicate you rework a lot? Can that be improved? Is the code deployed in production not from the main branch?
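To make this concrete, here is a minimal sketch of such an event log (hypothetical record shapes, not tied to any specific CI/CD tool) that answers two of the questions above from plain build and deployment data:

```python
from dataclasses import dataclass

# Hypothetical records; a real system would read these from a data store.
@dataclass
class Build:
    commit_id: str
    branch: str
    image_tag: str
    duration_s: int

@dataclass
class Deployment:
    image_tag: str
    environment: str

builds = [
    Build("a1b2c3", "main", "app:1.4.0", 312),
    Build("d4e5f6", "feature/login", "app:1.4.1-rc1", 287),
]
deployments = [
    Deployment("app:1.4.0", "staging"),
    Deployment("app:1.4.0", "production"),
    Deployment("app:1.4.1-rc1", "staging"),
]

# Which Docker tags reached production, and from which branch were they built?
prod_tags = {d.image_tag for d in deployments if d.environment == "production"}
for b in builds:
    if b.image_tag in prod_tags:
        print(f"{b.image_tag} reached production (branch={b.branch}, commit={b.commit_id})")

# How many images could not make it to production?
missed = [b.image_tag for b in builds if b.image_tag not in prod_tags]
print(f"{len(missed)} image(s) never reached production: {missed}")
```

The same event log can answer deployment-history, rework, or branch-hygiene questions; the point is that the data, once recorded, supports queries that automation alone never exposes.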
A whole new perspective gives you a whole new world to look at and a whole new way to rethink DevOps.
Processing
Let's understand how DevOps works today. In most cases, there are two scenarios at a high level:
- Someone triggers a flow to build, deploy, test, etc.
- Something happens and some flow gets auto-triggered, mostly implemented using webhooks.
But if we have data available for the various entities, events, and states, then there must be something to write and read that data. So should we have a DAO-layer kind of thing, some kind of reusable module, or maybe a small set of microservices? Do you see the possibility of an event-driven architecture in DevOps? Can you think of reusable DevOps components that can be invoked by various events, by an orchestration layer, or by other components? Doesn't that look a lot like microservices?
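As one possible shape (purely a sketch, with hypothetical event names and handlers), here is a tiny event-driven flow in which webhook-style events are dispatched to reusable components that write to a shared DevOps data layer:

```python
from collections import defaultdict

# A tiny in-process event bus; a real system might use a queue or webhooks.
handlers = defaultdict(list)

def on(event_type):
    # Register a reusable DevOps component for an event type.
    def register(fn):
        handlers[event_type].append(fn)
        return fn
    return register

def publish(event_type, payload):
    for fn in handlers[event_type]:
        fn(payload)

# Stand-in for the shared data layer (the "DAO" for DevOps data).
event_log = []

@on("commit.pushed")
def record_commit(payload):
    event_log.append(("commit", payload))
    publish("build.requested", {"commit_id": payload["commit_id"]})

@on("build.requested")
def run_build(payload):
    # A real component would call the CI system; here we just record the event.
    event_log.append(("build", payload))

# Hypothetical incoming webhook:
publish("commit.pushed", {"commit_id": "a1b2c3", "branch": "main", "author": "X"})
print(event_log)
```

Each handler is a small, reusable component; the orchestration emerges from the events themselves rather than from one monolithic pipeline.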
Immense possibilities open up once we change the system perspective from "Automation" to "Data First".
Conclusion
DFD (Data-First DevOps) opens up these possibilities for you, but the first step is to look at the data of your DevOps. The way we go about DevOps needs to be rethought and reimplemented. For DFD, we are planning to build a platform that converts these thoughts into an implementation without being too intrusive into your system. If you feel interested, please reach out to us on our LinkedIn group.