Every company is a “Software Company” and “software is eating the world”; along the same lines, I recently heard that every company, regardless of size, is a “Data Company”.
True: in one way or another, every organisation produces, consumes, analyses and reports on data, and based on that data it makes decisions, promotes, buys, sells, acquires, expands, downsizes and so on.
The DevOps momentum has driven rapid growth of new tools in the space of CI/CD, ARA (Application Release Automation) and frameworks for enabling application delivery at pace.
When it comes to Continuous Delivery and modern architecture patterns and practices like Microservices, our delivery teams face challenges with data.
I want to discuss some of the challenges I have gone through, along with ideas and some concrete pointers, to help you understand the “Data” problem in the DevOps space.
If we can only go as fast as our weakest link, then Data, Data Management, Data Architecture and their associated practices need our attention and love.
Who owns the Data?
Let us dive deep into the “Data” problem. The typical mentality in any Enterprise is that the “Data Team” or “Data Management Team” owns the data. There is a prevalent idea that they are the only team that protects data from unauthorised access, maintains the standards and conventions, “owns” the data, and serves as the very last line of defence in the organisation.
The question is, how did we get into this situation? The term “Silo” in DevOps hits me hard, because I see the Data Team as the biggest silo in the entire organisation and value chain.
The problem: is it the Data, the Data People or both?
In typical Object-Oriented programming style, I see that the “Enterprise Data Team” inherits a lot from its predecessors: the process, standards, procedures, planning, execution, operations and management. The challenge is recognising and accepting the opportunity to improve, modernise and simplify, i.e. to apply lean principles. I see making things developer-friendly as their most significant barrier. Sometimes it is a mix of emotions for me: anger, sadness and pity.
There is a misconception that the “old school” way of doing things is the “best” way, and that “DevOps” and “Continuous Delivery” are just for applications, not for data.
If improving our daily work and the way we work is more important than the work itself, why could we not help them and take them along on this DevOps journey?
We are responsible for our System Complexity
IMHO — The root of the problem for most organisations is:
- How do we think about data? Application, operational, analytical, intelligence. This includes data from apps, log files, monitoring, performance, core systems, lakes, files and message hubs.
- Struggling to understand the reality of real-time vs batch, and the impact of that on our business decisions.
- Data in business events, and the flow of that event data between different systems: how do we move data within the Enterprise between different producers and consumers?
- How do we “unlock” the data and make it available to the right users and the right use cases?
- The data constraints we live with due to tools, processes and practices.
Let us honestly answer the following questions. My list is long, but let us start with these:
- How many of us know how many ETL jobs exist in our Enterprise? How many of them are still active?
- For storing scripts, do we trust a “Shared Network” drive more than a “Source Control” system?
- Do we believe that we source the data only once from the “Source” system or systems?
- Do we have or provide clarity to our “Dev” community on Systems of Record vs Systems of Engagement?
Our system complexities are a result of our thinking, particularly our thinking about “Data”. We tend to compromise a lot because we cannot get the data in the right way, at the right time, in the proper format.
Sometimes we impose these constraints on ourselves; classic examples arise from Data Governance, Data Security, Data Normalisation and Data Centralisation.
Microservices Era and Big Ball of “Data” mud:
If our systems are distributed, why is our data centralised? Centralising data has benefits, but is that the right way to build distributed systems and Microservices architectures?
We started thinking about and adopting “Microservices” and distributed architecture patterns to simplify and break down the monolith and enable “Enterprise agility”. Our attempts to apply this thinking to the “Database” were not fruitful: either we did not pay attention or we did not make the effort. Whenever challenges arise, the old habits come back and bite us.
Domain-Driven Design principles, Event-Driven Architecture: we can secure support from the “Dev” community but not from “Data”. Why?
- We believe that building massive databases, even with modern “Enterprise-Grade” database tools, is the right thing to do.
- We believe that applying the same 30-year-old naming conventions and standards, even to modern databases, is the right thing to do.
- We believe that blocking “Developers” from accessing data in Dev and Test, and making them do TBD (Ticket Based Development), is the right thing to do.
- We believe that denying support folks “read” access to the “PROD” database, while expecting them to do “Production Support”, is the right thing to do.
Our understanding is changing, and our approach towards these issues is changing; I see a bright future for the developer community in dealing with data.
It is not easy, but it is not impossible.
1) Naming conventions: I would put this under “Developer” productivity. Having meaningful data model, schema, table and column names is important for developers and Ops to do the right work. Instead of referring to multiple documents for names and acronyms, we need better, meaningful naming standards and conventions. It is “freedom” from the constrained naming conventions of history. If your Enterprise has inherited old conventions of limited characters, it is now time to free them up. #DeveloperProductivity #DataArchitecture
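To make the contrast concrete, here is a hypothetical sketch (using SQLite; the table and column names are invented for illustration) of a legacy character-limited convention versus a self-describing one:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Legacy-style DDL: names squeezed into a historical character limit,
# so every reader needs a data dictionary to decode them.
conn.execute("CREATE TABLE CUSTACCT (CSTID INTEGER, ACCTBAL REAL, OPNDT TEXT)")

# Developer-friendly DDL: the same data, with self-describing names
# that need no side documents or acronym lists.
conn.execute("""
    CREATE TABLE customer_account (
        customer_id     INTEGER,
        account_balance REAL,
        opened_date     TEXT
    )
""")

cols = [row[1] for row in conn.execute("PRAGMA table_info(customer_account)")]
print(cols)  # ['customer_id', 'account_balance', 'opened_date']
```

The schema itself becomes the documentation, which is exactly the developer-productivity point above.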
2) Domain, Boundary, Schema: DDD education is a fun exercise with the Data and Data Modelling teams, and a tough ask of our Data friends. Traditionally we have built monolith applications with monolith databases; in some cases, even our dependent applications have (or had) database/schema-level integrations. If we are breaking the monolith and taking the route of DDD and Bounded Contexts, then we need to move to a database/service boundary. Try the following with your Data Architecture and Data Modelling teams:
a) Request for Database/Service Boundary (if this fails, try the next)
b) Request for Schema/Service Boundary
After a round of discussions with my Data friends, we agreed on the Schema/Service boundary. We were able to limit dependencies and cross-domain pollution with this approach: a service account per schema for read and write access, aligning each domain to a schema (at least).
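A minimal sketch of what a schema-per-service boundary looks like, simulated here with SQLite’s ATTACH (the “orders” and “billing” names are invented; a real database would also enforce the boundary with per-service accounts and GRANTs, which SQLite does not have):

```python
import sqlite3

# Each attached database stands in for one service's schema.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS orders_schema")
conn.execute("ATTACH DATABASE ':memory:' AS billing_schema")

# Each service owns only the tables inside its own schema.
conn.execute("CREATE TABLE orders_schema.orders (order_id INTEGER, total REAL)")
conn.execute("CREATE TABLE billing_schema.invoices (invoice_id INTEGER, order_id INTEGER)")

# The orders service writes only to its schema; billing writes to its own.
conn.execute("INSERT INTO orders_schema.orders VALUES (1, 99.50)")
conn.execute("INSERT INTO billing_schema.invoices VALUES (10, 1)")

# Cross-domain access is still possible, but the boundary is now explicit
# and visible, instead of an undifferentiated pile of shared tables.
row = conn.execute("""
    SELECT i.invoice_id, o.total
    FROM billing_schema.invoices i
    JOIN orders_schema.orders o ON o.order_id = i.order_id
""").fetchone()
print(row)  # (10, 99.5)
```

The point is not the SQL itself but that every cross-domain dependency now names the schema it reaches into, so the pollution is at least countable.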
3) Flow, requesting data model changes: this is another area where we improved significantly. Initially, we had to fill in a SharePoint form to request a data model change. The form went to the DA (Data Analyst & Data Modeller), who produced the script; it was then reviewed by the DBA and by the developer who requested it, applied to the DEV database, and only then could the developer start developing the feature. We realised that we could not fit this into our 2-week sprint.
We could only go as fast as our weakest link.
There were some bold discussions & decisions around enabling the Devs to make the changes.
a) Educating Devs on standards
b) Enable Devs to make changes against a local (dev) database
c) Enable DA to review this in parallel
d) Follow a PR (pull request) for the entire lifecycle, so that the transparency, comments and communication flow are visible to everyone. #DeveloperProductivity
4) Adopting modern practices: are Continuous Integration and automated deployment only for application source code? What about databases? This is where we had our breakthrough: we helped our DBAs adopt our source control systems. Instead of e-mailing scripts around, they pull the scripts from source control, and we enabled them to do automated database deployments. We broke the tradition of DBAs logging in and manually rolling out changes, replacing it with automated deployment via pipelines.
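Mature tools such as Flyway and Liquibase implement this pattern; as a minimal sketch of the idea only (the function and table names are my own invention), a pipeline step can apply versioned SQL scripts from source control exactly once, recording what has already been rolled out:

```python
import sqlite3
from pathlib import Path

def apply_migrations(conn: sqlite3.Connection, migrations_dir: str) -> None:
    """Apply versioned SQL scripts exactly once, in filename order."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations (version TEXT PRIMARY KEY)"
    )
    applied = {v for (v,) in conn.execute("SELECT version FROM schema_migrations")}
    for script in sorted(Path(migrations_dir).glob("*.sql")):
        if script.name in applied:
            continue  # already rolled out by an earlier pipeline run
        conn.executescript(script.read_text())
        conn.execute("INSERT INTO schema_migrations VALUES (?)", (script.name,))
        conn.commit()
```

Because the scripts live in source control and the runner is idempotent, the same pipeline can roll a change through DEV, TEST and PROD with no manual DBA logins.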
5) Relational or NoSQL? Like most Enterprises, we were thinking of everything as relational and forcing everything into a relational shape. But in some scenarios we wanted the “Document” database approach of treating an “Entity” as a document or graph instead of just a table made of several columns. While our front end requests a JSON payload via an API, our database serves relational tables. Our radical thinking helped us push the agenda for JSON schemas and payloads, which helped the development team greatly, as they were able to roll out changes quickly.
Measure — How do we measure our progress?
How could we fit the database changes for the feature within the sprint?
What kind of turnaround time do we achieve for database changes, with quality in mind?
How consistent is our process across application source code and database schema? Think of the build and release pipelines, and of reducing the team’s cognitive load by not doing different things for different components.
Learnings and advice:
Problems are rarely unique when it comes to common themes; you are not the first one to solve them. Many have already worked hard and found answers to your hardest questions. It is a matter of reaching out for help.
Spread and share: you need a platform to share the learnings and experience with the rest of the organisation, motivating people and helping them achieve more. Show & Tell: move away from PowerPoint presentations; show them the actual code, the screen, the work in progress.
Keep challenging: you are not alone; do not give up. This is continuous learning and education. Sometimes you need to take the medicine over many days to feel better or be cured.
If you need to go faster, you need discipline. Help the teams, draw the lines, get them focused, and hear what they have to say.
Be sensible in data architecture and design; architect and design systems in such a way that we can do “Continuous Delivery.”