Wherever you find yourself on the data journey, it is never a bad time to start thinking about DataOps! In this post I want to uncover DataOps: what it is, what it means for your business and how to implement it. If you are new to DataOps, or simply want to digest our take on a familiar concept, read on!
When thinking about DataOps I prefer to describe it with the following statement:
Any approach to Data, in any business, should always be product-led, and DataOps encapsulates a methodology to deliver on its trio of core principles:
1. Quality
2. Pace
3. Product
To deliver high-quality code and data products at pace, we look to implement Modern Software Engineering processes and an Agile Delivery Methodology across Value-Focused and Cross-Functional Teams within an organisation. These three pillars work together to deliver on the core principles of DataOps, allowing the business to constantly iterate and continue delivering valuable data products, quickly and efficiently.
At this point you would be forgiven for asking yourself the question:
You would not be the first, as it is one of the most common questions asked when discussing DataOps. However, I believe that it couldn’t be further from the truth. Whilst the ideas of DevOps primarily focus on software development, ensuring that software engineers are able to deliver high-quality code quickly and efficiently, DataOps needs to be much more multi-faceted for the following reasons:
• The nature of data within a business is often fluid and unpredictable: it is always changing and evolving to keep pace with business requirements.
• This fluidity of data often leads to highly complex systems, with a variety of data sources spread across multiple tools and services, often both on-premises and in the cloud.
• As a result, a wide range of skill sets is required to deliver data across the business, from the software and data engineers who build and administer platforms through to the data analysts and scientists who deliver insight and analysis.
• Data must be use-case led, driving business value through the delivery of specific data products.
So you see, DataOps must be so much more than just DevOps for Data, needing to support a fluid and unpredictable data landscape for a variety of different business users.
How should you implement the three pillars to support successful DataOps? Let’s dive into some ideas on how to do just that.
Modern Software Engineering is the pillar that is most closely aligned with DevOps, and here we seek to extend best practices such as good code management, version control, automated testing and continuous integration and deployment (CI/CD) pipelines to the data platform.
Whilst these practices are perhaps best suited to software and data engineers, data analysts and scientists can also benefit by streamlining their own workflows through the use of regular reviews and well-organised documentation. It is also perhaps the simplest pillar to implement, with well-established, industry-standard tools such as GitHub, GitLab, Bitbucket and Azure DevOps available at reasonable cost.
It is important to emphasise that implementing successful DataOps is not always about implementing all the technology that established DevOps tools have on offer. Not everyone needs all aspects of a fully automated CI/CD pipeline, for instance, and it is important to establish the right-sized solution for your organisation. Look to understand any bottlenecks in current processes and identify where the greatest benefits can be realised before focusing your attention.
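To make the idea of automated testing on a data platform a little more concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions rather than a recommended implementation: the "orders" table, its columns (`order_id`, `amount`, `currency`) and the three rules are all hypothetical, and in practice you might well reach for a dedicated data-testing tool instead. The point is simply that data-quality rules can live in version control and run automatically, just like unit tests for code.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Check:
    """A named data-quality rule that is applied to one row at a time."""
    name: str
    passes: Callable[[dict], bool]


# Hypothetical rules for an illustrative "orders" table (assumed columns).
CHECKS = [
    Check("order_id is present",
          lambda row: row.get("order_id") is not None),
    Check("amount is non-negative",
          lambda row: isinstance(row.get("amount"), (int, float))
          and row["amount"] >= 0),
    Check("currency is a 3-letter code",
          lambda row: isinstance(row.get("currency"), str)
          and len(row["currency"]) == 3),
]


def run_checks(rows: Iterable[dict], checks: list = CHECKS) -> list:
    """Return one readable message for every (row, check) pair that fails."""
    failures = []
    for index, row in enumerate(rows):
        for check in checks:
            if not check.passes(row):
                failures.append(f"row {index}: {check.name}")
    return failures


# Made-up sample data: the first row is clean, the second breaks every rule.
sample_rows = [
    {"order_id": 1, "amount": 10.0, "currency": "GBP"},
    {"order_id": None, "amount": -5, "currency": "GB"},
]

for message in run_checks(sample_rows):
    print(message)
```

A pipeline step could call `run_checks` on a sample of each new batch and fail the build whenever the returned list is not empty. That is one right-sized way to start: a handful of rules that guard the most important tables, without committing to a fully automated CI/CD setup on day one.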
We have all heard about working Agile, running teams in regular sprints with daily stand-up meetings and monthly retros. It is an approach that works really well for operating teams focused on delivering and iterating data products at pace and as such forms a central part of DataOps. By working in an agile way, the team remains focused on innovation, quickly delivering a proof of concept before building a minimum viable product (MVP) to demonstrate value.
If successful, this product can be scaled and industrialised by adding it to the data factory, all handled in managed two-week sprints.
Lean manufacturing, a process borrowed from the manufacturing industry, is often described as playing a key role in delivering DataOps successfully[1].
Statistical Process Control (SPC) is the practice of using statistical methods to evaluate testing, validation and error capture, and was used to great success by Japanese automotive manufacturers after World War II. It translates well to the DataOps process, where the regular measurement and evaluation of processes and team performance is crucial to ensuring success.
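As a rough illustration of how SPC might translate to a data pipeline, the sketch below derives control limits from a baseline of daily row counts and flags new loads that fall outside them. Everything here is an assumption for the sake of example: the row counts are invented, the function names are hypothetical, and the limits use a simple mean plus or minus three sample standard deviations rather than the moving-range estimate that a textbook individuals chart would typically use.

```python
from statistics import mean, stdev


def control_limits(baseline, sigmas=3.0):
    """Return (lower, upper) limits from a baseline of in-control measurements.

    Simplification: uses the sample standard deviation of the baseline.
    """
    centre = mean(baseline)
    spread = stdev(baseline)
    return centre - sigmas * spread, centre + sigmas * spread


def out_of_control(values, lower, upper):
    """Return (index, value) pairs that fall outside the control limits."""
    return [(i, v) for i, v in enumerate(values) if v < lower or v > upper]


# Invented daily row counts from a period when the pipeline behaved normally.
baseline_row_counts = [1000, 1020, 980, 1010, 990, 1005, 995]
lower, upper = control_limits(baseline_row_counts)

# Invented counts from three new loads; the middle one is suspiciously low.
new_row_counts = [1002, 640, 1015]
for day, count in out_of_control(new_row_counts, lower, upper):
    print(f"day {day}: row count {count} is outside [{lower:.0f}, {upper:.0f}]")
```

With these made-up numbers only the second new load is flagged. The same pattern can be pointed at other measurements, such as load duration, null rates or the number of failed validation checks per run, so that the team is alerted when a process drifts rather than discovering the problem downstream.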
A successful data platform thrives on trust, and confidence in its processes and team members drives enthusiasm for data and the value that it can deliver.
It is often easy for teams, and the data they are responsible for, to become siloed, only focusing on serving a particular capability. This can limit productivity and compromise the speed of delivery, with teams quickly becoming blocked by others. DataOps champions collaboration between teams, sharing knowledge and skills to ensure that the value always remains the key focus of any data product.
Teams that were once capability focused become outcome focused, working together with the collective goal to deliver value through data.
Now that we understand a little more about what DataOps is and how it can be implemented, you may still be thinking:
DataOps is all about embracing a new way of working to deliver quality data products quickly and efficiently, but what are the tangible “value-adds” that your business stands to achieve by successfully implementing DataOps?
By now, hopefully you understand the added value that successful DataOps can bring to your organisation. However, depending on where you are in your data journey, this might require significant investment to ensure success. Why invest in DataOps now and is there any value in waiting?
• We’ve referred to DevOps a lot throughout this post, and with good reason: DataOps seeks to implement many of the principles that make DevOps so valuable to a business. Fortunately, DevOps is now well established across the industry, with a variety of fantastic tools including GitHub, GitLab, Bitbucket and Azure DevOps to support it. It’s never been so simple to implement DevOps and as a result this is the perfect time for DataOps to build on those foundations.
• As data becomes more embedded into the modern business, its demands for insight and analysis begin to change. What was once an appetite for daily dashboards and reports has now evolved into a demand for more sophisticated data outcomes, involving complex data analysis and data science. Data teams need to adapt, and DataOps provides the framework to support these changes at pace.
• Is your data platform fragmented, with different business units operating with siloed data sources? This is a common problem that is introduced as a business grows and can quickly get out of hand, especially when migrating to cloud-based software as a service (SaaS). By implementing DataOps before this occurs, we keep a firm handle on data quality and make sure that data use remains use-case driven, avoiding the chance for data to become siloed.
So that’s our take on DataOps: what it seeks to achieve and why it can be valuable to you as an organisation.
If you’d like to learn more about how this can help you and your business, watch our On-Demand Webinar: Increase Agility with DataOps, or drop hey@cynozure.com a line.
[1] A discussion on DataOps by DataKitchen – What is DataOps? | DataKitchen
[2] The National Institute of Standards and Technology (NIST) published a report in May 2002 (Microsoft Word – 7007-011 FR Complete.doc (nist.gov)) reporting the costs associated with fixing defects that occur at different stages of the development cycle. The approximate value quoted has been calculated assuming a cost of $75 per hour to address various bugs, multiplied by the hours they take to fix.