By Gaurav Sood.
In all the recorded history one common thing amongst any successful enterprise has been the presence of Data. It can well be argued that computer only came into existence some 70 years ago, but data was always present much like air and water. Data must be at the center of every business decision we take as business owners. In 2017 Economist published a report titled “The world’s most valuable resource is no longer oil, but data”. I feel data is much more powerful than oil. First, it’s never going to cease to exist, secondly, with time we are only going to produce more data. But in its unrefined form data is not of much use, so like oil, it must be refined and turned into insights that drive business decisions to make it a profitable entity.
Some of the biggest companies globally have used data towards targeted marketing to propel an idea or spread propaganda. The widespread disruption happening today is a result of all businesses moving towards a digital era. Leading companies in Norway today have digital and data mentioned in their strategies and yearly report to shareholders (for most industries and public sector using data better is a strategic focus area).
Keeping these factors in mind, it becomes essential to work on procuring the right data, transforming it into meaningful information and eventually deciphering the information into a business-related action. The usual way is to set up a Data warehouse (DWH) which will help in data storage, integration and feeding transformed data to decision-makers. However, this is easier said than done. Setting up a DWH comes with its own set of challenges like
- Figuring out the technology to use keeping in mind the competence available and size of the DWH
- Mapping all the sources and targets.
- Gathering all the business logic in one place.
- Data quality.
- Business dependency on the DWH determining the importance to keep it as updated as possible.
- Need for considerable Time, Effort & Cost.
and possibly many more.
An interesting way to put the DWH discussion in 1 phrase is “The DWH is dead, long live the DWH”. All major global digital enterprises(FB, Google, Netflix, Tesla, etc.) have a DWH of some sort but they need to be realistic, adjust governance levels and be more agile in their ways of working. DWH is usually associated with cumbersome and endless projects, long time to market and the endless need to try and create one model for the whole enterprise which is inherently almost impossible (thus not delivering on the promises).
A typical DWH lifecycle looks something like this:
Up until 2010, there were not many automation tools available in the market to accelerate the process of setting up a DWH. Businesses were heavily dependent on programmers to automate whatever part of the flow they could. Setting up a mid-sized DWH with data from 20 – 30 different sources and creating 15-20 reports could easily take anywhere between 4 – 10 months depending on the number and experience of the resources implementing it. This can be a long time for a business to start getting some return on investments.
Forrester defines Datawarehouse Automation (DWA) as “DWA is not a data warehouse appliance, nor data-warehouse-as-a-service (DWaaS) – it’s software that automatically generates a data warehouse by analyzing the data itself and applying best practices for DW design embedded in the technology. Another name for this type of technology is “metadata-generated analytics”.
The automation scenario has changed over the last decade or so with a multitude of DWA tools coming into the market. Some of the more established technology players like Microsoft, Google have launched their own DWA tools. There has been a spurt in the availability of automation tools from small startups. This has obviously led to fierce competition which is good for the end consumer as he will get the best product. DWA is just a collection of DWH best practices bundled into software to provide businesses with faster access to insights and their data.
Features of a DWA tool
- Simplified capture of the Data Warehouse Design.
- Automated Build (i.e. Generate Code and metadata)
- Automated Deployment of code to the Server
- Automated Batch execution of the ETL code on the Server.
- Automated Monitoring and Reporting of the Batch execution.
- Automated optimization of data loads (Parallel Vs Series).
- Metadata based active governance and control of your data.
- Agility in responding faster to the changing business needs.
In the past Data warehousing has taken too long and the results haven’t been too flexible. A small change or improvement could take up to weeks or months to be implemented. Amid this progression towards DWA, a lot of other options were tried like Big Data, Self Service BI etc. But a Data warehouse provides additional benefits like
- The ability to store history.
- Reduced risk of reliance on key individuals.
- Data augmentation.
Automation does not mean throwing out the concepts of Data Warehousing, in fact, it reinforces the same concepts with more focus on the execution of the Data Warehouse development.
DWA is often confused with Self-serve Data Preparation (SSDP). This is not entirely correct. SSDP is primarily meant for data scientists/data engineers working on specific use-cases. It is not meant to be used for Enterprise level DWH deployments. DWA and SSDP offer different features. Holistic/enterprise metadata control is not the same as building one simple (SSDP) pipeline in a cloud-based solution.
The main aim of these automation tools is to create solutions which make it possible for business users to access data, integrated from multiple sources and to prepare that data using drag and drop features and a simple, intuitive interface. They should be able to perform
- Data Preparation
- Test theories and hypotheses.
- Prototype test price points.
Most of the DWA tools available in the market are GUI based and you can set up a DWH with just a few clicks. Many of the existing DWA tools offer lineage functionalities as well as automated regression and quality tests, efficient loading routines, simplified deployments between environments and extensive generation of documentation.
A simple illustration of different functions covered by a DWA tool is in the image below.
With all the digital disruption happening around the globe, it is more important than ever to make sense of the abundant pools of data. Businesses need the ability to make smarter decisions at a click of a button. Traditional DWH methodologies and best practices must come together in building a data solution which can support the ever-changing business needs, hence the need for automation.