Why are we only talking about lakes, reservoirs, swamps and clouds?

The buzzwords and the endless possibilities of the cloud seem to be on everybody’s minds these days. Did we forget our core business?

At the breakfast table at the TDWI conference in February 2016, I met two gentlemen, and we started talking about big data, cloud BI and the other buzzwords. They told me a story from their company.

“One day my boss came into my office and asked, ‘How are we doing on big data?’ I had to ask him back, ‘What do you mean?’ My boss asked again, ‘How are we on big data? Do we have Hadoop?’ And I said no. ‘Well, we need that,’ my boss replied and walked out the door. So now we have a Hadoop cluster in the cloud with little or no data, and we really don’t know why we need it, but I’m sure my boss got his bonus!”

This is a true story. It is probably not unique, and it is probably not a big deal at that company anymore; I am sure they found some use for their Hadoop cluster.

I am not saying that big data and the “new” ways of storing and analyzing data are useless. On the contrary, they are very important. But isn’t your old, core data just as important?

The consulting companies tell us about their customers, and how they make use of sensor data and advanced stream analytics. But when we listen to these stories, do we actually get the full picture? I would say no!

All of the companies that have made good use of the new technologies had already established a healthy, well-operated data warehouse. This is where you start, and this is where it all should end: with the data.

If you are planning to start a big data initiative and you do not have control over your existing data, then I recommend that you wait: get control over your existing data first, and see what it actually gives you in analytic capabilities.

Gartner and others say that only 20% of your analytic power comes from big data initiatives, while the remaining 80% comes from your old data. And that brings us back to my first question: why are we only talking about the 20%?

The most obvious answer is that it is very cool. It is something new, something fun, something exciting, and we just cannot wait to get our hands dirty with the new open source technology. I have installed Ubuntu and Hadoop on my own computer, so I can actually learn it. “Self-Service”. It’s that fun!

Another reason may be that our consulting partners see the opportunity for short-term business and oversell the simplicity of it all. Not all of them do, but some tend to say that if you do not do it now, everybody else will, and you will be last across the finish line.

I would say that if you work in a company like that of the two gentlemen I mentioned earlier, you are probably better off focusing on more important matters. We also need to be the best advisors we can be for our management, so that they can make decisions based on facts and knowledge about this new data era, and not just order Hadoop because someone says it is smart.

And the most important reason: we actually need it. Cloud technology and the various water metaphors are very important, and you can find a lot of gold mining through the rubble. But if you have trouble finding the product margin on your merchandise, do not start with a data lake.

It makes sense to focus more on the 80% than on the 20%, but how can we make the 80% easier to handle? If we become more efficient in the way we handle our old data, we can actually free up time to get our hands dirty with Spark, Pig, Hive and the other elephants.

So how do we become more efficient with our existing data? There has been little change in how we do the ETL process, and this is where BI and DWH developers spend most of their time: moving data from one place, transforming it, and publishing it in a star schema ready for reports and analytics. As the saying goes, “if it ain’t broke, don’t fix it”. We do not have to fix anything, but if we can make the implementation faster, more efficient and well documented at the same time, shouldn’t we?
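
To make that concrete, here is a minimal hand-coded ETL sketch in Python using the standard sqlite3 module. Every table and column name is invented for illustration; it simply extracts rows from a source table, resolves a surrogate key in a product dimension, and loads a fact table.

```python
# A minimal hand-coded ETL sketch: extract from a source table, resolve a
# surrogate key in a dimension, load a fact table. All table and column
# names are hypothetical; sqlite3 stands in for the real source and target.
import sqlite3

conn = sqlite3.connect(":memory:")

# A fake source table, so the sketch is runnable end to end.
conn.executescript("""
    CREATE TABLE src_sales (order_id INT, product TEXT, amount REAL, sold_on TEXT);
    INSERT INTO src_sales VALUES
        (1, 'widget', 9.90, '2016-02-01'),
        (2, 'gadget', 4.50, '2016-02-02');
""")

# The target star schema: one dimension table and one fact table.
conn.executescript("""
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product TEXT UNIQUE);
    CREATE TABLE fact_sales  (order_id INT, product_key INT, amount REAL, sold_on TEXT);
""")

# Extract ...
rows = conn.execute(
    "SELECT order_id, product, amount, sold_on FROM src_sales").fetchall()

# ... then transform and load: look up or create the surrogate key,
# and insert the fact row that references it.
for order_id, product, amount, sold_on in rows:
    conn.execute("INSERT OR IGNORE INTO dim_product (product) VALUES (?)",
                 (product,))
    (key,) = conn.execute("SELECT product_key FROM dim_product WHERE product = ?",
                          (product,)).fetchone()
    conn.execute("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                 (order_id, key, amount, sold_on))

print(conn.execute("SELECT * FROM fact_sales").fetchall())
```

Multiply this kind of plumbing by hundreds of source tables and you see where the development hours go.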
This is where a tool like Xpert BI comes in.

Xpert BI, a data warehouse automation tool

Xpert BI from BI Builders is data warehouse automation (DWA) at its best. It enables developers to efficiently move any data source into what is called an idealization layer, from which data is transformed and integrated, and finally published in a star schema. This is done blazingly fast and with far fewer resources. With a tool like Xpert BI you have the power to release not only your data but your people too, so they can actually dive into the waters of even more fun stuff.
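
To illustrate the metadata-driven idea behind DWA in general, here is a rough sketch; it is a generic illustration with invented names, not Xpert BI’s actual interface. Instead of hand-writing one load script per table, the staging (idealization) layer is generated from the source’s own metadata:

```python
# A generic sketch of the metadata-driven idea behind DWA tools: the staging
# ("idealization") layer is generated from source metadata instead of being
# hand-written one table at a time. Invented names, not Xpert BI's interface.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INT, name TEXT);
    CREATE TABLE orders    (id INT, customer_id INT, total REAL);
""")

def stage_all_tables(conn, prefix="stg_"):
    """Create a staging copy of every source table, driven purely by metadata."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' AND name NOT LIKE ?",
        (prefix + "%",))]
    for table in tables:
        conn.execute(f"DROP TABLE IF EXISTS {prefix}{table}")
        # One generated statement per table replaces a hand-written load script.
        conn.execute(f"CREATE TABLE {prefix}{table} AS SELECT * FROM {table}")

stage_all_tables(conn)
print([row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE name LIKE 'stg_%'")])
```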
