Last Christmas I gave you Hadoop, but the very next day, you gave it away


I spent some of the quiet Christmas nights in front of a burning fireplace reading articles about the future of the data warehouse. There are many opinions and arguments about what that future will look like.
Two types of professionals are arguing the pros and cons of the technical architecture surrounding your data warehouse solution.
The technocrats make strong arguments for specific technologies that will solve your challenges in the new data era.
On the other hand, you have professionals of the old school who are skeptical about letting technology drive the type of challenge you are going to solve. Let’s call them the conservatives.
Being a conservative myself, it’s easy to point out what the technocrats do wrong, but are they as wrong as someone like me at times argues?

Are we heading towards chaos? Or am I just getting old?


Last night I attended an after-work meetup where the topic was “Clash Of The Titans”. Microsoft, IBM, SAP and Oracle presented their BI and analytics solutions, both what they can offer today and what their future releases will look like.

Write 500 lines of code, pray and press F8

Thinking back to the year 2000, when I started my first job as a data warehouse developer, we programmed SAS code without any IntelliSense, on a dark blue background with a white font.

After writing 500 lines of code we said a small prayer and then we pressed F8. Usually we got lucky. Other times we didn’t, and had to spend the rest of the day finding that small typo or looking for the flaw in logic that kept us from getting the result we wanted.

Yesterday, on the other hand, Microsoft simply pointed a webcam at the audience and instantly got a facial reading of how happy the audience was. Oracle talked about their new mobile BI solution, which is going to give you the most relevant reports depending on where you are and what kind of meeting you are attending. IBM showed us how Watson, based only on your data source, built a dashboard and suggested what you should look into, and SAP talked about how you could magically get all your BI needs met just by scanning through your environment.

So what was I doing back in 2000? We didn’t have the technology that we have today, of course; I don’t think I even owned a webcam at that time. What we did was Extract data, Transform and clean it, and Load it into a data warehouse. What struck me last night was that none of the four “Titans” mentioned the data warehouse with a single word.

Old and Cranky

So given my background and my 17 years in the BI realm, I’m starting to fear that I’m getting old and cranky and don’t understand the new things: Big Data, analytics, IoT and so on.
I seriously don’t think I’m either old or cranky (my children would probably disagree), but I think we might be heading for chaos if we don’t structure our data before we report on it.

What the “Titans” were saying was that you can just use Power BI, Cognos or whatever reporting tool you have directly on your source, and magically you’ll get wonderful dashboards and reports. What about the cleansing, the business rules and the mantra we have been repeating for decades, “One single version of the truth”? Did we forget it? If we did, we seriously need to start remembering it again.

My last blog post was about many of the same things that I am writing about here, but it worries me that we are bypassing the data warehouse. So the question is, why aren’t we talking about it? Is it because it’s “old school” like me? Or is it because it is easier to sell a fancy reporting tool or the new exciting possibilities in the cloud?

The question should be, how can the old realm and the new realm co-exist?

My view is that the Enterprise Data Warehouse will still exist and the Big Data initiatives will come as a supplement. I also believe that Microsoft, Oracle, SAP, IBM and the other big platform vendors know this: for enterprise analytics and reporting that support business decisions you need a data warehouse, dimensional modelling, one version of the truth and so on. They just struggle to make “EDW” and “ETL” as sexy as facial recognition and tweets.

You will still need to compare your revenue with comparable days, and you will still need to see the development in product margin over time. It seems strange to put that data, and implement those business rules, in an unstructured environment.
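To make this a little more concrete, here is a minimal sketch of what "implementing a business rule in a structured environment" can look like. It assumes a tiny star schema with one fact table and one date dimension, built in an in-memory SQLite database. The table names (`fact_sales`, `dim_date`), the sample figures and the margin definition are all made up for illustration and are not taken from any real warehouse.

```python
import sqlite3


def build_demo_warehouse():
    """Create a tiny in-memory star schema (hypothetical tables and figures)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, month TEXT);
        CREATE TABLE fact_sales (
            date_key INTEGER, product TEXT, revenue REAL, cost REAL
        );
        INSERT INTO dim_date VALUES
            (20160104, '2016-01'), (20160111, '2016-01'), (20160201, '2016-02');
        INSERT INTO fact_sales VALUES
            (20160104, 'widget', 100.0, 60.0),
            (20160111, 'widget', 200.0, 120.0),
            (20160201, 'widget', 300.0, 150.0);
    """)
    return con


def margin_by_month(con):
    """Assumed business rule, defined once: margin = (revenue - cost) / revenue."""
    rows = con.execute("""
        SELECT d.month,
               SUM(f.revenue),
               (SUM(f.revenue) - SUM(f.cost)) / SUM(f.revenue)
        FROM fact_sales AS f
        JOIN dim_date AS d ON d.date_key = f.date_key
        GROUP BY d.month
        ORDER BY d.month
    """).fetchall()
    # Round the margin so the report shows a stable two-decimal figure.
    return [(month, revenue, round(margin, 2)) for month, revenue, margin in rows]


if __name__ == "__main__":
    for month, revenue, margin in margin_by_month(build_demo_warehouse()):
        print(month, revenue, margin)
```

Because the rule lives in one query against one model, every report that asks for margin over time gets the same answer, which is the whole idea behind one version of the truth.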

Use your data lake for low-level data, so your analysts can dig into it. This is where the analytics tools from the “Titans” excel. Use them for data discovery, and if you find some gold, bring it back to your data warehouse, implement the business rules and build reports. And, to make data warehousing sexy again, use automation tools to speed up the process.

In my next post I will try to dig deeper into how we can make our core business “sexy” again.

Why are we only talking about lakes, reservoirs, swamps and clouds?


The buzzwords and the endless possibilities of the cloud seem to be on everybody’s mind these days. Did we forget our core business?

At the breakfast table at the TDWI conference in February 2016 I met two gentlemen and we started talking about Big Data, cloud BI and the other buzzwords. They told me a story from their company.


“One day my boss came into my office and asked, how are we on big data? I had to ask him back, what do you mean? My boss asked again, how are we on big data? Do we have Hadoop? And I said, no. Well, we need that, my boss replied, and walked out the door. So now we have a Hadoop cluster in the cloud with little or no data and we really don’t know why we need it, but I’m sure my boss got his bonus!”

This is a true story, and probably not a unique one. It is probably not a big deal at that company now either; I am sure they found some use for their Hadoop cluster.

I am not saying that big data and the “new” ways of storing and analyzing data are useless. On the contrary, they are very important. But isn’t your old, core data just as important?

The consulting companies tell us about their customers, and how they make use of sensor data and advanced stream analytics. But when we listen to these stories, do we actually get the full picture? I would say no!

All of the companies that have made use of the new technologies have already established a healthy and well operated data warehouse. This is where you start, and this is where it all should end – the data, that is.


If you are planning on starting a big data initiative and you do not have control over your existing data, then I recommend that you wait, get control over your existing data and see what it actually gives you in analytic capabilities.

Gartner and others say that only 20% of your analytic power comes from big data initiatives, while the remaining 80% comes from your old data. That brings us back to my first question: why are we only talking about the 20%?

The most obvious answer is that it is very cool. It is something new, something fun, something exciting, and we just cannot wait to get our hands dirty with the new open source technology. I have installed Ubuntu and Hadoop on my computer, so I can actually learn it. “Self-Service”. It’s that fun!

Another reason can be that our consulting partners see the opportunity for short term business and oversell the simplicity of it. Not all of them do, but some tend to say that if you do not do it now, everybody else is going to do it, and you will be last at the finish line.

I would say that if you work in a company like that of the two gentlemen I mentioned earlier, you are probably better off focusing on more important matters. We also need to be the best advisors we can be for our management, so that they can make decisions based on facts and knowledge about this new data era, and not just order Hadoop because someone says it is smart.

And the most important reason: we actually need it. Cloud technology and the various water metaphors are very important, and you can find a lot of gold by mining through the rubble. But if you have trouble finding the product margin on your merchandise, do not start with a data lake.

It makes sense to focus more on the 80% than on the 20%, but how can we make the 80% easier to handle? If we become more efficient in the way we handle our old data, then we can actually free up time to get our hands dirty with Spark, Pig, Hive and other elephants.
So how can you be more efficient with your existing data? There has been little change in how we do the ETL process. This is where BI or DWH developers spend most of their time: moving data from one place, transforming it and publishing it in a star schema ready for reports and analytics. As the saying goes, “if it ain’t broke, don’t fix it”. We do not have to fix anything, but if we can make the implementation faster, more efficient and well documented at the same time, shouldn’t we?
This is where a tool like Xpert BI comes in.


Xpert BI from BI Builders is data warehouse automation (DWA) at its best. It enables developers to efficiently move any data source into what is called an idealization layer, from where data is transformed and integrated, and finally put in a star schema. This is done blazingly fast and with far fewer resources. This means that with a tool like Xpert BI, you have the power to release not only your data but your people too, so they can actually dive into the waters of even more fun stuff.