Being in the software business, one of our most important tasks is to let our customers know about our product. One of the ways we do that is to attend various conferences around the world. For two consecutive weeks in November we will spend a lot of time on the conference carpet, in sessions, and in our hotel rooms.
First up is the PASS Summit in Seattle. This is the Microsoft SQL Server user conference, with a lot of great topics on both traditional SQL Server and data warehousing, but probably even more on the new architecture and the new possibilities in Azure.
We are living in exciting times with regard to data strategy, data architecture and technology.
Since we are a Microsoft partner, we need to have both an opinion and a strategy with respect to Azure. It is going to be very exciting to talk to the best SQL Server people in the world about our current product and to lift the veil on the future of Xpert BI. So, if you are going to PASS, drop by booth K4 in the launchpad area and talk to us; we might have some nice swag to give away. And I promise you will have a great data strategy or data warehouse talk, and of course a demo of the best DWA tool on the market.
While I am writing this, I am trying to come up with a topic for a ten-minute speed talk at a Norwegian conference coming up in October. I think my topic will be something like: if everyone is a data scientist, who is going to do the ETL? It is still a mystery to me that people are skeptical of applying DWA to their ETL so that the road to the data scientist role gets shorter. I guess one of our goals at the PASS Summit is to convince some of the manual ETL developers that DWA can be a good thing, and not just yet another costly piece of software to learn.
After we say goodbye to the Space Needle we fly directly to Barcelona to attend the Gartner Symposium.
The Gartner Symposium is a bit different from the PASS Summit: where the PASS Summit gathers the SQL Server nerds from all over the world, the Gartner Symposium is more of a C-level conference. Our focus here will be to show the great benefits of investing in our software to enable not only your data warehouse but also your digitalization strategies.
Anja, our head of project implementation and co-founder of BI Builders, is going to talk about how your “old” technology can co-exist with the more “modern” ways of modelling or storing your data (Please, pay attention to the quotation marks).
BI & Analytics – What will be your Fit for Purpose solution?
Does the introduction of new technologies mean your current toolsets are obsolete, and will they be able to co-exist?
BI Builders will discuss the impact of the changing size and content of data in organizations regarding reporting, analytics and fact-based decision-making.
I am confident this is a hot topic both for the BI and analytics leaders and for the CIOs and CTOs attending the conference.
So, as much as we hope this will be a great way to meet new customers and partners from around the globe, we also hope we get to learn something from the attendees and the other conference partners as well.
It is you who make our product and our advice better by letting us learn from what you do. At the same time, we hope we will be able to inspire the attendees to do things smarter, cheaper and faster.
On August 10th, IKT-Norge, Visma and Rambøll presented the 10th annual edition of their report, “IT i Praksis”. It’s a survey meant to look at the maturity of practical IT use, mainly in the Norwegian public sector. It’s a huge report and I won’t cover all the topics here, but a couple of things stood out. The survey really boils down to one word: digitalization. How can the public sector move faster so that its users get the best digital experience possible?
This isn’t just a challenge for the public sector. The challenge is the same if you are a retailer, a bank or really any other business that needs to connect with users in a digital way. There is one significant difference, though: the ability to organize, and the funding.
The Norwegian public sector consists of 426 municipalities, which are in a sense 426 small to mid-size companies with their own servers, their own systems and their own technical debt. But all of them have the same service level agreement with their users. The lack of standardized systems has over time, as in many companies in other industries, created a lot of technical debt. Most of us have an old legacy system from the eighties, and most of us have a Microsoft Access database that no one understands anymore, because the person who made it left 10 years ago.
As I said, this happens in all industries and is not only a challenge in the public sector.
But to further add to the challenge, in the public sector the amount of money you can spend on IT is often a political decision. And as we all know, it is easy not to spend your limited resources on upgrading an old system that works as long as you don’t touch it.
The second challenge the public sector is facing is that in 2020 there will be only 370 municipalities in Norway. That means many municipalities are going to be merged. And when two municipalities merge, what are you going to do with that Access database and that legacy system from the eighties?
The survey talks about organization, leadership and the competence of the leaders, as well as the maturity level with regard to new technologies.
When they presented the report, they said the public sector is like a big tanker at sea. It moves steadily towards its target, but it doesn’t go very fast, and it’s a huge ship to turn. But in all fairness, it has started its journey and it is going to go faster and faster.
One of the panel debate members was an IT director from one of the municipalities in the middle part of Norway. She said their biggest challenges were the lack of standard software and the lack of integration between the software they do have.
This is our challenge as software vendors to fix: we need to build software that provides complete solutions and does not help our customers create silos.
Another thing that came up in the debate was that one of Norway’s biggest vendors in the public sector claimed this will get much better when all the systems run in the cloud. This was one of the stranger things I heard during the presentation. The cloud doesn’t fix integrations; in my experience, it does the opposite. What is true is that the issue with standards is easier to handle with cloud applications.
So how would the public sector go about making the tanker a bit more streamlined and easier to turn?
Well, first of all, you have to invest money in knowledge. If you want innovation you need innovative people. And since you are probably stuck with the people you have, you must invest in improving their skills. If you don’t, you’ll continue to make systems that looked shiny in the eighties.
The public sector also needs to work closer together; you will have a lot more leverage if you combine your efforts towards the software vendors. They do not want to lose you as customers, so never forget that you have more power than they do. And you need to automate and work smarter in the areas where that is possible.
You need to draw up a data strategy with the intention of making it possible to replace your legacy systems one by one. You need a data strategy that doesn’t involve lock-in with one particular vendor.
The importance of a data strategy is hard to overstate. If you don’t have one, you will eventually build silos with data that is hard to reuse.
Many vendors will, in their sales presentations, oversell the ability to integrate both to and from their system. But it is rarely that easy. And the cloud doesn’t solve any integration challenges.
So again, since I work in a company that probably makes the best self-service data preparation tool on the market, investigate how your data strategy can be implemented with Xpert BI. I promise you will reach your goal much faster, and with your solution already documented when you are done.
There are of course many challenges pointed out in the survey, but this was one of the clearest I could see.
Hope everyone is having a great summer, and I’ll write a new post soon.
(I’m a poet and I know it)
I have been spending a lot of time warning people about the pitfalls, or rather deep ends, of the Data Lakes and Big Data initiatives. First, I want to state that I think the Data Lake as a concept is a great idea, but it needs to co-exist with your more traditional data architecture.
So, what I’m discussing in this post is my take on the total data architecture as I see it in today’s data age.
We all know the data warehouse mantra “one single version of the truth”, or the variant “one single source of the truth”. The single source statement will become even less relevant with the coming of Big Data, but the single version of the truth is probably even more important now, in the big data age. You need one place to implement your business rules, one place to contain the truth. What we see now is that the place for holding the truth is not always the same place.
Let’s start by identifying the data source components of your data architecture.
“Traditional” or “small” data
These are your on-site systems: your ERP system, your self-made operational side systems, or your manual master-data spreadsheets. If you start to look at your system portfolio, you will probably find a lot of data that would fit into an enterprise data architecture. Here you will find most of your reporting basis and most of your analytical basis. I would say this is your most valuable data, and hence this is where you should spend most of your resources on preparation and management. Whether the data resides on-site or on a server in the cloud, the extraction method is the same as long as you have a connection to the actual database.
From a data collection view, I have mixed experiences with SaaS solutions. I have countless times run into trouble when trying to get data from the various SaaS providers. What you get when you buy or rent a SaaS solution is an application that serves its operational purpose, but you don’t get direct access to your own data. More and more providers let you access your data through APIs, which is a good solution, but in my experience there are still a lot of smaller vendors that rely on flat file integration. This means that you must build a manual flat file integration on your side and store (possibly sensitive) data in open flat files on FTP servers. In some cases, you even have to pay extra to get access to your own data.
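As a minimal sketch of what API-based collection can look like, here is a paginated fetch loop in Python. The page shape (`{"items": ..., "has_more": ...}`) is invented for the example; every SaaS provider structures its API and pagination differently, so treat this as an illustration, not a recipe:

```python
from typing import Callable, Dict, Iterator


def fetch_all_pages(fetch_page: Callable[[int], Dict]) -> Iterator[dict]:
    """Yield every record from a paginated API, page by page.

    fetch_page(page_number) is assumed to return a dict shaped like
    {"items": [...], "has_more": bool} -- a common but hypothetical
    pagination shape; check your provider's documentation.
    """
    page = 1
    while True:
        payload = fetch_page(page)
        # Hand each record onward as it arrives, so the caller can
        # stream results into the data hub instead of buffering.
        for item in payload.get("items", []):
            yield item
        if not payload.get("has_more"):
            break
        page += 1
```

In practice `fetch_page` would wrap an authenticated HTTP call; injecting it as a function also makes the loop trivial to test without network access.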
With the coming of new technology, we have new, faster, cheaper and smarter ways to store more of the unstructured and semi-structured data. Depending on your line of business, one could argue that this is where your data “gold” is: sensor data from various IoT systems, web logs you can store in detail, and other fast data that you can analyze as it comes in. This is the area where the traditional data warehouse architecture gets disrupted. It’s not a given that you should model this data or implement it in your star schema. This is also where the ETL process must be adapted to whatever technology you choose to store your data in.
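To illustrate the idea of landing this data raw instead of modelling it into a star schema, here is a small Python sketch that appends incoming events, unmodelled, to date-partitioned JSON-lines files. The `year=/month=/day=` folder layout is just one common convention; your chosen lake technology may dictate another:

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def offload_event(event: dict, lake_root: Path) -> Path:
    """Append a raw sensor or web-log event to a date-partitioned file.

    No modelling, no star schema: the event is stored as-is so analysts
    can pick it up later. Daily partitions keep the files manageable.
    """
    ts = datetime.now(timezone.utc)
    partition = lake_root / f"year={ts:%Y}" / f"month={ts:%m}" / f"day={ts:%d}"
    partition.mkdir(parents=True, exist_ok=True)
    target = partition / "events.jsonl"
    # One JSON document per line ("JSON lines") is easy to append to
    # and easy for most lake tools to read back.
    with target.open("a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")
    return target
```

The same pattern applies whether the target is a local disk, HDFS or cloud blob storage; only the write call changes.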
So, let’s talk about the Big Data perspective. Do we still need relational databases for modelling our data hub? Do we need to buy expensive in-memory databases when we can store and access data much cheaper and faster with, for example, HDFS? The answer to these questions is neither yes nor no; it depends entirely on how your business is set up today.
I have one thought regarding moving all your data storage to HDFS: the technology is still too inaccessible. The toolsets are not yet good enough, and there is a lack of professionals who know them. This will of course, as with every emerging technology, improve as time goes by.
With the data lake methodology, we have a term that says something about how and where we should store the new “Big” data. CTO James Dixon at Pentaho coined the term as a contrast to the data mart, to solve the problem of “information siloing”. As time has gone by, more and more vendors have adopted the term, and it has become an architectural necessity when planning your enterprise data architecture. The main thing to remember is that the Data Lake is a methodology and not a technology. Many people believe that Data Lake equals Hadoop, but this is not accurate. A Data Lake is a methodology that can span multiple technologies, and you should use the technology that fits your need, or even better, fits your data’s need.
This means you can have some data in Hadoop, some on Azure and some stored in a MongoDB. And, together, this is your Enterprise Data Lake.
I have seen presentations from companies that offload all their data into a data lake and store everything in Hadoop, where they also apply their business rules and build their reporting marts. So it is an option to bypass the whole relational data warehouse, but as I said earlier, the technology, as I see it, is still too immature for that to be the smart move right now.
At the Gartner Data and Analytics Summit in London earlier this year, I attended a session where Edmond Mesrobian, CTO at Tesco, and Stephen Brobst, CTO at Teradata, talked about the data architecture at Tesco. I come from a fairly big Norwegian retailer, but what they talked about was almost science fiction to me. I think they had every technology you could think of in their architecture slide. They have done groundbreaking work at Tesco with regard to the new data architecture. One of the things I noticed with interest was that they still have their EDWH intact in their architecture slide. They said that the EDWH will never disappear, but they will not build a new one.
So even Tesco, which has the muscle and manpower to embrace any technology they want, still understands the value of structuring the data in a data warehouse for reporting.
They also said that it’s necessary to enrich the EDWH with relevant data from the data lake, but not everything needs to be modelled into your star schema. Some of the data can just be offloaded in a raw format so that the analysts have easier access to it.
Being a Norwegian data warehouse architect, most of my projects have been with Norwegian customers. I often see that many of the companies promoting Big Data technologies and methods compare Norwegian companies with large US companies. Sadly, this often leads to overselling technologies and methods fitted for much larger enterprises. In Norway there are few, if any, companies as big as Tesco, Walmart or Bank of America.
So, if you are a small or mid-size company, you should choose an architecture that fits your needs. You must consider pricing models for the new technology. It’s not certain that you will get ROI on your cloud database. You must see if you can get qualified personnel to utilize the technology, and not rely on one single consultant. And, most important, you must look at the data you have or the data you are planning to get. Only after taking these steps can you start to choose your architecture and technology.
So how do I see the architecture for a mid-size to big corporation by Norwegian standards? Well, I’m really glad you asked…
The picture below describes, from a helicopter perspective, how I believe the best architecture for a “Norwegian-sized” company should look.
Just a quick note, all the vendor and technology names are meant as examples!
This picture sums up most of what I have talked about in this post. This is how I see the co-existence with the traditional data warehouse and the new emerging technologies.
In these architecture slides, we often forget to include the SaaS solutions and how to get data from them. Make sure you have an easy way of reading from APIs and storing the data in your data hub.
If your SaaS provider doesn’t support APIs, my strongest recommendation is to use a different provider. This is the best way to force the various providers to implement them.
When we talk about exposing data, think of your data warehouse as a data provider too: you also need to expose your data through APIs. You must do this so that your web pages and other systems have an easy way of getting the correct information.
There are of course also scenarios where you can expose data from your data lake. Whether to expose data from the lake or from your data hub depends on the nature of the data. If it is a live stock count from stores and you have that data in your data lake, you should of course expose it from the lake; but if the data is run in batches overnight, or even by the hour, you should probably expose it from your data warehouse.
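As a toy illustration of the "data warehouse as a data provider" idea, here is a minimal read-only JSON endpoint using only Python's standard library. The stock figures and the `/stock/<store_id>` route are invented for the example; a real setup would query the reporting database and sit behind proper authentication:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical warehouse extract; in practice this would come from a
# query against your reporting database, not an in-memory dict.
STORE_STOCK = {"store_001": {"item_42": 17, "item_7": 3}}


class StockHandler(BaseHTTPRequestHandler):
    """Serve warehouse figures as JSON at /stock/<store_id>."""

    def do_GET(self):
        store_id = self.path.rstrip("/").split("/")[-1]
        record = STORE_STOCK.get(store_id)
        body = json.dumps(record or {"error": "unknown store"}).encode("utf-8")
        self.send_response(200 if record else 404)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the example quiet


if __name__ == "__main__":
    HTTPServer(("", 8000), StockHandler).serve_forever()
```

The point is not the framework (any API gateway or web framework will do) but the contract: consumers get the warehouse's single version of the truth over HTTP instead of querying the database directly.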
As for your traditional data from your ERP systems, or other systems where you have direct ownership of the database, it will, as I have pointed out before, be best handled by an automation tool. I am not only saying this because I work for a company that is developing, probably, the best automation tool on the market. I am also saying it because you need to free up time for your developers to get them up to speed on the emerging technologies. Time is one of our most valuable assets, so use it the best way you can.
As I am writing this, our brilliant developers are busy implementing data collection to and from other new technologies, not only your traditional data sources. But I will get back to the details in a little while.