In all of recorded history, one thing common to every successful enterprise has been the presence of data. It can well be argued that computers only came into existence some 70 years ago, but data has always been present, much like air and water. Data must be at the center of every business decision we take as business owners. In 2017, The Economist published a report titled “The world’s most valuable resource is no longer oil, but data”. I feel data is even more powerful than oil: first, it is never going to cease to exist; second, with time we are only going to produce more of it. But in its unrefined form data is not of much use, so like oil, it must be refined and turned into insights that drive business decisions before it becomes a profitable asset.
Some of the biggest companies globally have used data for targeted marketing, to propel an idea or spread propaganda. The widespread disruption happening today is a result of all businesses moving into a digital era. Leading companies in Norway today mention digital and data in their strategies and yearly reports to shareholders (for most industries and the public sector, using data better is a strategic focus area).
Keeping these factors in mind, it becomes essential to work on procuring the right data, transforming it into meaningful information and eventually deciphering the information into a business-related action. The usual way is to set up a Data Warehouse (DWH), which will help with data storage, integration and feeding transformed data to decision-makers. However, this is easier said than done. Setting up a DWH comes with its own set of challenges, such as:
Figuring out the technology to use, keeping in mind the competence available and the size of the DWH.
Mapping all the sources and targets.
Gathering all the business logic in one place.
Business dependency on the DWH, which determines how important it is to keep it as updated as possible.
The need for considerable time, effort and cost.
and possibly many more.
An interesting way to put the DWH discussion in one phrase is “The DWH is dead, long live the DWH”. All major global digital enterprises (Facebook, Google, Netflix, Tesla, etc.) have a DWH of some sort, but they need to be realistic, adjust governance levels and be more agile in their ways of working. The DWH is usually associated with cumbersome and endless projects, a long time to market and the endless attempt to create one model for the whole enterprise, which is inherently almost impossible (and thus fails to deliver on the promises).
A typical DWH lifecycle looks something like this: up until 2010, there were not many automation tools available on the market to accelerate the process of setting up a DWH. Businesses were heavily dependent on programmers to automate whatever parts of the flow they could. Setting up a mid-sized DWH with data from 20 – 30 different sources and creating 15 – 20 reports could easily take anywhere between 4 and 10 months, depending on the number and experience of the resources implementing it. That is a long time for a business to wait before seeing any return on investment.
Forrester defines Data Warehouse Automation (DWA) as follows: “DWA is not a data warehouse appliance, nor data-warehouse-as-a-service (DWaaS) – it’s software that automatically generates a data warehouse by analyzing the data itself and applying best practices for DW design embedded in the technology.” Another name for this type of technology is “metadata-generated analytics”.
The automation scenario has changed over the last decade or so, with a multitude of DWA tools coming onto the market. Some of the more established technology players, like Microsoft and Google, have launched their own DWA tools, and there has been a spurt of automation tools from small startups. This has led to fierce competition, which is good for the end consumer, who gets the best product. In essence, DWA is a collection of DWH best practices bundled into software to give businesses faster access to their data and insights.
Features of a DWA tool
Simplified capture of the data warehouse design.
Automated build (i.e. generating code and metadata).
Automated deployment of code to the server.
Automated batch execution of the ETL code on the server.
Automated monitoring and reporting of the batch execution.
Automated optimization of data loads (parallel vs. serial).
Metadata-based active governance and control of your data.
Agility in responding faster to changing business needs.
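To make the “automated build” feature concrete, here is a minimal sketch of metadata-generated code: a function that emits a staging-to-target MERGE statement from nothing more than a table name, a column list and a key. All table and column names here are hypothetical, and a real DWA tool generates far richer code, but the principle is the same.

```python
def generate_load_sql(table, columns, key):
    """Generate a simple staging-to-target load statement from metadata alone."""
    col_list = ", ".join(columns)
    set_clause = ", ".join(f"t.{c} = s.{c}" for c in columns if c != key)
    src_values = ", ".join(f"s.{c}" for c in columns)
    return (
        f"MERGE {table} AS t\n"
        f"USING stg_{table} AS s ON t.{key} = s.{key}\n"
        f"WHEN MATCHED THEN UPDATE SET {set_clause}\n"
        f"WHEN NOT MATCHED THEN INSERT ({col_list}) VALUES ({src_values});"
    )

print(generate_load_sql("dim_customer", ["customer_id", "name", "city"], "customer_id"))
```

Change the metadata and the code regenerates itself, which is exactly the agility the feature list above promises.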
In the past, data warehousing has taken too long and the results haven’t been flexible enough. A small change or improvement could take weeks or months to implement. Amid this progression towards DWA, a lot of other options were tried, like Big Data and self-service BI. But a data warehouse provides additional benefits, like:
The ability to store history.
Reduced risk of reliance on key individuals.
Automation does not mean throwing out the concepts of Data Warehousing, in fact, it reinforces the same concepts with more focus on the execution of the Data Warehouse development.
DWA is often confused with Self-serve Data Preparation (SSDP). This is not entirely correct. SSDP is primarily meant for data scientists/data engineers working on specific use-cases. It is not meant to be used for Enterprise level DWH deployments. DWA and SSDP offer different features. Holistic/enterprise metadata control is not the same as building one simple (SSDP) pipeline in a cloud-based solution.
The main aim of these automation tools is to create solutions which make it possible for business users to access data integrated from multiple sources, and to prepare that data using drag-and-drop features and a simple, intuitive interface. They should be able to:
Test theories and hypotheses.
Prototype and test price points.
Most of the DWA tools available in the market are GUI based and you can set up a DWH with just a few clicks. Many of the existing DWA tools offer lineage functionalities as well as automated regression and quality tests, efficient loading routines, simplified deployments between environments and extensive generation of documentation.
A simple illustration of different functions covered by a DWA tool is in the image below.
With all the digital disruption happening around the globe, it is more important than ever to make sense of the abundant pools of data. Businesses need the ability to make smarter decisions at a click of a button. Traditional DWH methodologies and best practices must come together in building a data solution which can support the ever-changing business needs, hence the need for automation.
Back in 2007, when I first started working with ETL processes and building data warehouses, it was a steep learning path. Setting up a Data Warehouse (DWH) required plenty of planning and project management before the actual development could start. There were various steps that led to a functional DWH capable of servicing the business needs.
To name a few:
Create source-to-target mappings, including business rules and logic.
Create a naming convention to be used throughout the DWH.
Document the business logic as you go on building the DWH.
Generate and store surrogate keys.
Decide on what goes into SCD Type 1 and SCD Type 2 (slowly changing dimensions).
Design the architecture, whether it be star, snowflake or just some form of denormalized data.
Decide on the load frequency.
Finally, load it all into the target data model.
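Several of these steps, surrogate keys and SCD handling in particular, are exactly what automation tools generate for you. As a rough illustration of what SCD Type 2 involves, here is a minimal Python sketch that closes out changed rows and appends new versions; the column names and date format are illustrative, not from any particular tool.

```python
from datetime import date

def apply_scd2(current_rows, incoming, key, tracked, today=None):
    """Close out changed rows and append new versions (SCD Type 2).

    Each current row carries valid_from/valid_to; valid_to=None marks
    the active version of a given business key."""
    today = today or date.today().isoformat()
    by_key = {r[key]: r for r in current_rows if r["valid_to"] is None}
    result = [dict(r) for r in current_rows]
    for new in incoming:
        cur = by_key.get(new[key])
        if cur is None:
            # Brand new key: open a first version.
            result.append({**new, "valid_from": today, "valid_to": None})
        elif any(cur[c] != new[c] for c in tracked):
            # A tracked attribute changed: close the old version, open a new one.
            for r in result:
                if r[key] == new[key] and r["valid_to"] is None:
                    r["valid_to"] = today
            result.append({**new, "valid_from": today, "valid_to": None})
    return result
```

SCD Type 1 would simply overwrite the tracked attributes in place; Type 2 keeps the full history, which is why deciding which attributes go where is a design step of its own.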
It is estimated that in a normal Business Intelligence (BI) project, close to 80% of the time and budget is spent on setting up the DWH. It is important to understand that a solid DWH architecture and design sets up your entire BI project for success; any critical failures or misunderstandings in the design and architecture of the DWH can have serious business consequences. Considering these factors, automating your DWH implementation is an investment every company should consider.
At BI Builders we have just the right product for your DWH automation: Xpert BI. It is a culmination of best practices in DWH implementation, gathered over years of developing DWH solutions on-premises and in the cloud. Xpert BI integrates all your data, from local files and complex systems to cloud applications, into a central information platform, empowering your team to produce actionable insights faster than before. You can choose your own infrastructure; Xpert BI supports both on-premises and cloud instances. It is certified for Microsoft Azure and is also available in the marketplace. It generates standard SQL code in the backend, so it’s easy to debug.
Here is how Xpert BI delivers faster implementation of a DWH.
Xpert BI is a one-stop solution for implementing:
New DWH from scratch.
Creating a Data Mart on top of/alongside an existing DWH to support a specific business area.
Exporting on-premises data to the cloud with Export Groups (exporting data from an entire source system to Azure Data Lake can be done in a few clicks).
Documenting SSAS models and exposing them to business users.
We are working round the clock at BI Builders to improve the product and come up with connectors to make your data integration as smooth as possible.
Check out our website https://bi-builders.com/ to know more about the product, available data connectors and read customer success stories.
Being in the software business, one of our most important tasks is to let our customers know about our product. One of the ways we do that is to attend various conferences around the world. For two consecutive weeks in November we will spend a lot of time on the conference carpet, in sessions, and in our hotel rooms.
First out is the PASS Summit in Seattle. This is the Microsoft SQL Server user conference, with a lot of great topics on both traditional SQL Server and data warehousing, but probably even more on the new architectures and possibilities in Azure.
We are living in exciting times with regard to data strategy, data architecture and technology.
Since we are a Microsoft partner, we need to have both an opinion and a strategy with respect to Azure. It is going to be very exciting to talk to the best SQL Server people in the world about our current product and to lift the veil on the future of Xpert BI. So, if you are going to PASS, drop by booth K4 in the launchpad area and talk to us; we might have some nice swag to give away. And I promise you will get a great data strategy or data warehouse talk, and of course a demo of the best DWA tool on the market.
While I am writing this, I am trying to come up with a topic for a ten-minute speed talk at a Norwegian conference coming up in October. I think my topic will be something like: if everyone is a data scientist, who is going to do the ETL? It is still a mystery to me that people are skeptical of applying DWA to their ETL so that the road to the data scientist role gets shorter. I guess one of our goals at the PASS Summit is to convince some of the manual ETL developers that DWA can be a good thing, and not just yet another costly piece of software to learn.
After we say goodbye to the Space Needle we fly directly to Barcelona to attend the Gartner Symposium.
The Gartner Symposium is a bit different from the PASS Summit: where the PASS Summit gathers the SQL Server nerds from all over the world, the Gartner Symposium is more of a C-level conference. Our focus here will be to show the great benefits of investing in our software to enable not only your data warehouse but also your digitalization strategies.
Anja, our head of project implementation and co-founder of BI Builders, is going to talk about how your “old” technology can co-exist with the more “modern” ways of modelling or storing your data (please pay attention to the quotation marks).
BI & Analytics – What will be your Fit for Purpose solution?
Does the introduction of new technologies mean your current toolsets are obsolete, and will they be able to co-exist?
BI Builders will discuss the impact of the changing size and content of data in organizations regarding reporting, analytics and fact-based decision-making.
I am confident this is a hot topic both for the BI and analytics leaders and for the CIOs and CTOs attending the conference.
So, as much as we hope this will be a great way to meet new customers and partners from around the globe, we also hope we get to learn something from all the attendees and other conference partners as well.
It is you who make our product and our advice better, by letting us learn from what you do. At the same time, we hope we will be able to inspire the attendees to do things smarter, cheaper and faster.
On August 10th, IKT-Norge, Visma and Rambøll presented their 10th annual report, “IT i Praksis” (“IT in Practice”). It’s a survey meant to look at the maturity of practical IT use, mainly in the public sector in Norway. It’s a huge report and I won’t cover all the topics here, but a couple of things stood out. The survey really boils down to the word digitalization: how can the public sector move faster so its users get the best possible digital experience?
This isn’t just a challenge for the public sector. It’s the same challenges if you are a retailer, a bank or really any other business where you need to connect with users in a digital way. There is one significant difference though, the ability to organize and the funding.
The Norwegian public sector consists of 426 municipalities, which are in a sense 426 small to mid-size companies with their own servers, their own systems and their own technical debt. But all of them have the same service level agreement with their users. The lack of standardized systems has, over time, as in many other industries, created a lot of technical debt. Most of us have an old legacy system from the eighties, and most of us have a Microsoft Access database that no one understands anymore, because the person who made it left 10 years ago.
As I said, this happens in all industries and is not only a challenge in the public sector.
But to further add to the challenge, in the public sector the amount of money you can spend on IT is often a political decision. And as we all know, it is easy not to spend your limited resources on upgrading an old system that works as long as you don’t touch it.
The second challenge the public sector is experiencing is that by 2020 there will only be 370 municipalities in Norway. That means many municipalities are going to be merged. And when two municipalities are merged, what are you going to do with that Access database and that legacy system from the eighties?
The survey talks about organization, leadership and the competence of the leaders, as well as the maturity level with regard to new technologies.
When they presented the report, they said the public sector is like a big tanker at sea. It moves steadily towards the target, but it doesn’t go very fast, and it’s a huge ship to turn. In all fairness, though, it has started its journey and it is going to go faster and faster.
One of the panel debate members was an IT director from one of the municipalities in the middle part of Norway. She said their biggest challenges were that there was no standard software to use, and the lack of integration between the software they do have.
This is our challenge as software vendors to fix: we need to build software that provides complete solutions and does not help our customers create silos.
Another thing from the debate was that one of Norway’s biggest vendors in the public sector said that this will get much better when all the systems run in the cloud. This was one of the stranger things I heard during the presentation. The cloud doesn’t fix integration; in my experience, it does the opposite. What is right, though, is that the issue with standards is easier with cloud applications.
So how would the public sector go about making the tanker a bit more streamlined and easier to turn?
Well, first of all, you have to invest money in knowledge. If you want innovation you need innovative people. And you are probably stuck with the people you have, so you must invest in improving their skills. If you don’t, you’ll continue to make systems that looked shiny in the eighties.
For the public sector, you need to work closer together; you will have a lot more leverage if you combine your efforts towards the software vendors. They do not want to lose you as customers; never forget that you have more power than they do. And you need to automate and work smarter in the areas where that is possible.
You need to draw up a data strategy with the intention of making it possible to change your legacy systems one by one. You need a data strategy that doesn’t involve lock-in with one particular vendor.
The importance of a data strategy is hard to overstate. If you don’t have one, you will eventually build silos with data that is hard to reuse.
Many vendors will, in their sales presentations, oversell the ability to integrate both to and from their system. But it is rarely that easy. And the cloud doesn’t solve any integration challenges by itself.
So again, since I work in a company that probably makes the best self-service data preparation tool on the market, investigate how your data strategy can be implemented using Xpert BI. I promise you will reach your goal much faster, with your solution already documented when you are done.
There are of course many challenges pointed out in the survey, but this was one of the clearest I could see.
Hope everyone is having a great summer, and I’ll write a new post soon.
I have been spending a lot of time warning people about the pitfalls, or rather deep ends, of the Data Lakes and Big Data initiatives. First, I want to state that I think the Data Lake as a concept is a great idea, but it needs to co-exist with your more traditional data architecture.
So, what I’m discussing in this post is my take on the total data architecture as I see it in today’s data age.
We all know the data warehouse mantra “One single version of the truth”, or the variant “One single source of the truth”. The single source statement will become even less relevant with the coming of Big Data, but the single version of the truth is probably even more important now, in the big data age. You need one place to implement your business rules, and one place to contain the truth. What we see now is that the place for holding the truth is not always the same place.
Let’s start by identifying the data source components of your data architecture.
“Traditional” or “small” data
These are your on-site systems, like your ERP system, your home-made operational side systems or your manual master-data spreadsheets. If you start to look at your system portfolio, you will probably find a lot of data that would fit into an enterprise data architecture. Here you will find most of your reporting basis and most of your analytical basis. I would say this is your most valuable data, and hence this is where you should spend most of your resources on preparation and management. Whether the data resides on-site or on a server in the cloud, the extraction method is the same as long as you have a connection to the actual database.
From a data collection point of view, I have mixed experiences with SaaS solutions. I have countless times run into trouble when trying to get data from various SaaS providers. What you get when you buy or rent a SaaS solution is an application that serves its operational purpose, but you don’t get direct access to your own data. More and more providers let you access your data through APIs, which is a good solution, but in my experience there are still a lot of smaller vendors that rely on flat-file integration. This means that you must build a manual flat-file integration on your side and store (possibly sensitive) data in open flat files on FTP servers. In some cases, you must even pay extra to get access to your own data.
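When a SaaS provider does offer an API, extraction usually means walking a paginated endpoint until it runs dry. Here is a minimal, provider-agnostic sketch, where `fetch_page` stands in for whatever HTTP call your vendor’s (hypothetical) API requires:

```python
def extract_all_pages(fetch_page, page_size=100):
    """Pull every page from a paginated SaaS API into one list of records.

    `fetch_page(offset, limit)` is assumed to return a list of dicts,
    and an under-full (or empty) page when the offset reaches the end."""
    records, offset = [], 0
    while True:
        page = fetch_page(offset, page_size)
        records.extend(page)
        if len(page) < page_size:
            break  # last page reached
        offset += page_size
    return records
```

The point of keeping the HTTP call behind a callable is that every vendor paginates slightly differently (offsets, cursors, next-page links), so the loop stays the same while only the fetch function changes.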
With the coming of new technology, we have new, faster, cheaper and smarter ways to save more of the unstructured and semi-structured data. Depending on your line of business, one could argue that this is where your data “gold” is: sensor data from various IoT systems, web logs you can store in detail, and other fast data you can analyze as it comes in. This is the area where traditional data warehouse architecture gets disrupted. It’s not a given that you should model this data or implement it in your star schema. Also, this is where the ETL process must be adapted to whatever technology you choose to store your data in.
So, let’s talk about the Big Data perspective. Do we still need relational databases for modelling our data hub? Do we need to buy expensive in-memory databases when we can store and access data much cheaper and faster with for example HDFS? The answer to these questions are neither yes nor no. The answer solely depends on how your business is set today.
I have one thought regarding moving all your data storage to HDFS: the technology is still too inaccessible. The toolsets are not yet good enough, and there is a lack of professionals who know them. This will, of course, as with every new emerging technology, improve as time goes by.
With the data lake methodology, we have a term that says something about how and where we should store the new “Big” data. CTO James Dixon at Pentaho coined the term as a contrast to the data mart, to solve the problem of “information siloing”. As time has gone by, more and more vendors have adopted the term, and it has become an architectural necessity when planning your enterprise data architecture. The main thing we should remember is that the data lake is a methodology and not a technology. Many people believe that Data Lake equals Hadoop, but this is not accurate. A data lake is a methodology that can span multiple technologies, and you should use the technology that fits your need, or even better, fits your data’s need.
This means you can have some data in Hadoop, some on Azure and some stored in a MongoDB. And, together, this is your Enterprise Data Lake.
I have seen presentations from companies that offload all their data into a data lake and store everything in Hadoop, where they also apply the business rules and build their reporting marts. So, it is an option to bypass the whole relational data warehouse, but as I said earlier, the technology, as I see it, is still too immature for this to be the smart move right now.
At the Gartner Data and Analytics Summit in London earlier this year, I attended a session where Edmond Mesrobian, CTO at Tesco, and Stephen Brobst, CTO at Teradata, talked about the data architecture at Tesco. I come from a fairly big Norwegian retailer, but what they talked about was almost science fiction to me. I think they had every technology you could think of on their architecture slide. They have done groundbreaking work at Tesco with the new data architecture. One thing I noticed with interest was that they still have their EDWH intact on their architecture slide. They said that the EDWH will never disappear, but they will not build a new one.
So even Tesco, which has the muscle and manpower to embrace any technology it wants, still understands the value of structuring the data in a data warehouse for reporting.
They also said that it’s necessary to enrich the EDWH with relevant data from the data lake, but not everything needs to be modelled into your star schema. Some of the data can just be offloaded in a raw format so the analysts have easier access to it.
Being a Norwegian data warehouse architect, most of my projects have been with Norwegian customers. I often see that many of the companies promoting Big Data technologies and methods compare Norwegian companies with large US companies. This sadly often leads to overselling technologies and methods fitted for much larger enterprises. In Norway, there are few, if any, companies as big as Tesco, Walmart or Bank of America.
So, if you are a small or mid-size company, you should choose an architecture that fits your needs. You must consider pricing models for the new technology. It’s not certain that you will get ROI on your cloud database. You must see if you can get qualified personnel to utilize the technology, and not rely on one single consultant. And, most important, you must look at the data you have or the data you are planning to get. Only after taking these steps can you start to choose your architecture and technology.
So how do I see the architecture for a mid-size to big corporation by Norwegian standards? Well, I’m really glad you asked…
The picture below describes, from a helicopter perspective, how I believe the best architecture for a “Norwegian-sized” company should look.
Just a quick note, all the vendor and technology names are meant as examples!
This picture sums up most of what I have talked about in this post. This is how I see the co-existence with the traditional data warehouse and the new emerging technologies.
In these architecture slides, we often forget to include the SaaS solutions and how to get data from them. Make sure you have an easy way of reading from APIs and storing the data in your data hub.
If your SaaS provider doesn’t support APIs, my strongest recommendation is to use a different provider. This is the best way to force the various providers to implement them.
When we talk about exposing data, think about your data warehouse as a data provider too, which means you also need to expose your data through APIs. You must do this so your web pages or other systems have an easy way of getting the correct information.
There are of course also scenarios where you can expose data from your data lake. Whether to expose your data from the lake or from your data hub depends on the nature of the data. If it is a live stock count from stores and you have that data in your data lake, you should of course expose it from the lake, but if the data is loaded in batches overnight or even hourly, you should probably expose it from your data warehouse.
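Whichever side you expose from, the payload should tell the consumer how fresh the data is. A minimal sketch of a batch-loaded warehouse response, tagged with the load timestamp so consumers can see when the overnight batch finished (all field names are illustrative):

```python
import json

def rows_to_api_payload(rows, resource, loaded_at):
    """Serialize warehouse rows into a JSON payload an API endpoint could return.

    `loaded_at` records when the batch that produced these rows finished,
    so a consumer can judge freshness before acting on the numbers."""
    return json.dumps({
        "resource": resource,
        "loaded_at": loaded_at,
        "count": len(rows),
        "data": rows,
    })
```

A lake-side endpoint serving live data would omit the batch timestamp and report per-record event times instead; that difference is exactly why the batch-versus-live distinction above matters.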
Regarding your traditional data from your ERP systems, or other systems where you have direct ownership of the database, you will, as I have pointed out before, be best served by an automation tool. I am not only saying this because I work for a company that is developing, probably, the best automation tool on the market; I am also saying it because you need to free up time for your developers to get up to speed on the emerging technologies. Time is one of our most valuable assets, so use that asset the best way you can.
As I am writing this our brilliant developers are busy implementing data collection to and from other new technologies, and not only your traditional data. But I will get back to the details regarding this in a little while.
In my previous posts, I have talked a lot about preparing your data and not forgetting your structured data in this time of Big Data, IoT and other cool stuff. Today I thought I’d take a step back and try to explain what our company does when it comes to preparing and structuring data, and discuss how this can be a viable option for you.
So, what is Xpert BI?
We call it self-service data preparation, or data warehouse automation. So, what does that mean, and why should you do this instead of traditional ETL or ELT?
Our CTO and the brain behind Xpert BI, Erik Frafjord, had an idea: the process of getting your data prepped and ready for reporting needed to be optimized, but not at the cost of quality, governance and documentation. In traditional data warehouse projects, 80% of your time is spent preparing your data, and only 20% of the effort goes into reports, analytics and distributing the value your data adds. You should try to switch that: let 20% go into preparation and use the remaining 80% on the decision-making process.
Using Xpert BI, you always start with your metadata. The first step is to identify and understand your data sources, whether that is a flat file or thousands of tables in your SAP installation. For a lot of the big ERP vendors, like SAP or Axapta, Xpert BI already has predefined application adapters where system-specific metadata is combined with platform-specific metadata. This automatically enriches your data model with friendly names and relationships, so you can fully understand your system implementation.
Depending on the quality and availability of metadata, it is not unusual to include application owners or experts to enhance the source system data model. Doing this saves you a lot of time afterwards when you are going to use the data for further transformation processes or direct in analytics.
This understandable layer is what we call the idealization layer: here you have a “copy” of your source, fully enriched with understandable data at the same level as your source. It is an ODS, but a lot more useful and understandable.
(Example of idealized data from SAP)
Business logic, denormalizations, integrations and transformations of data are done in SQL Server Management Studio using T-SQL. The reason for this is that we didn’t want to reinvent something that works, and by using existing technology the need for training with Xpert BI is minimal. So, joining tables and applying your business rules is done here, either with the help of a wizard or by writing your own SQL code. Our tool manages all data loads by always maintaining table dependencies, and you can easily configure commonly used load options such as incremental loads, filters, surrogate keys, snapshots and SCD handling of any given dimension attributes.
This means that you can use the methodology of your choosing when designing your data warehouse or your data mart. Whether you want a straightforward Kimball approach or want to design a Data Vault, you are free to choose the architectural design that fits your organization best. Xpert BI can handle any number of databases when it comes to complete technical documentation and dependency control.
We also enrich your solution with surrogate keys, so your joins go incredibly fast. You also get the complete lineage of your solution as your technical documentation. This lineage is also used to determine the order in which your tables need to be loaded, so you should never have to worry about the sequence in which your jobs run. The tool will automatically optimize the parallelization and sequencing of data loads.
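The load ordering described here is, in essence, a topological sort over the lineage graph. Here is a minimal sketch of how lineage metadata can be turned into load batches, where tables within a batch have no dependency on each other and can load in parallel (the table names are illustrative, and a real tool’s scheduler is of course more sophisticated):

```python
def load_batches(dependencies):
    """Group tables into load batches from lineage metadata.

    `dependencies` maps each table to the set of tables it reads from.
    Tables in the same batch are independent and can load in parallel;
    batches must run in order."""
    remaining = {t: set(deps) for t, deps in dependencies.items()}
    batches = []
    while remaining:
        # A table is ready once everything it reads from has been loaded.
        ready = {t for t, deps in remaining.items() if not deps}
        if not ready:
            raise ValueError("Circular dependency detected in lineage")
        batches.append(sorted(ready))
        remaining = {t: deps - ready
                     for t, deps in remaining.items() if t not in ready}
    return batches
```

Because the batches fall out of the lineage automatically, adding a new table never requires hand-editing a job schedule; the dependency metadata is the schedule.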
From a technical perspective, this sounds and works amazing, but it’s not only on the technical side that you get a win. What you do here is force your developers into a more governed way of doing development. You take away some of their freedom to get creative. Being a musician in my spare time, it hurts a little to say this; I have always encouraged the people working for me to be creative. But in this part of the process you need a governed approach, so anyone can read and continue to develop where someone left off. You shouldn’t be dependent on the one developer who made the original package to make the changes.
So, who would benefit from this?
The sales pitch would of course say everyone. Luckily, I am not a salesperson, so let’s try to refine that and identify when you should think about an automation tool like Xpert BI.
On a high level, there are three scenarios that would apply.
You are starting a new data warehouse initiative.
You are rebuilding or restarting your data warehouse initiative.
Your data warehouse initiative is not giving you the full picture, or it’s just updating too slowly.
If don’t have a data warehouse yet you should consider the automated way. This is probably the most obvious scenario where you should explore the automated opportunity.
The second scenario is much the same as the first, but here you usually have the advantage that some of your business rules are already defined, and you only need to revise them. You also most likely have to change your architecture, either because it has been disrupted or because you inherited it from your predecessor.
The third scenario we could call the pain scenario. Here the organization is most likely experiencing that the reports don’t give them the answers they are looking for, give the wrong answers, or give no answers at all. There are a lot of reasons a solution can get into this state, but the main one is a lack of governance in the development process. This scenario often leads to a rebuild or a restart, or at least it should.
So, having defined the three main scenarios where you should consider an automated approach, does that mean you should? Well, you are getting closer to an “of course”, but before you rush into buying yet another tool that’s going to magically fix things, look at your organization: is this going to fit?
If your organization already has a development department staffed with BI developers, you could argue that with strong leadership and good governance they can develop and maintain your data warehouse without a new tool like Xpert BI.
Big organizational changes that lead to a change in tasks, or to downsizing, could result in a disrupted department, which in turn could, and most likely will, lead to a failed project.
I have a saying that goes: “A tool is only a tool; it’s the people that utilize the tool that make the change, never the tool itself”. So, promote the goal and the way to get there before you squeeze the elephant into the bus. It might turn out to be a much smaller animal that needs the ride.
I hope I have clarified some of the things Xpert BI can do for you. It’s not magic, but in many cases it is a lot smarter and a lot faster.