The managers of enterprise data warehouses (DWH) carry more and more responsibility on their shoulders. As enterprise processes become increasingly data dependent, the availability and data quality of data warehouses are becoming critical. Many managers depend more on their morning reports than on their morning coffee…
There is no question that in this situation DWH managers have to keep their feet on the ground, as they have to ensure that the DWH rests on solid foundations.
On the other hand, a sea of data floods enterprises every day. Fortunately, technologies already exist to process this amount of data, even within large corporations. While the main principle of DWHs built on relational databases is “orderliness”, it is possible to load everything without structure into NoSQL “Data Lakes” for a fraction of the price of a DWH, or even on free-of-charge infrastructure.
It is therefore worthwhile for DWH managers to venture into these “watery” areas, as the new technologies may bring improvements in many respects:
- Storage: most of the data sitting on expensive DWH storage is never even queried. Logs and archive data can be loaded into Data Lakes instead. It also becomes possible to “get rid of” unstructured data (like text) that was forced into relational databases. In addition, raw data can be kept in its original format, so it can be accessed at any time in the future even though it needs huge storage space.
- Computing capacity: complex SQL queries and data mining models would no longer consume the resources of the data warehouse, so critical reports would be ready on time and self-service BI queries would run faster as well. Even user requests that were ignored in the past because of the limits of the DWH could be served.
- Real-time and unstructured data: users increasingly demand these types of data. If no solution is offered, this data will find its way into the data warehouse anyway, causing conflicts between colleagues and putting the base structure of the DWH at risk.
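To make the storage point more concrete, here is a minimal sketch of how a team might shortlist tables for offloading to a Data Lake. It assumes a table inventory with sizes and last-query dates is already available (for example from the database's own usage statistics). The `TableStats` type, the `offload_candidates` function, the table names and the 365-day threshold are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class TableStats:
    """Hypothetical inventory record for one DWH table."""
    name: str
    size_gb: float
    last_queried: Optional[date]  # None means the table was never queried


def offload_candidates(
    tables: List[TableStats],
    today: date,
    max_idle_days: int = 365,  # assumed threshold, tune to your own policy
) -> List[TableStats]:
    """Return tables never queried, or not queried within max_idle_days,
    largest first, since those free the most expensive storage."""
    cutoff = today - timedelta(days=max_idle_days)
    cold = [
        t for t in tables
        if t.last_queried is None or t.last_queried < cutoff
    ]
    return sorted(cold, key=lambda t: t.size_gb, reverse=True)


# Illustrative inventory; names and numbers are made up.
inventory = [
    TableStats("web_logs_2019", 800.0, None),
    TableStats("sales_fact", 300.0, date(2024, 5, 30)),
    TableStats("archive_orders", 1200.0, date(2021, 3, 1)),
]

for table in offload_candidates(inventory, today=date(2024, 6, 1)):
    print(f"{table.name}: {table.size_gb:.0f} GB")
```

A list like this is only a starting point for discussion: a table that is queried once a year for a regulatory report may still have to stay in the DWH.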
The most can be gained from all this if the data warehouse itself opens up to this new world and designs its cooperation with NoSQL systems. It is better to be ready than to be surprised and to work under pressure with tight deadlines.
Today there are many challenges in implementing an RDBMS-NoSQL (earth and water) cooperation, for example the control of ETL and data loader processes, integrated metadata management, and the supervision of access management. Solutions to all of these problems, some of them open-source, are available from both smaller and larger companies.
It is worth starting to think about how to connect water to earth. Water is definitely rising and it is essential to keep the foundations of DWHs dry.
Hiflylabs creates business value from data. The core of the team has been working together for 15 years, currently with more than 50 passionate employees.