Express Computer

Big Data Analytics: Modernize data warehouses for better results



By Amit Sharma

Analytics, in its complete definition, consists of four main types: descriptive, diagnostic, predictive and prescriptive. The first two have always been delivered via data warehouses and BI systems. With maturing Big Data capabilities, the feasible business cases for the latter two are now well established and industrialized; together they are popularly called Advanced Analytics.

It is also established that Advanced Analytics does not require an underlying data warehouse. Still, failing to leverage existing data warehouse investments, and the value locked in their data, would be a wasted opportunity. The mature Big Data ecosystem offers the best way for data warehouses to scale up quickly and economically for Advanced Analytics, and hybrid cloud models have made this even more attractive.

Big Data has also substantially changed the way statistical analytical models are built. Traditionally, the first iteration of a model was built on representative data samples only, followed by multiple time-consuming recalibration cycles. With Big Data, it is now feasible to build even the first pass of a model on much larger, and at times complete, data sets. This not only allows additional predictor variables to be introduced early, but also increases forecast accuracy many fold, revealing trends that would otherwise go undiscovered with representative samples alone.
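To illustrate the sample-versus-complete-data point, here is a minimal sketch on synthetic data. The data-generating model (y = 1 + 2x + noise), the 200-row sample size and the `fit_line` helper are all assumptions made for illustration. It fits the same one-variable least-squares model on the complete data set and on 50 small samples, then compares how far each estimated slope lands from the true value.

```python
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (intercept, slope)."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

TRUE_SLOPE = 2.0
rng = random.Random(42)  # fixed seed so the run is repeatable

# Synthetic "complete" data set (assumed model): y = 1 + 2x + Gaussian noise.
xs = [rng.uniform(0, 10) for _ in range(100_000)]
ys = [1 + TRUE_SLOPE * x + rng.gauss(0, 5) for x in xs]

# Model built on the complete data set.
_, full_slope = fit_line(xs, ys)
full_error = abs(full_slope - TRUE_SLOPE)

# Models built on 50 independent 200-row representative samples.
sample_errors = []
for _ in range(50):
    idx = rng.sample(range(len(xs)), 200)
    _, s = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
    sample_errors.append(abs(s - TRUE_SLOPE))
mean_sample_error = statistics.fmean(sample_errors)

print(f"slope error, full data:      {full_error:.4f}")
print(f"slope error, 200-row sample: {mean_sample_error:.4f} (mean of 50)")
```

The exact figures depend on the seed. The sketch only shows that an estimate from the full set is far more stable than one from a small sample; it does not by itself demonstrate the forecast-accuracy gains described above.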

Making a data warehouse scale up to deliver Advanced Analytics may require transformation. Advanced Analytics draws on a combination of data sets such as voice, images, sensor feeds and geospatial data, yet the majority of data warehouses can process only structured data stored in standard relational databases. Data warehouse infrastructure must therefore open up to acquire, combine and co-process data in varying formats, e.g., files, server logs, databases, images, sensor readings and more.
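One common way to co-process such sources is to map each of them onto a shared record envelope before combining them. The sketch below assumes three hypothetical sources (a CSV table extract, a JSON-per-line server log and a sensor feed of tuples); the `Event` envelope, the source names and all field names are illustrative, not taken from any particular product.

```python
import csv
import io
import json
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class Event:
    """Hypothetical common envelope so records from different sources can be co-processed."""
    source: str
    entity_id: str
    timestamp: datetime
    payload: dict

def from_csv(text):
    # Relational extract, e.g. a nightly table dump with a header row (assumed columns).
    for row in csv.DictReader(io.StringIO(text)):
        yield Event("core_banking", row["customer_id"],
                    datetime.fromisoformat(row["updated_at"]), dict(row))

def from_json_log(lines):
    # Semi-structured server log, one JSON object per line (assumed keys "user", "ts").
    for line in lines:
        rec = json.loads(line)
        yield Event("web_log", rec["user"],
                    datetime.fromtimestamp(rec["ts"], tz=timezone.utc), rec)

def from_sensor(readings):
    # Device feed delivered as (device_id, epoch_seconds, value) tuples.
    for device_id, ts, value in readings:
        yield Event("sensor", device_id,
                    datetime.fromtimestamp(ts, tz=timezone.utc), {"value": value})

csv_text = "customer_id,updated_at,segment\nC1,2023-01-05T10:00:00+00:00,retail\n"
log_lines = ['{"user": "C1", "ts": 1672912800, "page": "/loans"}']
sensor = [("D7", 1672916400, 36.6)]

# All timestamps are timezone-aware UTC, so records from every source sort together.
events = sorted(
    [*from_csv(csv_text), *from_json_log(log_lines), *from_sensor(sensor)],
    key=lambda e: e.timestamp,
)
for e in events:
    print(e.timestamp.isoformat(), e.source, e.entity_id)
```

Normalizing timestamps to timezone-aware UTC at ingestion is the design choice that lets heterogeneous feeds be merged and ordered without per-source special cases.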

This diverse data will also arrive at a variety of speeds, from daily batches to real-time streams (e.g., sensor data from healthcare devices). Making provisions for data acquisition at these various speeds is essential, and due consideration must be given to restartability in the event of a data transmission failure.
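Restartability is usually achieved with checkpointing: the loader records how far it got after each successfully loaded batch and resumes from there after a failure. The following is a minimal sketch assuming a JSON checkpoint file and a simulated source; `CheckpointStore`, `fetch`, `ingest` and the `icu_monitor` source name are hypothetical.

```python
import json
import os
import tempfile

class CheckpointStore:
    """Persists the last successfully loaded offset per source in a JSON file."""
    def __init__(self, path):
        self.path = path

    def _read(self):
        if not os.path.exists(self.path):
            return {}
        with open(self.path) as f:
            return json.load(f)

    def load(self, source):
        return self._read().get(source, 0)

    def save(self, source, offset):
        state = self._read()
        state[source] = offset
        # Write to a temp file, then rename, so a crash never leaves a half-written checkpoint.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(state, f)
        os.replace(tmp, self.path)

def fetch(records, start, size, fail_at=None):
    """Stand-in for reading from a remote source; fails if the batch covers index fail_at."""
    if fail_at is not None and start <= fail_at < start + size:
        raise ConnectionError("simulated transmission failure")
    return records[start:start + size]

def ingest(source, records, store, sink, batch_size=2, fail_at=None):
    offset = store.load(source)  # resume point; 0 on the first run
    while offset < len(records):
        batch = fetch(records, offset, batch_size, fail_at)
        sink.extend(batch)          # load the batch into the target
        offset += len(batch)
        store.save(source, offset)  # commit the offset only after the batch is loaded
    return offset

records = [f"reading-{i}" for i in range(7)]
sink = []
checkpoint_after_failure = None
with tempfile.TemporaryDirectory() as d:
    store = CheckpointStore(os.path.join(d, "checkpoints.json"))
    try:
        ingest("icu_monitor", records, store, sink, fail_at=5)
    except ConnectionError:
        checkpoint_after_failure = store.load("icu_monitor")
        print("failed; checkpoint at", checkpoint_after_failure)
    ingest("icu_monitor", records, store, sink)  # restart resumes, no reload of earlier batches
print(sink == records)
```

In this sketch a crash between loading a batch and saving its offset would reload that batch on restart; a production pipeline would typically commit the load and the checkpoint in one transaction or make loads idempotent.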

Some banks have found that their ability to predict a possible loan default improves by 15% when the gist of branch interactions about other loans is included. Adding data from reminders sent over the past three years improves it further, by a few fold.

However, for any one discrete entity of a business domain (e.g., a retail customer), data warehouses have always preferred to source data from a single IT system and, where that is not possible, reluctantly from the minimal, necessary and sufficient number of IT systems.

Multiple sources generally arise because different systems are built for different lifecycle stages (customer acquisition, servicing, etc.), because systems are built to cater to specific functionality (an employee appears in both payroll and training systems), or because departments build systems in silos.

Whether for technology, people or process reasons, this has been a major grievance among data warehouse sponsors. In many cases it is attributed to a lack of metadata, a large number of data layers, slow governance processes, or the difficulty of getting many stakeholders together.

For delivering Advanced Analytics, these become major showstoppers. Therefore, carefully examine the reasons behind them in your case, and build in enough flexibility to acquire and process more data sources faster.

In the interest of performance and resource optimization, data warehouses eventually drift toward keeping only minimal data history online (offloading cold data). This directly contradicts readiness for Advanced Analytics. For example, in stock markets, over a business cycle of four years, the primary, intermediate and short-term trends last up to 2 years, 9 months and 6 weeks respectively, and these cannot be discovered with just a few years' data. Data warehouses must therefore look for ways to economically make the entire detailed history available for analytical discovery.
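The arithmetic behind the retention argument can be sketched in a few lines. The sketch assumes daily data with roughly 252 trading days a year (about 21 a month and 5 a week), converts the three trend durations from the example into trading-day windows, and treats a trend as detectable only if its window fits at least twice in the retained history, so that consecutive windows can be compared. The day counts and the "at least two windows" rule are illustrative assumptions, not a market standard.

```python
TRADING_DAYS_PER_YEAR = 252  # common approximation, assumed here

# Trend durations from the example, converted to trading-day windows (approximate).
TREND_WINDOWS = {
    "short-term (6 weeks)": 6 * 5,
    "intermediate (9 months)": 9 * 21,
    "primary (2 years)": 2 * TRADING_DAYS_PER_YEAR,
}

def moving_average(series, window):
    """Trailing simple moving average; empty if history is shorter than the window."""
    if len(series) < window:
        return []
    running = sum(series[:window])
    out = [running / window]
    for i in range(window, len(series)):
        running += series[i] - series[i - window]  # slide the window by one day
        out.append(running / window)
    return out

def detectable_trends(history_days, min_windows=2):
    """Trends whose window fits at least min_windows times in the retained history."""
    return [name for name, w in TREND_WINDOWS.items() if history_days >= min_windows * w]

prices = [100 + 0.05 * d for d in range(TRADING_DAYS_PER_YEAR)]  # one year kept online
print(len(moving_average(prices, TREND_WINDOWS["primary (2 years)"])))  # 0
print(detectable_trends(1 * TRADING_DAYS_PER_YEAR))  # ['short-term (6 weeks)']
print(detectable_trends(4 * TRADING_DAYS_PER_YEAR))  # all three trends
```

With one year online, not even a single two-year moving-average value can be computed; keeping the full four-year cycle makes all three trend horizons observable under these assumptions.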

Data models for many large-scale data warehouses are designed in third or higher normal form. Advanced Analytics, however, needs its data together, so a conscious move down the ladder toward denormalization is essential. Another consideration is to upgrade data hubs and data stores to data lakes.
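Denormalization for analytics typically means paying the join cost once, when building a wide table that a model can consume row by row. The following is a minimal sketch using Python's built-in `sqlite3` module with a hypothetical normalized schema (`customer`, `segment`, `loan`, `reminder`), echoing the loan-default example above; the table names, columns and values are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical normalized (3NF-style) source tables.
CREATE TABLE segment  (segment_id INTEGER PRIMARY KEY, segment_name TEXT);
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, segment_id INTEGER);
CREATE TABLE loan     (loan_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
CREATE TABLE reminder (reminder_id INTEGER PRIMARY KEY, loan_id INTEGER, sent_on TEXT);

INSERT INTO segment  VALUES (1, 'retail'), (2, 'sme');
INSERT INTO customer VALUES (10, 'Asha', 1), (11, 'Ravi', 2);
INSERT INTO loan     VALUES (100, 10, 50000), (101, 10, 20000), (102, 11, 150000);
INSERT INTO reminder VALUES (1, 100, '2021-03-01'), (2, 100, '2021-04-01'), (3, 102, '2022-01-15');

-- Denormalized one-row-per-loan analytical table: joins are done once, at build time.
CREATE TABLE loan_features AS
SELECT l.loan_id,
       c.customer_id,
       s.segment_name,
       l.amount,
       COUNT(r.reminder_id) AS reminders_sent
FROM loan l
JOIN customer c ON c.customer_id = l.customer_id
JOIN segment  s ON s.segment_id  = c.segment_id
LEFT JOIN reminder r ON r.loan_id = l.loan_id
GROUP BY l.loan_id, c.customer_id, s.segment_name, l.amount;
""")

rows = conn.execute("SELECT * FROM loan_features ORDER BY loan_id").fetchall()
for row in rows:
    print(row)
```

The `LEFT JOIN` keeps loans that never received a reminder (with a count of 0), which matters when the absence of an event is itself a predictor.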

Amit Sharma is Principal Technology Architect, Infosys.
