Today, it is hard to dispute the value of data-driven approaches. As a result, businesses are seeking, now more than ever, ways to harvest the value of data through analytics. But data and analytic methods have also been victims of the greatest villain of IT since its inception: the siloed approach.


It’s extremely important to understand that IT has been burned by silos from the very beginning. In the rush to solve individual business problems quickly, IT has ended up dealing with silos created by mainframes, applications, storage, networking, and more. Data has always been a victim of this approach, since there was no strategy to pool and aggregate data across these silos.


There is now an opportunity for IT to show its greatest value to the business by architecting a data lake that transcends silos across data types, data stores, analytical methods, speed of analytics, and more. IT can deliver an information landscape that expands both the universe of data and the universe of analytical approaches, improving the accuracy of analytical methods, accelerating the speed at which value can be obtained, and enabling faster innovation in the data-driven world.



What is really exciting about a data lake is the ability to analyze a problem from a myriad of perspectives, increasing the quality of the outcome. Consider hiring, for example: employers have been constrained to considering only a well-defined set of parameters before deciding which candidate is the best match. Recruiting companies such as Gild, on the other hand, tap into big data from Twitter, Facebook, blog feeds, question-and-answer web sites such as Quora, and other sources to analyze how a candidate interacts with the outside world and where their expertise lies before deciding whether the candidate is right for the environment. Retention rates and on-the-job employee satisfaction can be far greater, yielding a much better outcome than before.


A data lake sounds like the holy grail, but how can an organization actually build one when most IT environments are heterogeneous: a mix of data managed by traditional second platform technologies (file, block, etc.) and emerging third platform technologies (Hadoop, mobile, cloud)? In other words, how can IT create a modern architecture that brings together diverse data sources, with ease of management for IT and ease of access and analysis for data scientists, business analysts, and application developers?


Additionally, a data lake is not your traditional BI/DW environment: it requires new skills and processes, which organizations may struggle with. Finally, a data lake may not meet enterprise-class requirements such as mission-critical availability, performance, security, and data governance.