I was asked to speak at Hadoop Summit 2015 by our good friends and partners at Hortonworks. After getting over the initial excitement, I realized that this called for something different from the typical presentation I might deliver at a trade-show event. Hadoop Summit attracts the cream of the crop, and the technical level of the audience meant that a typical, marketing-heavy presentation would not suffice. As I thought more about the activity I am seeing around analytics, and the adoption of Hadoop in particular, I hit on the notion that what I was really seeing was the true emergence of the DevOps that I had always imagined during my years in operations.
I spent 20 years in data center and IT operations roles before getting into the vendor world, and it was during that time that I developed my view of IT and technology: all the glitz, bells, whistles, and flash in the world doesn't mean anything if you cannot make it run in production, day in and day out, without requiring an army of people with advanced degrees and skills in very fringe technologies.
Unfortunately, the early days of Hadoop and advanced analytics tools have been exactly that, and the barrier to entry I hear about most commonly in the IT world is that it is "just too complex" to get Hadoop up and running to make it worthwhile. This exposes the second key barrier to entry for analytics: not knowing WHAT value analytics delivers.
The Old World
When I started in IT, and indeed up through today, there were two distinct types of IT teams: Dev and Ops. This delicate and often out-of-balance relationship between builders and runners led to many interesting conflicts that weren't always restricted to the data center or conference rooms (I remember one particularly aggressive flag football game at a company picnic…). Generally, this relationship took the form of Developers building things in a total vacuum and tossing them over to Operations to run in production, with little or no communication and no validation that they could actually run in the first place.
This led to a lot of difficult and often unwieldy infrastructure being built to support increasingly complex and distributed systems. A lot of tribal knowledge and unsupportable, single-purpose environments were created. Fortunately, technologies emerged to lighten this load on the Ops side of the equation in the form of shared compute and storage resources, through innovations like SAN and virtualization. Developers could finally take advantage of these technologies to create more accurate mirrors of the production environment for testing and developing their applications.
The New World
The world of Hadoop and advanced analytics tools looks a lot like the world before the emergence of these operational technologies. Massive environments of dedicated, single-function servers and hardware look a lot like 1993 all over again, and that is something operations has been trying to get away from for a long time.
This accepted model of Hadoop deployments has its roots in a simple fact of IT skunk-works-type projects: it was built to solve a specific challenge, using what was available at the time. Given the need to move and analyze a lot of data quickly, and the era in which Hadoop was being built (pre-2010), the biggest challenge was getting data to a CPU as fast as possible. SANs and networking were not nearly as evolved, and the adoption of things like 10-gigabit networking wasn't in place, so it made sense to put the data local to the CPU for the best speed. The result is the expected outcome: all Hadoop clusters being built the same way, regardless of their operational capabilities, or lack thereof.
The Killer App for DevOps
Operations fixes broken stuff, and that model is very inefficient, so they are aggressively trying to innovate. As Hadoop stops being the realm of science fair projects and becomes the de facto platform for the creation and manipulation of analytics models, it becomes increasingly important that it mature in its ability to perform in the enterprise. If it can't run on the platforms IT has already put in place, the majority of organizations will simply take a pass. This need to operationalize Hadoop and its ecosystem is the killer app DevOps has been waiting for in order to emerge and mature fully inside any organization, not just a massive enterprise or a tech startup.
DevOps is the marriage of Developers and Operations into a single hive-mind organization that strives for stability and security while providing an agile, flexible platform for delivering new technologies to the business and to consumers.
In the analytics world, this takes the form of delivering on the concept of Data as a Service: providing data and analytics capabilities directly to the data consumer. In a broader sense, it is the use case that, if companies get it right, gives them a real chance to make a market-changing move into new areas of their own business.
Where do we go from here?
As custodians of Hadoop and HDFS, we are entrusted to build a platform we can embrace and extend into all the areas where we need large data-set processing and storage. The call to action for all of us is to NOT lose sight of the need to keep this technology off the scrap heap that so much new, hot tech ends up on.
Hadoop has staying power and will continue to be a platform that companies use to deliver actionable insights from data for a long time to come, but if it is incapable of living in the DevOps frame of mind and set of tools, then its success will be a Pyrrhic victory at best. By succeeding in the accepted model, it will eliminate itself from future competition as organizations move past it to something that can run in the DevOps model.
If you missed this presentation, watch the on-demand version of it here.