
During my 10-year tenure at Microsoft, I met with many corporate and commercial software development teams for architecture review sessions.  The most common recommendation to come out of those sessions was "instrument your code".  Ongoing development, testing, and support are much easier when you have a detailed historical record of what has been happening with your product.  Most communication and computer equipment, as well as most commercial software products purchased today, produce these detailed records of important activity and events.  This machine-generated data is the fastest growing segment of what we call the "big data" market.


If you have never been involved in hardware/software development or IT support, this may all sound a little abstract.  If you have access to a Microsoft Windows computer, go to the search bar and type "event".  The first suggestion should be a program called Event Viewer; double-click that icon to start the program.


[Screenshot: Windows Event Viewer]

Welcome to the world of machine data!  Two things I want to highlight are:

  1. There is an incredibly large number of activity and event types that are collected, and
  2. It is impossible, even for an expert, to tell if this machine is "healthy" or not from this display.


This is "raw" event data presented in lists.  While the major operating system vendors like Microsoft make it easy for hardware and software developers to write events into a central logging framework using a simple application programming interface (API), the result of all this effort is a giant bucket of bits.  Someone then has to write software to analyze and make sense of the raw event data to derive insights.
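To make "instrument your code" concrete, here is a minimal sketch of an application writing structured events through a simple logging API.  The helper and field names (`emit_event`, `event_id`, and so on) are illustrative assumptions, not part of any vendor's framework:

```python
import json
import logging

# Central logging setup; in practice the OS or a logging framework owns this.
logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("myapp")

def emit_event(source, event_id, level, message, **fields):
    """Write one activity/event record as a single JSON line.

    Hypothetical helper: real systems call the Windows event log API,
    syslog, or similar instead.
    """
    record = {"source": source, "event_id": event_id,
              "level": level, "message": message, **fields}
    log.info(json.dumps(record))
    return record

emit_event("myapp.db", 1001, "INFO", "connection opened",
           host="db01", port=5432)
```

Each call produces one record the logging framework can store centrally; the analysis problem described next is what happens after millions of these accumulate.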


And all this raw data you're seeing comes from just one Windows computer.  Every piece of networking, server, storage, and specialty hardware in and around a corporate data center has an activity and event logging capability at least as complex as the Windows event system shown here.  And there are no standards, or even conventions, for how activity and event data is constructed or stored across multiple products.  Every vendor and every product will have a unique format for machine data.


Now you can start to get a feel for the formidable complexity that confronts the operations staff of a corporate data center.  If someone asked me how I would architect a software analysis tool that could handle this level of complexity, first, I would suggest that they design a source independent representation of an activity and event that could represent the entire universe of data sources that I was going to encounter.  Then second, I would start writing source specific pre-processors that would translate the raw data from each source into my internal and universal data representation.
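The two-step design above - a source-independent event representation plus per-source pre-processors - can be sketched like this.  All names and log formats here are illustrative assumptions for the sake of the example, not Splunk's actual internals:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Event:
    """Universal, source-independent representation of one event."""
    timestamp: datetime
    source: str
    severity: str
    message: str
    fields: dict = field(default_factory=dict)

def parse_syslog(line: str) -> Event:
    # Hypothetical format: "2024-05-01T12:00:00Z sshd ERROR Failed password"
    ts, proc, sev, msg = line.split(" ", 3)
    return Event(datetime.fromisoformat(ts.replace("Z", "+00:00")),
                 "syslog", sev, msg, {"process": proc})

def parse_csv_audit(line: str) -> Event:
    # Hypothetical format: "2024-05-01 12:00:00,login,WARN,bad credentials"
    ts, action, sev, msg = line.split(",", 3)
    stamp = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    return Event(stamp, "audit_csv", sev, msg, {"action": action})

# Each raw source registers its own translator; everything downstream
# of this table only ever sees the universal Event type.
PREPROCESSORS = {"syslog": parse_syslog, "audit_csv": parse_csv_audit}
```

Adding a new data source then means writing one more parser and registering it, without touching the analysis code.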

However, if you haven't tackled this problem yet, or aren't happy with the solution you have, don't break out a compiler and start writing code.  You should really check out our partner, Splunk Software, ranked #1 in worldwide IT Operations Analytics software market share.  They have already implemented this approach, and much more, for handling the complexity of machine data with their Splunk Enterprise product.

Splunk Enterprise can index any kind of streaming, machine, and historical data, such as Windows event logs, web server logs, live application logs, network feeds, system metrics, change monitoring, message queues, archive files, and more.  Splunk Enterprise transforms incoming data into events, which it stores in indexes. The index is the repository for Splunk Enterprise data that facilitates flexible searching and fast data retrieval.  Splunk Enterprise handles everything with flat files using an application native format that doesn't require any third-party database software products. This architecture gives Splunk a great foundation for controlling scale and performance.

Another aspect of the Splunk Enterprise architecture that fits with best practices for handling data complexity is the applications (apps) and add-ons framework.  Apps and add-ons are packaged sets of configuration that you install on your Splunk Enterprise instance to make it easier to integrate with, or ingest data from, other technologies or vendors.  Although you don't need them to index data with Splunk Enterprise, apps and add-ons can enhance and extend the Splunk platform with ready-to-use functions ranging from optimized data collection to monitoring for security, IT management, and more.

Dell EMC and Splunk work closely to provide a total solution with Splunk Enterprise and Dell EMC hyper-converged platforms tailored to address the complexity of machine data analytics.  Our Ready Systems for Splunk provide non-disruptive scalability and performance, optimized for Splunk workloads.  Dell EMC Ready Systems for Splunk are purpose-built for the needs of Splunk, helping consolidate, simplify, and protect machine data. These Ready Solutions include the hardware, software, resources, and services needed to quickly deploy and manage Splunk in your business.  Check out these resources for more details:


Ready Systems for Splunk Solution Overview

Using Splunk Enterprise with VxRail Appliances and Isilon for Analysis of Machine Data


Splunk Enterprise on VxRack FLEX for Machine Data Analytics

There are several more features of the Splunk Enterprise platform that I want to write about, including the use of multiple index locations for data aging and scale, and how the main services are implemented as individually installable and configurable components, but those will have to wait for another article - coming soon.

Thanks for reading,

Phil Hummel



Solution Summary

VCE, the Converged Platform Division (CPD) of EMC, just released a paper titled VCE Solutions for Enterprise Mixed Workload on Vblock System 540.  In this solution guide, we show how the Vblock converged infrastructure (CI) platform, using all-flash XtremIO storage, provides a revolutionary new platform for modernizing the deployment and management of mixed-workload and mixed-application environments.  CPD, together with the Global Solutions Organization, brought together a team with expertise in deploying Vblock systems and deep Oracle, Microsoft, and SAP workload knowledge.  The team's goal was to build, test, and document a near-real-life mixed application solution using a Vblock 540 system powered by XtremIO all-flash storage.


The business application landscape for the testing environment consisted of:

                • A high frequency online transaction processing (OLTP) Oracle application
                • A simulated stock trading OLTP application for SQL Server
                • SAP ERP with an Oracle data store simulating a sell-from-stock application
                • An Oracle decision support system (DSS) workload
                • An online analytical processing (OLAP) workload accessing two SQL Server analysis and reporting databases
                • Ten development/test database copies for each of the Oracle and SQL Server OLTP systems, and five development/test copies of the SAP/Oracle system.


The combined test results when the Oracle, Microsoft, and SAP mixed workloads were run simultaneously produced demand on the XtremIO array of ~230K predominantly 8 KB IOPS, together with an average throughput of 3.8 GB/s (primary I/O sizes 64 KB and 128 KB), at an 88 percent read and 12 percent write ratio. Average response times were 866 μs overall: 829 μs for reads and 1,152 μs for writes.

[Chart: mixed workload test results]

IT decision makers who are evaluating new options for data center management to help provide better service with lower TCO should research VCE CI platforms that use all-flash technology. We invite you to read the full report to understand our methodology and results or contact your local EMC representatives to discuss if converged platforms are the right choice for your next data center modernization project.


Thanks for reading,

Phil Hummel  @GotDisk


IoT, meet Hyper-converged

Posted by Phil Hummel Aug 28, 2017

Day 2 at VMworld.  Vegas is maybe the worst place I can think of to focus and get work done, but I'm off to a good start. I actually got up this morning in time to hit the fitness center, do a conference call, empty my inbox, and get food before heading to the show.  I then spent the rest of the morning in the HCI Zone talking to VMworld attendees about our vSAN Ready Nodes and VxRail hyper-converged infrastructure (HCI).


I found myself pulling up technical specs on our Virtual Rack kiosk more than I wanted to, but I don't get out to as many events as I used to since moving from pre-sales to marketing a couple of years ago.  Now when someone asks me what fluid is used in the liquid cooling on our dense-design servers, I just say, "I'm in marketing".  Oh, can you get somebody technical?  "Sure".  I'm always amazed by how many people come by the booth just to say "our account team is awesome, we love your stuff, no questions - just wanted to say hi".  Energy boosting for sure.  I was also energized by meeting new coworkers and getting to interact with them and customers.  The pace and quirkiness of trade show social engagement creates some really fun moments.


With all that was going on, the first session I got to was at 3:30 PM.  The session was titled Enabling the Edge with the Fundamentals of a Hyper-converged IoT Infrastructure, given by Greg Bollella, CTO, IoT, VMware.  It was both motivational and informative.  Greg talked about why companies should deploy HCI for IoT deployments that physically live at the "edge," close to the devices that generate data and consume control messages.  Greg also presented some great data on the risk of sending all your real-time data to the cloud and the importance of a hybrid approach when it comes to IoT.


A second theme of the talk was reducing time to deployment.  Currently, IoT implementations require a lot of components, typically from many suppliers, and therefore take a lot of time to assemble, deploy, integrate, configure, and test. Imagine a hyper-converged infrastructure that enables you to speed up deployment, reduce latency and risk, and implement IoT faster and more easily.  Perhaps not surprisingly, he mentioned VxRail as a great platform for HCI at the edge.  I want to add a plug for vSAN Ready Nodes as a second option for a compact and powerful HCI appliance well suited for placement outside the traditional data center, near the edge.  Use this link to read more about the latest hyper-converged innovation from Dell EMC.


I've been following the discussion of distributed computing for IoT, also known as fog computing, for a couple of years.  I started volunteering with the OpenFog Consortium last year and have been working to help organize the upcoming Fog World Congress this October in Santa Clara, CA.  This was by far the best presentation I have seen at a non-fog-related conference on the need for edge computing for IoT.  It was also the first time I have seen anyone explicitly identify the advantages of hyper-converged infrastructure for use at the edge in IoT deployments.  Very exciting to see the IoT discussion getting real.


This year VMworld has a set of sessions dedicated to IoT and a large presence in the exhibit hall dedicated to IoT solutions and partners.  The potential to change the world with smart devices and new models for distributed computing is here, and it's large.


Thanks for reading,

Phil Hummel


Operational Intelligence

There is little debate these days over whether business decisions should be informed by insights driven by data analytics.  Try to find one article in the Harvard Business Review encouraging business leaders to eschew the promise of machine learning, artificial intelligence, and big data in favor of making decisions from the gut. However, the journey from wanting to be more data driven to realizing success is still a big challenge for many organizations.  As many as 3 in 4 big data and analytics projects produce little or no tangible benefits.  Success requires the ability to execute on several levels simultaneously. In the diagram to the right, I have highlighted the need to bring together the right people, the right data, and the right analysis approach in a way that recognizes the strengths and limitations of each component.  In the table below, I discuss how these three components can be brought together to produce a successful operational intelligence project that leads to better decision making.


People

Challenge: Every business and organization has unique characteristics that are known most intimately by the people who work there. That is a key competitive advantage.  For most organizations, the time your people spend designing, testing, and maintaining IT infrastructure is not going to help build value.  The same can be said about developing data analytics capability.  Every person-hour spent wiring hardware and software together to enable your analytics capability is not being spent improving data quality, reviewing results, or changing how the business operates.

Why Splunk on Dell EMC: Dell EMC hyper-converged infrastructure running Splunk Enterprise analytics maximizes the time your people spend understanding data, interpreting results, and implementing money-making changes for the business.  Dell EMC is the #1 market leader in easy-to-procure, easy-to-implement, and easy-to-maintain data center infrastructure. Splunk Enterprise is a complete end-to-end analytics solution that handles all aspects of processing data, from input through indexing to search and reporting.  Let your people focus on their strengths by leveraging our strengths.

Data

Challenge: The three V's of data - velocity, volume, and variety - are driving organizations to get more disciplined in their approach to implementing analytics-based decision making.  Research shows that data identification, cleaning, and merging typically consume 80% of the time and money spent on data analytics projects.  A lot of that cost and complexity is related to the use of different platforms and analytic techniques for each silo of data.  Organizations that can consolidate data processing for many silos of data onto a single integrated framework drive down the cost of analytics projects and achieve results more quickly.  One of the most cost-effective ways to deal with the three V's is to standardize the tools and platforms to the greatest extent possible.

Why Splunk on Dell EMC: Dell EMC hyper-converged platforms deal with the velocity and volume of modern data analytics by allowing fine-grained expansion of both compute and storage resources.  No more forklift upgrades or stranded assets when the need arises for increased analytic sophistication with larger volumes of data.

Splunk turns silos of data into integrated operational insights and provides end-to-end visibility across your IT infrastructure, producing faster problem solving and informed, data-driven decisions.  Splunk Enterprise works by first ingesting data from files, the network, or other sources. The data then gets parsed and processed into searchable indexes.  Finally, Splunk allows users to run searches on the indexed data regardless of source.

Analysis

Challenge: When a data analytics project fails to deliver on initial expectations, one common reaction is to purchase new analysis tools.  Over time the organization accumulates six or more isolated "teams" organized around various tools and business problems.  In my experience this never produces good results, and it creates an organizational dynamic that is difficult to fix. Another typically ineffective strategy that I see organizations adopt is splitting analytic resources between 1) big data teams with lots of resources and long development times and 2) groups of "power users" with tools like Excel and other desktop-scale applications that can generate small wins quickly but lack the ability to scale those solutions to solve big data problems.  Finding tools that scale across broad ranges of data sizes and problem domains is an effective way to minimize these challenges.

Why Splunk on Dell EMC: Splunk Enterprise handles small and large analytic jobs with one framework and one set of tools.  One instance of Splunk Enterprise running on a single server can handle all aspects of processing small to medium data flows, from input through indexing to search. To support larger environments where data originates from many sources and where many users need to execute more sophisticated searches, you can scale your deployment by distributing Splunk Enterprise functions across multiple machines. Dell EMC hyper-converged infrastructure can also start as small as one machine and scale both computing resources and storage to match the design of your Splunk distributed environment, including highly available configurations.  One tool set and one platform for any size of data analytics means the whole team is working together.
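The ingest, parse, index, and search flow described above can be illustrated with a toy inverted index.  This is a simplification for intuition only, not Splunk's real implementation:

```python
from collections import defaultdict

def build_index(events):
    """Tokenize each event and map tokens to the event positions
    containing them, so searches avoid rescanning raw logs."""
    index = defaultdict(set)
    for i, event in enumerate(events):
        for token in event.lower().split():
            index[token].add(i)
    return index

def search(index, events, *terms):
    """Return events containing every search term (AND semantics)."""
    hits = set(range(len(events)))
    for term in terms:
        hits &= index.get(term.lower(), set())
    return [events[i] for i in sorted(hits)]

events = ["ERROR disk full on node01",
          "INFO backup complete",
          "ERROR timeout on node02"]
idx = build_index(events)
print(search(idx, events, "error", "node01"))  # only the first event matches
```

Once the index is built, any combination of terms can be searched without touching the original sources, which is the property that makes indexed machine data searchable at scale.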


Jointly Validated Hyper-Converged Solutions for Splunk Enterprise

Hyper-converged infrastructure integrates IT components in a scalable rack or appliance allowing you to modernize your data center with simplified management, improved performance, and elastic scalability.  Dell EMC hyper-converged systems simplify all aspects of IT by seamlessly integrating compute, network, storage, and virtualization technologies into one system.  Dell EMC and Splunk have jointly validated two hyper-converged system options for Splunk Enterprise solutions - VxRail and VxRack FLEX.

VxRail for Splunk Enterprise

VxRail Appliances allow you to build your software defined data-center with the only fully integrated, pre-configured, and pre-tested VMware hyper-converged infrastructure appliance on the market.

“VxRail is the simplest, most powerful, most integrated HCI appliance for customers standardized on VMware” - Chad Sakac, The Virtual Geek Blog


Download the Solution Guide  Using Splunk Enterprise with VxRail Appliances and Isilon for Analysis of Machine Data


VxRack FLEX for Splunk Enterprise

VxRack FLEX powers your data center with rack-scale engineered systems that achieve the scalability and management requirements for traditional and cloud-native workloads.

“If you want an insanely scalable and flexible HCI Rack-Scale system, with hypervisor of choice or bare metal… one that can start small and scale out - VxRack FLEX is for you.” - Chad Sakac, The Virtual Geek Blog

Download the Solution Guide  Splunk Enterprise on VxRack FLEX for Machine Data Analytics


Both solutions are the result of hundreds of engineering hours.  The Dell EMC engineering team worked together with Splunk to design specific configurations based on a variety of deployment scenarios and rigorously tested them to ensure performance. The solution guides (links above) give you the tested configurations along with implementation guidelines and deployment best practices.  Your journey toward more data-driven decision making is shorter, with less inherent risk, when you start with Dell EMC/Splunk jointly validated solutions.

Thanks for reading,

Phil Hummel @GotDisk



We just passed the five-year anniversary of the initial release of Apache Hadoop, and I think it is fair to say it is now a ubiquitous technology.  Born out of Google's need to find a way to store and query a repository of every URL in the world, plus all the relationships between them, Hadoop is now synonymous with enterprise big data.  What is fascinating to me is that a technology designed for the largest of the large web-scale companies is now providing critical value to so many organizations.  We all interact every day with companies that use Hadoop but don't think of them in the same way we do Google or Yahoo.  The companies we rely on make and sell the everyday goods and services that we want, and yes, they rely on Hadoop, primarily to keep us coming back. Hadoop, on the other hand, is not an everyday product.  Hadoop is better described as an ecosystem made up of a collection of open source software projects, tools, and people.  Since Hadoop is open source with no single point of control, the Hadoop community is critical to the ongoing success of the ecosystem.


The core of what makes Hadoop so powerful is the Hadoop Distributed File System (HDFS) and the MapReduce engine for scheduling and tracking jobs and tasks. An example of how third parties enhance the Hadoop ecosystem is the Dell EMC Isilon enterprise storage platform. Hadoop compute clients can access data stored in an Isilon cluster to perform massive-scale analytics by connecting via HDFS.  Isilon's multi-protocol access support means you can also connect to the cluster with your existing workflows and standard protocols, including SMB, HTTP, FTP, REST, and NFS, as well as HDFS.


On the software side of the Hadoop ecosystem is an incredibly rich set of tools that sit on top of, or alongside, Hadoop.  You may have heard of many of the most popular access tools, including Apache Pig, Hive, and HBase, which are also part of the open source Hadoop ecosystem.  What may be more surprising is the integration of commercial software with Hadoop.


For example, take Splunk Enterprise, the leading platform for Operational Intelligence, which enables the curious to look closely at what others ignore: machine data.  Since machine data management is one of the fastest growing needs for our customers moving into the world of big data, Dell EMC is developing solutions for every stage of the journey, from first proof-of-technology to rack-scale massive implementations. Through our partnership with Splunk Software, we recently published a joint Splunk and Dell EMC solution guide.  It contains both a high-level "why Splunk and Dell EMC" section as well as details about the engineering lab work that we did to show how easy and powerful this solution is.



Splunk Analytics for Hadoop is a full-featured, integrated analytics platform that enables you to interactively explore, analyze, and visualize data in Hadoop without building fixed schemas.

It lets you rapidly detect patterns and find anomalies across petabytes of raw data in Hadoop without having to move or replicate data.


Splunk Analytics for Hadoop can be used to:

                • Interactively query raw data by previewing results and refining searches using the same Splunk Enterprise interface.
                • Quickly create and share charts, graphs, and dashboards.
                • Ensure security with role-based access control and HDFS pass-through authentication.


New paradigms, such as the Hadoop framework, have become extremely popular because of their capability to ingest and store massive amounts of raw data.  This guide is intended for infrastructure administrators, storage administrators, virtualization administrators, system administrators, IT managers, and others who evaluate, acquire, manage, maintain, or operate Splunk Enterprise environments.  In the guide we describe how to integrate Splunk Analytics for Hadoop with an existing data lake implemented using Isilon's native Hadoop Distributed File System (HDFS) support for enterprise-ready Hadoop storage. Please take some time to review this solution guide and use the comment area below to tell us what you think or ask any questions.




VCE Vblock 540 Infrastructure For Splunk Enterprise Solution Guide



Thanks for reading,

Phil Hummel, EMCDSA

On Twitter @GotDisk



The best way that I have found to describe "big data" is data sets that are so large or complex that traditional data processing applications and experience are inadequate.  What I find particularly relevant is that this definition recognizes 1) we are dealing with a moving target and 2) what is big for one organization can be routine for another, based on experience. Take Facebook, for example.  In 2012, their VP of Engineering, Jay Parikh, said in a TechCrunch interview, "we think we operate the single largest Hadoop system in the world," at around 100PB.  That was big in 2012, but just 4 years later that number doesn't seem so out of the ordinary.


For comparison, here are some facts that show what ScaleIO customers are currently doing in their data centers:

                  • 20PB today going to 100PB in 2017
                  • 4PB growing 100% year over year

                  • 100PB of Oracle alone

                  • 7PB growing 50% each year

                  • 10PB in our ESX environment


Of course, Facebook has moved on to even bigger volumes, with estimated ingestion rates of 500+ TB per day, but what was once territory known only to the largest of large web-scale companies is now well within both the needs and the capability of a growing number of more traditional manufacturing, banking, health care, and other businesses that we encounter every day.


To achieve their goals, enterprise IT teams have quickly adopted the best practices of web-scale companies like Facebook, Amazon, Yahoo, and others to operate data centers with ruthless efficiency.  For example, the benefits of server virtualization are well understood in the modern enterprise data center. By abstracting, pooling, and automating compute resources, companies have achieved massive savings.


When it comes to managing storage, however, the emulation of web-scale practices in the enterprise is much less prevalent.  We know that most of the large internet-scale companies do not buy commercial storage array products for their core storage infrastructure - they build their own services using commodity hardware, applying the principles of abstraction, pooling, and automation.  ScaleIO Software-Defined Storage applies these same principles to local storage in standard x86 servers to create a high-performance SAN entirely in software, achieving the same operational and TCO benefits that drive web-scale efficiency.


Where to start

Our mission is to bring this level of storage scale and operational efficiency to every organization that is moving toward a full implementation of web-scale best practices in its data center operations.  Our interactions with customers have shown that many are looking for a purpose-built solution to support their mission-critical application workloads, delivering scale, performance, elasticity, and resiliency. They want a solution that offers flexibility of choice when it comes to deployment models, configurations, and broad support for operating systems and hypervisors. This solution should also be pre-configured, qualified, and supported by a single vendor.


The All-Flash Dell EMC ScaleIO Ready Node is that solution.  It combines Dell EMC ScaleIO software-defined storage and Dell EMC all-flash-enabled PowerEdge servers to bring web scale to any data center. Customers can rapidly deploy a fully architected block storage server SAN that is a pre-validated, configured, and fully supported solution.



If you need more flexibility in using new or existing x86 server resources, you can also use ScaleIO software to transform them into an intelligent server SAN.  ScaleIO combines HDDs, SSDs, and PCIe flash cards to create a hardware-agnostic virtual pool of storage with multiple performance tiers.  It can be installed on both physical and virtual application servers.  As storage and compute resources change, ScaleIO software automatically re-balances the storage distribution to optimize performance and capacity usage.


Historically, having server-based flash led to poor resource utilization because its performance and capacity supported only local applications. Today, with the power of ScaleIO software-defined storage, the ability to abstract, pool, and automate storage devices across a multitude of servers, and in turn allocate as little or as much performance and capacity as needed to individual applications, is just as easy as allocating compute and memory resources in a virtualized environment.  Please use these resources to get more details.


More resources


  • ScaleIO Enablement Center
  • How Software-Defined Storage Reduces Total Cost of Ownership
  • Dell EMC Pulse Product & Technology Blog
  • EMC SDS Customer Success Story: Blue Central Video
  • Dell EMC ScaleIO Ready Node: Transform Your Data Center
  • Follow @DellEMCScaleIO on Twitter
  • This Is What Scale Feels Like: Software Defined Storage
  • This Is What Choice Feels Like: Software Defined Storage
  • EMC ScaleIO Free Download


Thanks for reading,

Phil Hummel, EMCDSA

On Twitter @GotDisk




I spent eight really great years working with .NET developers at the Microsoft Technology Center in Mountain View, CA.  There was one drum that I beat regularly during those design meetings and lab testing engagements: "why don't you instrument your code?"  Having good telemetry makes it so much easier to baseline your environment and do troubleshooting. I would always point out the rich sets of application- and hardware-specific performance counters that Microsoft and other vendors implemented with every piece of equipment, OS version, and application that they shipped.  What I found was that my customers weren't doing anything with all the rich sources of telemetry they already had available - until the data center was smoking rubble.  Then a bunch of folks would scramble around trying to do forensics on whatever logs were still available.  The real problem was log data collection and processing, not adding more sources of telemetry.


Whatever you call it - digital exhaust, log files, time series, big data - machine data is one of the most underused and undervalued assets of any organization. But some of the most important insights that you can gain, across IT and the business, are hidden in this data: where things went wrong, how to optimize the customer experience, the fingerprints of fraud.

All of these insights can be found in the machine data that's generated by the normal operations of your organization.  The team at Splunk develops software that makes this largely untapped wealth of machine data accessible, usable, and valuable to everyone interested in enabling digital transformation.


VCE Vblock 540 For Splunk Enterprise

Dell EMC and Splunk recently published a solution guide describing a VCE Vblock® 540 converged infrastructure solution that shows how the flexible scaling options of converged platforms integrate tightly with Splunk Enterprise for analyzing large quantities of machine data.  The engineers at Splunk put a lot of effort into designing a data storage system that can utilize different classes of storage.  This uses the highest-performance (and highest-cost) storage for recent, heavily accessed data and less expensive (lower-performance) storage for older "cold" data.  This architecture pairs beautifully with the rich portfolio of converged platforms and storage options available from Dell EMC.  In this solution we used an XtremIO all-flash product for the recent hot/warm data and the massive scale and attractive $/TB of Isilon for cold data.
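As a concrete illustration, Splunk's storage tiering is configured in indexes.conf, where an index's hot/warm buckets (homePath) and cold buckets (coldPath) can point at different storage.  The mount points and sizing values below are hypothetical placeholders sketching an all-flash hot/warm tier and a scale-out NAS cold tier, not values from the solution guide:

```ini
# indexes.conf - illustrative tiering sketch; paths and limits are
# example assumptions, not tested configuration.

[volume:hotwarm]
path = /mnt/xtremio/splunk

[volume:cold]
path = /mnt/isilon/splunk

[machine_data]
homePath = volume:hotwarm/machine_data/db
coldPath = volume:cold/machine_data/colddb
thawedPath = $SPLUNK_DB/machine_data/thaweddb
# Cap the flash tier so buckets roll to cold storage as it fills,
# and freeze data older than ~180 days.
homePath.maxDataSizeMB = 512000
frozenTimePeriodInSecs = 15552000
```

With a layout like this, buckets migrate automatically from the flash volume to the cold volume as they age, matching the hot/warm/cold architecture described above.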

The solution guide shows how VCE™ converged infrastructure offers scalable solutions that meet and exceed the performance and capacity requirements for a high-performance Splunk deployment. Our sales teams can help you configure a solution according to the sizing guidance provided by Splunk and following the documented best practices for Splunk Enterprise, XtremIO, Isilon, and VMware.


Dell EMC converged solutions enable customers to simplify purchasing and deployment, reduce their hardware management overhead, and accelerate their time to value for all their enterprise workloads. VCE offers a wide portfolio of options to match the performance and pricing needs of large and small Splunk Enterprise deployments. If you are interested in mixed workload consolidation, you might also consider using VCE Vblock systems for multiple application workloads, including Splunk.


The primary benefits of running Splunk Enterprise on VCE converged infrastructure include:


  • Splunk-validated configurations
    • Jointly validated by Dell EMC and Splunk
  • Optimal storage tier alignment for Splunk Enterprise
    • Flexible sizing options to achieve the desired retention and performance profile for indexing and searching data in Splunk Enterprise
  • Cost-effective and flexible scale-out
    • Scale-out compute and storage management in a single converged platform package
  • Powerful data services
    • Out-of-the-box secure data encryption and data reduction services, along with integrated copy data management for efficient backup and restore capabilities


Next Steps

Machine-generated data is one of the fastest growing and most complex areas of big data. It's also one of the most valuable, containing a definitive record of user transactions, customer behavior, machine behavior, security threats, fraudulent activity, and more. Splunk paired with the Vblock 540 turns machine data into valuable Operational Intelligence no matter what business you're in.  Operational Intelligence gives you a real-time understanding of what's happening across your IT systems and technology infrastructure so you can make informed decisions.

The Vblock System 540 combined with the VCE Technology Extension for EMC Isilon storage is an excellent scale-out solution for a high-performance Splunk Enterprise deployment. In this solution guide we show why the Vblock 540 provides predictable performance, low latency, flexibility, and room for future growth for mission-critical Splunk Enterprise big data applications.


You can download a copy of the solution guide discussed here by using the hyperlink below:


VCE Vblock 540 Infrastructure For Splunk Enterprise Solution Guide



Thanks for reading,

Phil Hummel, EMCDSA

Look for me on Twitter  @GotDisk


Please share this article with your social networks




In the spring I wrote an article for this blog summarizing the results of a project that demonstrated the use of a Vblock® 540 system with All-Flash XtremIO storage for running mixed application workloads (Oracle, SAP, and Microsoft SQL Server).   The test workload included a near real-life mixture of transaction processing and decision support across several of the world's most demanding enterprise applications running on a single converged infrastructure platform.  To further show the power and scale of the Dell EMC VCE portfolio, the same Global Solutions Engineering team recently completed another mixed application workload project, this time using a Vblock® 740 system.  The Vblock System 740 offers massive consolidation capability with up to 4 PB of raw storage capacity, Cisco UCS blade servers, and VMware vSphere virtualization. In this mixed application and workload demonstration, the engineering team highlighted the capabilities of a Vblock 740 configured with a VMAX3 All-Flash 450FX storage array.

As Todd Pavone, COO, Converged Platforms and Solutions Division, Dell EMC wrote in a recent blog post titled: Realizing the Full Potential of Your Converged and Hyper-Converged Solutions and Platforms,

"having a collaborative converged infrastructure team that understands the whole stack, not just the silos of compute, network and storage, provides an organization with the tool sets to successfully support the business".

The new VCE Solution for Enterprise Mixed Workload on Vblock System 740 solution guide provides a holistic view of what it takes for an IT team to configure and operate a converged infrastructure platform that will meet the exacting demands of enterprise environments.  The guide can be used to share knowledge and start a dialog with the diverse talent spread across your organization to start building the critical multi-functional team that it takes to make IT more cost effective and responsive to the needs of the business.


The Value of Converged Platforms


Forrester Research recently released a report that found the move to cloud computing is being driven by many factors, including:


  • Replacing aging infrastructure that has reliability issues and requires very high costs and efforts to keep it running.
  • Simplifying overall infrastructure complexity and adopting greater virtualization.
  • Supporting businesses’ needs to reduce time-to-market and increase innovation.
  • Deploying additional capacity in near real time to support business growth, launch new services/products, and/or streamline expansion into new geographies.


Nowhere is this more evident than in large enterprises that have adopted multiple best-of-breed application stacks like SAP, Oracle, and Microsoft SQL Server.  These major platforms have typically all been allocated isolated, dedicated infrastructure silos, including separate landscapes for production and multiple non-production instances.  This approach, while considered a "best practice" five years ago, is chaotic to manage and cost prohibitive to modernize without a significant change in strategy.  Many of these "Tier 1" applications are not good candidates for moving to external contract cloud providers due to application complexity and multiple integration points with other systems. The better alternative is to consolidate mixed applications onto fewer, more standardized platforms using easy-to-procure and easy-to-manage converged infrastructure.  The VCE Solution for Enterprise Mixed Workload on Vblock System 740 solution guide shows enterprise architects and decision makers how to successfully consolidate Oracle, SAP, and Microsoft SQL Server mixed application workloads on a single Vblock 740. Reviewing this guide with your team can be an effective starting point for understanding and planning your move from silos to converged platforms.


Highlights from the Solutions Guide

Generating a large enough simulated workload to exhaust the compute, networking and/or storage throughput of a Vblock 740 with VMAX All-flash storage is a big task so we didn't even set that as a goal.  Our mission was to generate enough different types of application workloads to show conclusively that the platform is not negatively impacted by random vs sequential access patterns or I/O request block size.  The following figure shows the total combined test results of the mixed workload simulated load on the Vblock 740 as described in the solution guide (see link below for download).

740 mixed workloads results.PNG.png

Latency for both reads and writes across a wide range of block sizes is significantly below all the best practices guidelines prescribed by the software vendors of the enterprise applications we used for the test.  So there are no hero numbers in this guide, but there is lots of great guidance and results showing how to set up, configure, and run a consolidation environment on a Vblock.  Please read through the guide and share it with others in your organization who want to join the movement toward converged infrastructure adoption - build that team.


Download the solution guide in PDF format:



Thanks for reading,

Phil Hummel, EMCDSA

Look me up on Twitter @GotDisk



hadoop summit.png

The first thing I noticed as I made my way to the registration desk at the Hadoop Summit was the abundance of Fortune 1000 companies that had sent people to this show.  Many of the companies sending employees to the Hadoop Summit are the big data pioneers that burned through all the best practices and platform options for an enterprise data warehouse years ago. They are now in various stages of deploying Hadoop and Big Data analytics.  Slim Baltagi (@slimbalgati) from Capital One Financial Corp noted in the introduction to his session on Analysis of Trends in Big Data Analytics that he has been doing Big Data projects for 7 years and has the scars to prove it.  Many of the attendees at the San Jose Summit talked extensively about the numerous experiments and several production deployments they have been a part of.  They were genuinely excited to be at the show to learn more and share experiences with the Hadoop community.  The world of Big Data is moving at light speed, with a dizzying array of new open source and commercial tool options being announced every few weeks.  The Big Data community is very open to exchanging information and even code through open source projects while remaining laser focused on solving business problems in a unique and proprietary implementation for the benefit of a single organization.


That passion for knowledge is what made this such a great show for VCE (@VCE), the EMC Converged Platforms Division.  For example, at EMC's session on Increasing Hadoop Resiliency & Performance with EMC Isilon delivered by Boni Bruno (@bonibruno), about 50% of the participants knew what our Isilon (@EMCIsilon) product is and the other 50% were getting their first introduction to the product in a context they are passionate about, Big Data.


Talking to attendees in the EMC booth was a very similar experience. Approximately 50% of attendees knew about EMC's efforts in Big Data and the other half were hearing about our offerings for the first time.  This was an excellent show for us to spread the word about our Big Data Solution as well as our Hadoop-compatible platforms from the Elastic Cloud Storage (@EMCECS) and DSSD (@EMCDSSD) business units.  I'm going to finish this post with some background on the EMC Big Data Solution, with links to additional online content for those wanting to dig deeper.  I'll be writing more about the EMC ECS and DSSD solutions in future posts and covering all new announcements on Twitter using the handle @GotDisk.


     Big Data Solution Architecture Block Diagram

BDS Architecture.PNG.png

The EMC Big Data Solution (BDS) is a complete turnkey hardware and software platform for implementing a Hadoop-based data lake with all the required management and end-user tools for data engineers and data scientists. The BDS builds on the very popular VCE Vblock converged infrastructure platform.  As you can see in the graphic above, the core Vblock is enhanced with additional software integration from EMC partners to provide a complete data lake and analytics platform in an integrated solution.

For data engineers, the BDS offers data lake hardware options using EMC Isilon, EMC Elastic Cloud Storage, and/or the EMC XtremIO all-flash array platform.  Data ingestion, lineage, and quality are managed by Zaloni software for complete data lake management.  Zaloni helps simplify and automate common data management tasks so organizations can focus resources on building analytics.  Data indexing and search services are provided by Attivio software.  Many of the Fortune 100 rely on Attivio to quickly find, correlate, and return the most relevant search results to boost knowledge worker productivity. The third tool of interest for data engineers is BlueTalon for data access and authorization.  BlueTalon offers data control technology for today's data challenges, including authorization, access control, enforcement, and auditing.  VCE integrates these critical pieces necessary for an enterprise-grade data lake implementation into a single SKU with a single point of support for your entire investment.


For data scientists, the BDS provides a platform for productivity and collaboration.  The environment for data scientists is organized around the creation, use, and sharing of work spaces.  A work space is a collection of tools and data focused on a user-defined analysis project.  Tools that can be easily deployed into a work space include R, RStudio, Python, Java, and Tableau for visualization, shown in the Extension Packs block in the diagram above.  The BDS can also support other technologies such as MongoDB to allow users to apply the right tools to best address their needs.

Work spaces can be shared between multiple collaborators that need to work jointly on a project using a common pool of data.  Data is organized into one or  more data sets that can be saved either in the work space or back to the data lake to be available to other data scientists.  Having your data analytics platform organized around data sharing and collaboration with tightly integrated access controls will immediately create an environment for better data governance and productivity.

The EMC BDS is currently being previewed at selected customer locations and trade shows around the globe.  There were a number of interviews conducted by SiliconAngle for theCUBE at the Hadoop Summit.  Here are links to a couple of recordings featuring EMC's Carey James (@careyjames33) with the partners mentioned above in this article.


Carey James, EMC with Joe Litchtman, Attivio and Tony Fisher, Zaloni via #theCUBE on YouTube


Carey James, EMC  with Stephen Shartzer, Blue Talon via  #theCUBE on YouTube


For information on upcoming shows that will be featuring the EMC Big Data Solution, or any questions related to other Big Data products or solutions mentioned in this article, leave us a message using the comments section below.


Thanks for reading,

Phil Hummel, EMCDSA





Everyone is getting bombarded with messaging related to Big Data.  It's, well, big, and machine data is one of the fastest growing and most complex areas driving the interest in doing more with big data. It's also potentially one of the most valuable, since it is central to analytics related to customer behavior, sensor readings, machine behavior, security threats, fraudulent activity, and more.

The biggest challenge I hear from people evaluating where to get started with big data is that they are stuck in "analysis paralysis".  Figuring out what software and hardware platforms to use can seem like an overwhelming obstacle given the choices available. Based on my conversations with attendees at the most recent EMC World, two of the features they are looking for in a big data solution are:

  • Minimizing in-house coding and development
  • Finding a platform that can scale as they go.


VCE, the Converged Platform Division (CPD) of EMC, just released a paper titled Providing Enterprise Performance, Capacity, and Data Services for Splunk Enterprise.  This paper describes a VCE infrastructure solution that highlights flexible scaling options and tight integration with Splunk software for performing analysis that is targeted at machine data.  The solution addresses both of the customer requirements noted above.  I will provide some additional background on both Splunk and the VCE solution or you can jump straight to the link in this paragraph to download the full paper now.


Splunk Enterprise indexes any machine data from virtually any source, format, or location in real time. This includes data streaming from applications, app servers, web servers, databases, wire data from networks, virtual machines, telecoms equipment, operating systems, sensors and much more.  Splunk indexes contain information about the time of the event, keywords (terms), and any discovered relationships between events. Users can then search, analyze, and visualize machine data using Splunk indexes.  In order for Splunk users to more efficiently handle the constant stream of event data from multiple sources, Splunk uses the concept of buckets to store data in classifications of hot, warm, cold, and frozen tiers.
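To make the idea of term-based indexing concrete, here is a toy sketch (in Python, with made-up log lines) of how an inverted index lets you answer keyword searches over raw events.  This is an illustration of the general technique only; real Splunk indexes also store timestamps, source metadata, and much more.

```python
import re
from collections import defaultdict

# Toy model of term search over events: each raw event is split into
# terms, and an inverted index maps each term to the positions of the
# events in which it appears.
events = [
    "2016-07-01T10:00:01 sshd failed password for root from 10.0.0.5",
    "2016-07-01T10:00:09 sshd accepted password for alice",
    "2016-07-01T10:02:33 kernel eth0 link down",
]

index = defaultdict(set)
for i, event in enumerate(events):
    for term in re.findall(r"[\w.]+", event.lower()):
        index[term].add(i)

def search(*terms):
    """Return events containing every given term (an AND query)."""
    hits = set.intersection(*(index.get(t.lower(), set()) for t in terms))
    return [events[i] for i in sorted(hits)]

print(search("sshd", "failed"))   # matches the first event only
```

Because lookups touch only the index entries for the query terms, search cost stays low even as the volume of raw events grows.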

splunk buckets.png

Data is searched in order from hot to cold.  Frozen data is not typically queried and is marked for deletion.  Data is physically moved between buckets during the aging process and can therefore utilize different classes of storage to increase cost efficiency for the system.  Given the breadth of potential data sources that Splunk can process, these environments need a flexible supporting compute and storage architecture.
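A simple way to picture the aging process is as a function from event age to storage tier.  The sketch below uses hypothetical age thresholds purely for illustration; in Splunk, bucket rolling is actually governed by size and time settings in indexes.conf, not fixed ages.

```python
from datetime import datetime, timedelta

# Hypothetical retention policy: these thresholds are illustrative,
# not Splunk defaults.
TIERS = [
    ("hot",  timedelta(days=1)),
    ("warm", timedelta(days=30)),
    ("cold", timedelta(days=365)),
]

def bucket_for(event_time, now):
    """Map an event's age onto a storage tier; anything older is frozen."""
    age = now - event_time
    for tier, limit in TIERS:
        if age <= limit:
            return tier
    return "frozen"

now = datetime(2016, 7, 1)
print(bucket_for(datetime(2016, 6, 30, 12), now))  # hot
print(bucket_for(datetime(2014, 1, 1), now))       # frozen
```

Each tier can then be mapped to a different storage class, which is exactly what the Vblock/Isilon pairing described below exploits.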

vce splunk.PNG.png

VCE has a portfolio of products that gives customers options for implementing a tiered storage infrastructure able to handle high-performance hot and warm data, as well as high-capacity cold and frozen data, all from a single vendor.   In this solution we show how, using a combination of Vblock® and VxRack™ systems, organizations can simplify and optimize provisioning, deployment, and management of Splunk search and analytics workloads.

The Vblock System 540's scalable, linear architecture easily accommodates expansion by scaling out to >1M IOPS at <1ms latencies for all of your hot and warm data queries and workloads within your Splunk Enterprise environment.


VCE Technology Extensions for EMC Isilon® complement a Splunk Enterprise scale-out architecture by providing a powerful, cost-effective scale-out storage cluster for the retention of cold data in Splunk.   A VCE Technology Extensions for Isilon cluster creates a unified pool of highly efficient storage, with a proven 80 percent storage utilization rate.  Its single-volume, single-file-system design and simplified management typically require less than one full-time employee per petabyte (PB), reducing your overall storage administration costs.

For enterprises interested in making the move to hyperconverged infrastructure (HCI), VCE and EMC recommend the VCE VxRack System 1000 Flex for Splunk Enterprise deployments.  These self-contained units of servers and networking offer the scalability, flexibility, and resilience that make them an ideal platform for Splunk.  The storage foundation of the VxRack System 1000 Flex is EMC ScaleIO, which converges storage and compute resources to form a single-layer, enterprise-grade HCI implementation.

The VxRack System 1000 Flex, with ScaleIO software, aggregates the direct-attached storage in VCE's integrated compute nodes into a global, shared, block storage pool. ScaleIO enables a single-layer compute and storage architecture without requiring additional hardware. Its scale-out server SAN architecture can expand to accommodate thousands of servers.

Get the White Paper for More Details

Machine data is one of the fastest growing and most complex areas of big data collection and analytics.  Making use of machine data can be challenging unless you pay attention to the platform and tools you implement. With Splunk Enterprise combined with Vblock and VxRack Systems, and flexible options like VCE technology extensions, organizations can easily, efficiently, and cost effectively incorporate enterprise level data analytics and search for real-time operational intelligence. Get the full white paper using the PDF image below.




Thanks for reading,

Phil Hummel @GotDisk