During my 10-year tenure at Microsoft, I met with many corporate and commercial software development teams for architecture review sessions. The most common recommendation to come out of those sessions was "instrument your code". Ongoing development, testing, and support are much easier if you have a detailed historical record of what has been happening with your product. Most communication and computer equipment, as well as commercial software products, purchased today produce these detailed records of important activity and events. This machine-generated data is the fastest growing segment of what we call the "big data" market.
If you have never been involved in hardware/software development or IT support, this may all sound a little abstract. If you have access to a Microsoft Windows computer, go to the search bar and type "event". The first suggestion should be a program called Event Viewer; double-click that icon to start the program.
Welcome to the world of machine data! Two things I want to highlight are:
- There is an incredibly large number of activity and event types that are collected, and
- It is impossible, even for an expert, to tell if this machine is "healthy" or not from this display.
This is "raw" event data presented in lists. While the major operating system vendors like Microsoft make it easy for hardware and software developers to write events into a central logging framework using a simple application programming interface (API), the result of all this effort is a giant bucket of bits. Someone then has to write software to analyze and make sense of the raw event data to derive insights.
And all this raw data you're seeing comes from just one Windows computer. Every piece of networking, server, storage, and specialty hardware gear in and around a corporate data center has an activity and event logging capability just as complex as, or more complex than, the Windows event system shown here. And there are no standards, or even conventions, for how activity and event data is constructed or stored across multiple products. Every vendor and every product will have a unique format for machine data.
Now you can start to get a feel for the formidable complexity that confronts the operations staff of a corporate data center. If someone asked me how I would architect a software analysis tool to handle this level of complexity, I would first design a source-independent representation of an activity or event that could cover the entire universe of data sources I expected to encounter. Second, I would write source-specific pre-processors to translate the raw data from each source into that internal, universal representation.
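The two-step approach above can be sketched in a few lines of Python. Everything here is an assumption for illustration: the field names of the normalized record and the two input formats (a syslog-style line and a hypothetical CSV-style vendor format) are invented, not any vendor's real schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Step 1: a hypothetical source-independent event record.
# Every field name here is an assumption, not a real vendor schema.
@dataclass
class NormalizedEvent:
    timestamp: datetime
    source: str
    severity: str
    message: str

# Step 2: source-specific pre-processors, one per raw format.

# Parses a syslog-style line, e.g. "2024-01-05T10:00:00Z ERROR disk full"
def from_syslog(line: str) -> NormalizedEvent:
    ts, severity, message = line.split(" ", 2)
    when = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    return NormalizedEvent(when, "syslog", severity, message)

# Parses a made-up CSV vendor format, e.g. "1704448800,warn,fan speed degraded"
def from_vendor_csv(line: str) -> NormalizedEvent:
    ts, severity, message = line.split(",", 2)
    when = datetime.fromtimestamp(int(ts), tz=timezone.utc)
    return NormalizedEvent(when, "vendor-csv", severity.upper(), message)

events = [
    from_syslog("2024-01-05T10:00:00Z ERROR disk full"),
    from_vendor_csv("1704448800,warn,fan speed degraded"),
]

# Downstream analysis code now sees one uniform shape regardless of source.
for e in events:
    print(e.source, e.severity, e.message)
```

The payoff of this design is that the analysis layer never has to know how many vendors exist; supporting a new product means writing one more pre-processor, not touching the analysis code.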
However, if you haven't tackled this problem yet, or aren't happy with the solution you have, don't break out a compiler and start writing code. You should really check out our partner, Splunk, ranked #1 in Worldwide IT Operations Analytics Software market share. They have already implemented this approach, and much more, for handling the complexity of machine data with their Splunk Enterprise product.
Splunk Enterprise can index any kind of streaming, machine, and historical data, such as Windows event logs, web server logs, live application logs, network feeds, system metrics, change monitoring, message queues, archive files, and more. Splunk Enterprise transforms incoming data into events, which it stores in indexes. The index is the repository for Splunk Enterprise data that facilitates flexible searching and fast data retrieval. Splunk Enterprise handles everything with flat files in an application-native format that doesn't require any third-party database software. This architecture gives Splunk a great foundation for controlling scale and performance.
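To illustrate the general idea of "events in flat files plus an index for fast retrieval", here is a toy sketch. To be clear, this is not Splunk's actual index format, which is proprietary and far more sophisticated; it only shows why pairing raw event storage with an inverted keyword index makes search fast without any third-party database.

```python
from collections import defaultdict

# Toy illustration only: raw events kept in a flat list (standing in
# for flat files on disk), plus an inverted index mapping each keyword
# to the ids of the events that contain it.
class TinyIndex:
    def __init__(self):
        self.events = []                  # flat store of raw event text
        self.keywords = defaultdict(set)  # term -> set of event ids

    def add(self, raw: str) -> None:
        event_id = len(self.events)
        self.events.append(raw)
        for term in raw.lower().split():
            self.keywords[term].add(event_id)

    def search(self, term: str) -> list:
        # Look up matching event ids, then fetch the raw events.
        return [self.events[i] for i in sorted(self.keywords[term.lower()])]

idx = TinyIndex()
idx.add("ERROR disk full on node-3")
idx.add("INFO backup complete")
idx.add("WARN disk latency high on node-7")

print(idx.search("disk"))
```

A search touches only the small keyword table and the matching records, rather than scanning every stored event, which is the basic reason an index-centric architecture scales.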
Another aspect of Splunk Enterprise architecture that fits with best practices for handling data complexity is the application (apps) and add-ons framework. Apps and add-ons are both packaged sets of configuration that you install on your Splunk Enterprise instance that make it easier to integrate with, or ingest data from, other technologies or vendors. Although you don't need apps or add-ons to index data with Splunk Enterprise, apps and add-ons can enhance and extend the Splunk platform with ready-to-use functions ranging from optimized data collection to monitoring security, IT management and more.
Dell EMC and Splunk work closely to provide a total solution with Splunk Enterprise and Dell EMC hyper-converged platforms tailored to address the complexity of machine data analytics. Our Ready Systems for Splunk provide non-disruptive scalability and performance, optimized for Splunk workloads. Dell EMC Ready Systems for Splunk are purpose-built for the needs of Splunk, helping consolidate, simplify, and protect machine data. These Ready Solutions include the hardware, software, resources, and services needed to quickly deploy and manage Splunk in your business. Check out these resources for more details.
There are many more features of the Splunk Enterprise platform that I want to write about, including the use of multiple index locations for data aging and scale, and how the main services are implemented as individually installable and configurable components, but that will have to be another article - coming soon.
Thanks for reading,