The best way I have found to describe "Big Data" is data sets so large or complex that traditional data processing applications and experience are inadequate. What I find particularly relevant about this definition is that it recognizes 1) we are dealing with a moving target and 2) what is big for one organization can be routine for another, depending on experience. Take Facebook, for example. In 2012, their VP of Engineering, Jay Parikh, said in a TechCrunch interview, “we think we operate the single largest Hadoop system in the world,” referring to roughly 100PB. That was big in 2012, but just four years later that number doesn't seem so out of the ordinary.
For comparison, here are some figures that show what ScaleIO customers are currently doing in their data centers:
- 20PB today, going to 100PB in 2017
- 4PB growing 100% year over year
- 100PB of Oracle alone
- 7PB growing 50% each year
- 10PB in our ESX environment
Of course, Facebook has moved on to even bigger volumes, with estimated ingestion rates of 500+ TB per day. But what was once territory known only to the largest web-scale companies is now well within both the needs and capabilities of a growing number of more traditional manufacturing, banking, health care, and other businesses that we encounter every day.
To achieve their goals, enterprise IT teams have quickly adopted the best practices of web-scale companies like Facebook, Amazon, Yahoo, and others to operate data centers with ruthless efficiency. For example, the benefits of server virtualization are well understood in the modern enterprise data center: by abstracting, pooling, and automating compute resources, companies have achieved massive savings.
When it comes to managing storage, however, the emulation of web-scale practices in the enterprise is much less prevalent. We know that most of the large internet-scale companies do not buy commercial storage array products for their core storage infrastructure; they build their own services on commodity hardware, applying the principles of abstraction, pooling, and automation. ScaleIO Software-Defined Storage applies these same principles to local storage in standard x86 servers, creating a high-performance SAN entirely in software and achieving the same operational and TCO benefits that drive web-scale efficiency.
Where to start
Our mission is to help bring this level of storage scale and operational efficiency to every organization that is moving toward a full implementation of web-scale best practices in its data center operations. Our interactions with customers have shown that many are looking for a purpose-built solution to support their mission-critical application workloads, one that delivers scale, performance, elasticity, and resiliency. They want a solution that offers flexibility of choice in deployment models and configurations, with broad support for operating systems and hypervisors. The solution should also be pre-configured, qualified, and supported by a single vendor.
The All-Flash Dell EMC ScaleIO Ready Node is that solution. It combines Dell EMC ScaleIO software-defined storage with all-flash-enabled Dell EMC PowerEdge servers to bring web scale to any data center. Customers can rapidly deploy a fully architected block storage Server SAN that is pre-validated, pre-configured, and fully supported.
If you need more flexibility in using new or existing x86 server resources, you can also deploy ScaleIO software to transform them into an intelligent Server SAN. ScaleIO combines HDDs, SSDs, and PCIe flash cards into a hardware-agnostic virtual pool of storage with multiple performance tiers, and it can be installed on both physical and virtual application servers. As storage and compute resources change, ScaleIO software automatically re-balances the storage distribution to optimize performance and capacity usage.
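To make the pooling and re-balancing idea concrete, here is a minimal conceptual sketch in Python. It is not ScaleIO's actual placement algorithm; the class, the chunk-based model, and the greedy most-to-least re-balance policy are all illustrative assumptions, meant only to show how capacity spread across servers can be evened out automatically when a node is added.

```python
# Conceptual sketch only: illustrates pooling storage chunks across servers
# and re-balancing when capacity changes. NOT ScaleIO's real algorithm;
# all names and the placement policy here are hypothetical.

class StoragePool:
    def __init__(self):
        self.nodes = {}  # node name -> list of chunk ids stored there

    def add_node(self, name):
        self.nodes[name] = []
        self.rebalance()  # new capacity triggers an automatic re-balance

    def write(self, chunk):
        # Place each new chunk on the currently least-loaded node.
        least = min(self.nodes, key=lambda n: len(self.nodes[n]))
        self.nodes[least].append(chunk)

    def rebalance(self):
        # Move chunks from the most-loaded to the least-loaded node until
        # every node holds within one chunk of every other node.
        while True:
            most = max(self.nodes, key=lambda n: len(self.nodes[n]))
            least = min(self.nodes, key=lambda n: len(self.nodes[n]))
            if len(self.nodes[most]) - len(self.nodes[least]) <= 1:
                break
            self.nodes[least].append(self.nodes[most].pop())

pool = StoragePool()
pool.add_node("server-a")
pool.add_node("server-b")
for i in range(10):
    pool.write(f"chunk-{i}")
pool.add_node("server-c")  # adding a server spreads existing chunks out
sizes = sorted(len(chunks) for chunks in pool.nodes.values())
print(sizes)  # chunks end up evenly spread: [3, 3, 4]
```

The point of the sketch is the last line: no chunk had to be placed by hand after "server-c" joined; the pool converged to an even distribution on its own, which is the operational win the paragraph above describes.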
Historically, server-based flash led to poor resource utilization because its performance and capacity were available only to local applications. Today, with ScaleIO software-defined storage, you can abstract, pool, and automate storage devices across a multitude of servers, allocating as little or as much performance and capacity as each application needs, just as easily as allocating compute and memory resources in a virtualized environment. Please use these resources to get more details.
Thanks for reading,
Phil Hummel, EMCDSA
On Twitter @GotDisk
Please share with your social networks