
Deep learning has exploded across both the popular and business media landscapes. Current and upcoming technology capable of powering the calculations required by deep learning algorithms has enabled a rapid transition from new theories to new applications. One supporting technology that is expanding at an increasing rate is faster, more use-case-specific hardware acceleration for deep learning, such as GPUs with tensor cores and FPGAs hosted inside servers. Another foundational deep learning technology that has advanced very rapidly is the software that enables implementations of complex deep learning networks. New frameworks, tools and applications are entering the landscape quickly, some compatible with existing infrastructure and others requiring workflow overhauls.


As organizations develop more complex strategies for incorporating deep learning, they are likely to leverage multiple frameworks and application stacks for specific use cases and to compare performance and accuracy. But training models is time consuming and ties up expensive compute resources. In addition, adjustments and tuning vary between frameworks, creating a large number of framework-specific knobs and levers to remember how to operate. What if there were a framework that could just consume these models right out of the box?


 

BigDL is a distributed deep learning framework with native Spark integration, allowing it to leverage Spark during model training, prediction, and tuning. One of the things that I really like about Intel BigDL is how easy it is to work with models built and/or trained in TensorFlow, Caffe and Torch. This rich interop support for deep learning models allows BigDL applications to leverage the plethora of models that already exist with little or no additional effort. Here are just a few ways this might be used in your applications:


  • Efficient Scale-Out - Using BigDL you can take a model that was trained on a single node or workstation and leverage it at scale with Apache Spark. This can be useful for training on a large distributed dataset that already exists in your HDFS environment, or for performing inference such as prediction and classification on a very large and frequently changing dataset.


  • Transfer Learning - Load a pretrained model with weights, then freeze some layers, append new layers, and train or retrain layers. Transfer learning can improve accuracy or reduce training time by letting you start with a model that does one thing, such as classifying generic objects, and use it to accelerate development of a model that classifies something else, such as specific car models.

  • High Performance on CPU - GPUs get all of the hype when it comes to deep learning. By leveraging Intel MKL and multi-threaded Spark tasks, BigDL can achieve better CPU-driven performance on Xeon processors than you would see with TensorFlow, Caffe or Torch.

  • Dataset Access - Designed to run in Hadoop, BigDL can compute where your data already exists. This can save time and effort, since data does not need to be transferred to a separate GPU environment to be used with the deep learning model. Your entire pipeline, from ingest to model training and inference, can happen in one environment: Hadoop.


Real Data + Real Problem


Recently I had a chance to take advantage of the model portability feature of BigDL. After learning of an internal project here at Dell EMC that leverages deep learning and telemetry data to predict component failures, my team decided to take our Ready Solution for AI - Machine Learning with Hadoop and see how it did with the problem.


The team conducting the project for our support organization was using TensorFlow with GPU accelerators to train an LSTM model. The dataset consisted of sensor readings from internal components collected at 15-minute intervals, covering metrics like temperatures, fan speeds, runtimes, faults, etc.


Initially my team wanted to focus on testing out two use cases for BigDL:


  • Use BigDL's model portability to perform inference with the existing TensorFlow model
  • Implement an LSTM model in BigDL and train it with this dataset


As always, some preprocessing and data cleaning steps had to happen before we could get to modeling and inference. Luckily for us, we received the clean output of those steps from our support team, so we could get started quickly. We received the data in the form of multiple CSV files, already balanced with records of devices that failed and devices that did not. We got over 200,000 rows of data that looked something like this:


device_id,timestamp,model,firmware,sensor1,sensor2,sensor3,sensorN,label

string,string,string,string,float,float,float,float,int
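To make the schema concrete, here is a minimal sketch of how data like this might be loaded and split into features and labels with pandas. The file path and column handling are my own assumptions for illustration, not the support team's actual code:

import glob
import pandas as pd

# Read every CSV file into a single dataframe (the path is hypothetical)
frames = [pd.read_csv(path) for path in glob.glob("telemetry/*.csv")]
df = pd.concat(frames, ignore_index=True)

# Sensor readings become the features; the failure indicator is the label
feature_cols = [c for c in df.columns if c.startswith("sensor")]
x = df[feature_cols].values.astype("float32")
y = df["label"].values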


Converting the data to the tfrecord format used by TensorFlow was being done with Python and pandas dataframes. Moving this process into Spark is another area we knew we wanted to dig into, but to start we wanted to focus on the goals mentioned above. When we started, the pipeline went from pandas preprocessing straight to tfrecords for TensorFlow training.


From TensorFlow to BigDL

 

For BigDL, instead of creating tfrecords we needed to end up with an RDD of Samples. Each Sample is one record of your dataset in the form (feature, label), where feature and label are each one or more tensors, and we create the Sample from ndarrays. Looking at the current pipeline, we were able to simply take the arrays created just before the tfrecord-writing step and write a function that turns them into an RDD of Samples for BigDL.


from bigdl.util.common import Sample

def convert_to(x, y):
  # Pair each feature array with its label and distribute across the cluster
  records = list(zip(x, y))
  record_rdd = sc.parallelize(records)

  # Wrap each (feature, label) pair in a BigDL Sample built from ndarrays
  sample_rdd = record_rdd.map(lambda r: Sample.from_ndarray(r[0], r[1]))
  return sample_rdd

train = convert_to(x_train, y_train)
val = convert_to(x_val, y_val)
test = convert_to(x_test, y_test)




After that we took the pb and bin files representing the pretrained model's definition and weights and loaded them using the BigDL Model.load_tensorflow function. It requires knowing the input and output names for the model, but the TensorFlow graph summary tool can help with that. It also specifically requires a pb and bin file, but if what you have is a TensorFlow ckpt file, it can be converted with tools provided by BigDL.


from bigdl.nn.layer import Model

# Frozen graph definition (pb) and extracted weights (bin)
model_def = "tf_model/model.pb"
model_variable = "tf_model/model.bin"
# Input and output tensor names in the TensorFlow graph
inputs = ["Placeholder"]
outputs = ["prediction/Softmax"]
trained_tf_model = Model.load_tensorflow(model_def, inputs, outputs,
    byte_order="little_endian", bigdl_type="float", bin_file=model_variable)




Now, with our data already in the correct format, we can run inference against our test dataset. BigDL provides Model.evaluate, and we can pass it our RDD, a batch size, and the validation method to use, in this case Top1Accuracy.


# Evaluate the loaded model on the test RDD with a batch size of 128
results = trained_tf_model.evaluate(test, 128, [Top1Accuracy()])



 

Defining a Model with BigDL

 

After loading the pretrained TensorFlow model, the next experiment we wanted to conduct was to train an LSTM model defined with BigDL. BigDL provides a Sequential API and a Functional API for defining models. The Sequential API is for simpler models, while the Functional API, which describes the model as a graph, is better suited to complex models. Since our model is a straightforward LSTM, we will use the Sequential API.
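For contrast, here is a minimal sketch of the Functional style, where each layer is called on the node that feeds it to wire up a graph. The layer types and sizes here are placeholders for illustration, not part of our model:

from bigdl.nn.layer import Model, Input, Linear, ReLU

# Build a graph by applying each layer to the node that feeds it
input_node = Input()
hidden = Linear(10, 20)(input_node)
activated = ReLU()(hidden)
output = Linear(20, 2)(activated)
graph_model = Model([input_node], [output])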


Defining an LSTM model is as simple as:


from bigdl.nn.layer import (Sequential, Recurrent, LSTM,
                            InferReshape, Select, Linear)

def build_model(input_size, hidden_size, output_size):
    model = Sequential()
    # The Recurrent container holds the LSTM cell and unrolls it over time
    recurrent = Recurrent()
    recurrent.add(LSTM(input_size, hidden_size))
    # Reshape incoming batches to (-1, input_size) before the recurrent layer
    model.add(InferReshape([-1, input_size], True))
    model.add(recurrent)
    # Select the output of the last time step
    model.add(Select(2, -1))
    model.add(Linear(hidden_size, output_size))
    return model

lstm_model = build_model(n_input, n_hidden, n_classes)



 

After creating our model, the next step is to define the optimizer and validation logic that the model will use to train.


Create the optimizer:


from bigdl.optim.optimizer import Optimizer, Adam, MaxEpoch
from bigdl.nn.criterion import CrossEntropyCriterion

optimizer = Optimizer(
    model=lstm_model,                   # the BigDL model to train
    training_rdd=train,                 # RDD of Samples built earlier
    criterion=CrossEntropyCriterion(),  # loss function to minimize
    optim_method=Adam(),                # optimization method
    end_trigger=MaxEpoch(50),           # stop after 50 epochs
    batch_size=batch_size)



 

Set the validation logic:

 

from bigdl.optim.optimizer import EveryEpoch, Top1Accuracy

# Validate against the held-out RDD at the end of every epoch
optimizer.set_validation(
    batch_size=batch_size,
    val_rdd=val,
    trigger=EveryEpoch(),
    val_method=[Top1Accuracy()])



 

Now we can call trained_model = optimizer.optimize() to train our model, in this case for 50 epochs. We also set a TrainSummary folder so that training metrics were logged, which let us get visualizations in TensorBoard, something that BigDL supports.
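Here is a minimal sketch of that logging and training step, assuming a hypothetical log directory and application name:

from bigdl.optim.optimizer import TrainSummary, ValidationSummary

# Log training and validation metrics so TensorBoard can visualize them
log_dir = "/tmp/bigdl_summaries"          # hypothetical location
app_name = "lstm_failure_prediction"      # hypothetical run name
optimizer.set_train_summary(TrainSummary(log_dir=log_dir, app_name=app_name))
optimizer.set_val_summary(ValidationSummary(log_dir=log_dir, app_name=app_name))

# Run the training loop; returns the trained model after 50 epochs
trained_model = optimizer.optimize()

Pointing tensorboard --logdir at the same directory then displays loss and accuracy curves as training progresses.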


 

At this point we had completed the two initial tasks we set out to do: load a pretrained TensorFlow model using BigDL, and train a new model with BigDL. Hopefully you found some of this process interesting and got an idea of how easy BigDL makes this use case. The ability to leverage deep learning models inside Hadoop, with no specialized hardware like InfiniBand or GPU accelerators, provides a great tool that is sure to change how you view your existing analytics.



By: Lucas A. Wilson, Ph.D. and Michael Bennett

 

Artificial intelligence (AI) is transforming the way businesses compete in today’s marketplace. Whether it’s improving business intelligence, streamlining supply chain or operational efficiencies, or creating new products, services, or capabilities for customers, AI should be a strategic component of any company’s digital transformation.

 

Deep neural networks have demonstrated astonishing abilities to identify objects, detect fraudulent behaviors, predict trends, recommend products, enable enhanced customer support through chatbots, convert voice to text and translate one language to another, and produce a whole host of other benefits for companies and researchers. They can categorize and summarize images, text, and audio recordings with human-level capability, but to do so they first need to be trained.

 

Deep learning, the process of training a neural network, can sometimes take days, weeks, or months, and effort and expertise are required to produce a neural network of sufficient quality to trust your business or research decisions on its recommendations. Most successful production systems go through many iterations of training, tuning and testing during development. Distributed deep learning can speed up this process, reducing the total time to tune and test so that your data science team can develop the right model faster, but it requires a method for aggregating knowledge between systems.

 

There are several evolving methods for efficiently implementing distributed deep learning, and the way in which you distribute the training of neural networks depends on your technology environment. Whether your compute environment is container native, high performance computing (HPC), or Hadoop/Spark clusters for Big Data analytics, your time to insight can be accelerated by using distributed deep learning. In this article we are going to explain and compare systems that use a centralized or replicated parameter server approach, a peer-to-peer approach, and finally a hybrid of these two developed specifically for Hadoop distributed big data environments.

 

Distributed Deep Learning in Container Native Environments

Container-native environments (e.g., Kubernetes, Docker Swarm, OpenShift) have become the standard for many DevOps teams, where rapid, in-production software updates are the norm and bursts of computation may be shifted to public clouds. Most deep learning frameworks support distributed deep learning for these types of environments using a parameter server-based model that allows multiple processes to look at training data simultaneously while aggregating knowledge into a single, central model.

 

Parameter server-based training starts with specifying the number of workers (processes that will look at training data) and parameter servers (processes that will handle the aggregation of error reduction information, backpropagate those adjustments, and update the workers). Additional parameter servers can act as replicas for improved load balancing.
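As an illustration, a minimal TensorFlow 1.x sketch of declaring that topology might look like the following; the hostnames and ports are placeholders:

import tensorflow as tf

# Declare which processes aggregate updates (ps) and which consume data (worker)
cluster = tf.train.ClusterSpec({
    "ps": ["ps0.example.com:2222"],
    "worker": ["worker0.example.com:2222",
               "worker1.example.com:2222"]
})

# Each process starts a server announcing its role and index in the cluster
server = tf.train.Server(cluster, job_name="worker", task_index=0)

Each process in the cluster is launched with its own job_name and task_index, so the same script can play either role.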

 

[Figure: Parameter server model for distributed deep learning]

 

Worker processes are each given a mini-batch of training data to evaluate and, upon completing that mini-batch, report the differences (gradients) between produced and expected output back to the parameter server(s). The parameter server(s) then update the network and transmit copies of the updated model back to the workers to use in the next round.

 

This model is ideal for container native environments, where parameter server processes and worker processes can be naturally separated. Orchestration systems such as Kubernetes allow neural network models to be trained in container native environments using multiple hardware resources to improve training time. Additionally, many deep learning frameworks, including TensorFlow, PyTorch, Caffe2, and Cognitive Toolkit, support parameter server-based distributed training.

 

Distributed Deep Learning in HPC Environments

High performance computing (HPC) environments are generally built to support multi-node applications developed and executed using the single program, multiple data (SPMD) methodology, where data exchange is performed over high-bandwidth, low-latency networks such as Mellanox InfiniBand and Intel OPA. These multi-node codes take advantage of these networks through the Message Passing Interface (MPI), which abstracts communications into send/receive and collective constructs.

 

Deep learning can be distributed with MPI using a communication pattern called Ring-AllReduce. In Ring-AllReduce each process is identical, unlike in the parameter server model where processes are either workers or servers. The Horovod package by Uber (available for TensorFlow, Keras, and PyTorch) and the mpi_collectives contributions from Baidu (available in TensorFlow) use MPI Ring-AllReduce to exchange loss and gradient information between replicas of the neural network being trained. This peer-based approach means that all nodes in the solution work on training the network, rather than some nodes acting solely as aggregators/distributors (as in the parameter server model), which can potentially lead to faster model convergence.
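To give a flavor of this approach, here is a minimal Horovod sketch for TensorFlow 1.x; the optimizer choice and learning rate are placeholder examples:

import tensorflow as tf
import horovod.tensorflow as hvd

# Initialize MPI-based communication; each process trains a full model replica
hvd.init()

# Scale the learning rate by the number of replicas, a common Horovod convention
opt = tf.train.AdamOptimizer(0.001 * hvd.size())

# Wrap the optimizer so gradients are averaged across replicas via Ring-AllReduce
opt = hvd.DistributedOptimizer(opt)

# Broadcast initial variables from rank 0 so all replicas start identical
hooks = [hvd.BroadcastGlobalVariablesHook(0)]

The training script is then launched with mpirun, one process per GPU or node, and every process performs the same work on its own shard of the data.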

 

[Figure: Ring-AllReduce model for distributed deep learning]

 

The Dell EMC Ready Solutions for AI, Deep Learning with NVIDIA lets users take advantage of high-bandwidth Mellanox InfiniBand EDR networking, fast Dell EMC Isilon storage, accelerated compute with NVIDIA V100 GPUs, and optimized TensorFlow, Keras, or PyTorch with Horovod (or TensorFlow with tensorflow.contrib.mpi_collectives) to help produce insights faster.

 

Distributed Deep Learning in Hadoop/Spark Environments

Hadoop and other Big Data platforms achieve extremely high performance for distributed processing but are not designed to support long-running, stateful applications. Several approaches exist for executing distributed training under Apache Spark. Yahoo developed TensorFlowOnSpark, which accomplishes the goal with an architecture that leverages Spark for scheduling TensorFlow operations and RDMA for direct tensor communication between servers.

 

BigDL is a distributed deep learning library for Apache Spark. Unlike Yahoo's TensorFlowOnSpark, BigDL not only enables distributed training, it is designed from the ground up to work on Big Data systems. To enable efficient distributed training, BigDL takes a data-parallel approach with synchronous mini-batch SGD (Stochastic Gradient Descent). Training data is partitioned into an RDD of Samples and distributed to each worker. Model training proceeds iteratively: first, gradients are computed locally on each worker, taking advantage of locally stored partitions of the training data and model to perform in-memory transformations; then an AllReduce function schedules workers with tasks to calculate and update weights; finally, a broadcast syncs the distributed copies of the model with the updated weights.

 

[Figure: BigDL implementation of AllReduce functionality]

 

The Dell EMC Ready Solutions for AI, Machine Learning with Hadoop is configured to let users take advantage of distributed deep learning with Intel BigDL and Apache Spark. It supports loading models and weights from other frameworks such as TensorFlow, Caffe and Torch, which can then be leveraged for training or inference. BigDL is a great way for users to quickly begin training neural networks using Apache Spark, widely recognized for how simple it makes data processing.

 

One more note on Hadoop and Spark environments: the Intel team working on BigDL has built high-level pipeline APIs, built-in deep learning models, and reference use cases into the Intel Analytics Zoo library. Analytics Zoo is based on BigDL and makes it even easier to use, through high-level pipeline APIs designed to work with Spark Dataframes and built-in models for tasks like object detection and image classification.

 

Conclusion

Regardless of whether your preferred server infrastructure is container native, HPC clusters, or Hadoop/Spark-enabled data lakes, distributed deep learning can help your data science team develop neural network models faster. Our Dell EMC Ready Solutions for Artificial Intelligence can work in any of these environments to help jumpstart your business's AI journey. For more information on the Dell EMC Ready Solutions for Artificial Intelligence, go to dellemc.com/readyforai.



Lucas A. Wilson, Ph.D. is the Lead Data Scientist in Dell EMC's HPC & AI Engineering group. (Twitter: @lucasawilson)

Michael Bennett is a Senior Principal Engineer at Dell EMC working on Ready Solutions.

Data science requires a mix of computer programming, mathematics/statistics and domain knowledge. This article focuses on the intersection of the first two requirements. Comprehensive software packages for classical machine learning, primarily supported by statistical algorithms, have been widely available for decades, with many mature offerings from both the open source community and commercial software makers. Modern deep learning is less mature and still experiencing rapid innovation, so its software landscape is more dynamic. Data scientists engaged in deep learning must get more involved in programming than is typically required for classical machine learning. The remainder of this article explains the why and how of that effort.


First, let me start with a high-level summary of what deep learning is. Computer scientists have been studying ways to perform speech recognition, image recognition, natural language processing (including translation), relationship identification, recommendation systems and other forms of data relationship discovery since computers were first invented. After decades of parallel research in mathematics and software development, researchers arrived at a methodology called artificial neural networks (ANNs) that could be used to solve these types of problems, and many more, with a common set of tools. The building blocks of ANNs are layers. Each layer typically accepts structured data (tensors) as input, performs a type of transformation on that data, and sends the transformed data to the next layer, until the output layer is processed. The layers typically used in ANNs can be grouped into categories, for example:

  • Input Layers
  • Learnable Layers
  • Activation Layers
  • Pooling Layers
  • Combination Layers
  • Output Layers


The number and definition of the layers, how they are connected, and the data structures used between layers are called the model structure. In addition to defining the structure, a data scientist must specify how the model is to be executed, including the function to be optimized and the optimization method. Given the complexity of the mathematics and the need to efficiently process large data sets, creating a deep learning software program from scratch is a significant development effort, even for professional computer scientists.
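As a concrete illustration of the layer categories listed above and of specifying how a model is executed, here is a minimal Keras sketch. The layer sizes, optimizer, and loss are arbitrary examples, not a recommendation:

from keras.models import Sequential
from keras.layers import Conv2D, Activation, MaxPooling2D, Flatten, Dense

# Model structure: input shape, learnable, activation, pooling and output layers
model = Sequential([
    Conv2D(32, kernel_size=3, input_shape=(28, 28, 1)),  # input + learnable
    Activation("relu"),                                  # activation layer
    MaxPooling2D(pool_size=2),                           # pooling layer
    Flatten(),                                           # reshape for output
    Dense(10, activation="softmax"),                     # output layer
])

# Execution choices: the function to optimize (loss) and the optimization method
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])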


Deep learning frameworks were developed to make deep learning software available to the wider community of programmers and data scientists. Most of today's popular frameworks are developed through open source initiatives, each of which attracts dozens of active developers. The rate of innovation in the deep learning framework space is both impressive and somewhat overwhelming.


To further complicate the world of deep learning (yes, that is possible), despite the many similar capabilities of the most popular frameworks there are also significant differences, so each project requires a careful compatibility evaluation once it is defined. Based on a sample of the many deep learning framework comparisons published in just the last couple of years, I estimate that there are between 15 and 20 viable alternatives today.


The Intel® AI Academy recently published a comparison focused on frameworks that have versions optimized by Intel and that run effectively on CPUs optimized for matrix multiplication. The table below is a sample of the type of analysis and data that was collected.


Framework                      # of GitHub Stars   # of GitHub Forks
TensorFlow*                               60,030              28,808
Caffe*                                    18,354              11,277
Keras*                                    16,344               5,788
Microsoft Cognitive Toolkit*              11,250               2,823
MXNet*                                     9,951               3,730
Torch*                                     6,963               2,062
Deeplearning4J*                            6,800               3,168
Theano*                                    6,417               2,154
neon™ framework                            3,043                 665

Source: Intel AI Academy, 2017

 

The NVIDIA Deep Learning AI website has a summary of deep learning frameworks, such as Caffe2, Cognitive Toolkit, MXNet, PyTorch, TensorFlow and others, that support GPU-accelerated libraries such as cuDNN and NCCL to deliver high-performance multi-GPU training. The page also includes links to learning and getting-started resources.

 

Summary

  1. Don’t be surprised if the data science team proposes projects using different frameworks.
  2. Get curious if every project requires a different framework.
  3. Plan to age out some frameworks over time and bring in new ones.
  4. Allocate time for new framework investigation.
  5. Look for platforms that support multiple frameworks to reduce silos.
  6. Check online reviews from reputable sources to see how others rate a framework before adopting it for a project.

 

Thanks for reading,

Phil Hummel

@GotDisk