How to Run ResNet at a Fraction of the Cost


With greater speed and accuracy.

Does your data science team use ResNet? Neural Magic found a novel way to run ResNet models on commodity CPUs with GPU-class performance, at a fraction of the cost. By making ResNet models achieve best-in-class performance on everyday CPUs, teams can realize drastic cost savings. In this blog post, we’ll cover a brief history of ResNet, how it’s typically used, its current limitations, and details on how Neural Magic makes running ResNet faster and cheaper.

What is ResNet? 

Short for Residual Network, the ResNet model was the winner of the ImageNet Challenge in 2015. ResNets allow data scientists to successfully train deep neural networks with 150+ layers by using “skip connections,” shortcuts that bypass one or more layers. This effectively simplifies the network, behaving as if it had fewer layers in the initial training phases. In addition, skip connections reduce the impact of vanishing gradients by reusing activations from a previous layer until the adjacent layers learn their weights.
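To make the skip-connection idea concrete, here is a minimal sketch of a residual block in plain numpy (not ResNet's actual convolutional block, which also uses batch normalization; the linear transforms and function names here are illustrative only). Note how, when the weight path contributes nothing, the block simply passes its input through:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    # Two transform "layers" followed by a skip connection:
    # the input x is added back to the transformed output.
    out = relu(x @ w1)
    out = out @ w2
    return relu(out + x)  # skip connection: add the input back

# With zero weights, the block reduces to the identity (for non-negative
# inputs) -- showing how activations and gradients can flow past layers
# that have not yet learned useful weights.
x = np.array([1.0, 2.0, 3.0])
w_zero = np.zeros((3, 3))
y = residual_block(x, w_zero, w_zero)
```

Because the block only needs to learn a *residual* on top of the identity, stacking many such blocks stays trainable where an equally deep plain network would not.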

How is ResNet Used?

Today, ResNet is a popular model for image classification and object detection use cases. According to our recent deep learning survey, more than half of the data scientists surveyed were using ResNet.

ResNet is so widely used because it’s a simple yet powerful computer vision model; teams can train networks hundreds or potentially thousands of layers deep and still achieve great performance results. In addition, ResNet sits roughly at the inflection point of the trade-off between accuracy and floating-point operations (FLOPs), which makes it an appealing choice for precision applications.

Limitations of ResNet Models

For most teams, the biggest challenge with ResNet is the model’s computational density: it requires significantly more FLOPs than similar models such as MobileNets or EfficientNets. Because they’re so computationally heavy, ResNets are typically run on GPUs. However, GPUs get increasingly costly when deploying ResNet models at scale. Often, teams are either running on expensive GPUs like the NVIDIA Tesla V100, or making performance sacrifices to deploy on NVIDIA P100s or T4s.

Fortunately, with Neural Magic, data science teams can get GPU-class performance without the cost, on everyday CPUs.

Using Neural Magic with ResNet

Using the Neural Magic Inference Engine, data scientists can run ResNet models on everyday Intel CPUs in production in three ways: 

  • Baseline (dense)
  • Sparse (better performance with equivalent baseline accuracy)
  • Sparse performance (a percentage-point drop in accuracy, but even better performance)
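“Sparse” here means that most of the model’s weights have been zeroed out, so the inference engine can skip the corresponding computation. Neural Magic’s exact sparsification recipe isn’t detailed here, but the standard underlying technique is magnitude pruning; a minimal illustrative sketch (function name and parameters are hypothetical, not Neural Magic’s API):

```python
import numpy as np

def magnitude_prune(w, sparsity):
    """Zero out the fraction `sparsity` of weights with smallest magnitude."""
    k = int(np.floor(sparsity * w.size))
    if k == 0:
        return w.copy()
    # Threshold at the k-th smallest absolute value across the whole tensor.
    threshold = np.sort(np.abs(w), axis=None)[k - 1]
    pruned = w.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

# Prune a random weight matrix to 90% sparsity.
rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))
w_sparse = magnitude_prune(w, 0.90)
actual_sparsity = float(np.mean(w_sparse == 0.0))
```

In practice, pruning is done gradually during fine-tuning so the remaining weights can compensate, which is how sparse models retain accuracy close to the dense baseline.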

For data scientists who want to run sparse models, Neural Magic has an optimized version of ResNet available via our model repo. This model can be run on the ImageNet dataset, or transfer-learned via an API to make it specific to an organization’s own data. Using Neural Magic, data scientists can lower the cost of ResNet deep learning deployments by using commodity CPUs instead of expensive GPUs.

As a result, teams can increase their performance and decrease costs. Here are some benchmarks from a recent ResNet-50 test running the Neural Magic Inference Engine on a 4-core CPU and an 18-core CPU, compared with the same 18-core CPU (without Neural Magic software) and an NVIDIA T4 GPU.

Want to get better performance on ResNet at lower cost? Get a live demo to learn more.