How to Get Faster MobileNetV2 Performance on CPUs


TL;DR: Learn more about use cases for lightweight MobileNetV2 models, and how Neural Magic’s Inference Engine exploits its architecture to run them even faster on commodity CPUs.

Photo by Markus Spiske on Unsplash

Ever wondered which machine learning model powers “Portrait Mode” on your iPhone or the ability to swap out backgrounds on YouTube? It’s MobileNetV2. Many data scientists use it for image classification (and for object detection when combined with SSD or YOLO, for example) because of its low computational cost.

This blog post provides a brief overview of MobileNetV2 models, how they’re used, and why you should deploy them with Neural Magic. We’ll also discuss what’s unique about MobileNetV2 that allows the Neural Magic Inference Engine to run it much faster than the alternatives we tested – up to 14x in some instances. If you’re interested in similar articles on improving machine learning performance, check out our ResNet post.

What Is MobileNetV2?

MobileNet is a family of open source models for efficient, on-device computer vision, introduced by Google in 2017. These mobile-first computer vision models were built for TensorFlow and designed to maximize accuracy within the restricted resources of on-device or embedded applications.

A year later, in 2018, MobileNetV2 was introduced. It is more performant and more accurate than its V1 predecessor. According to Google, MobileNetV2 uses 2x fewer operations, needs 30% fewer parameters and is about 30-40% faster than MobileNetV1 models. These advancements opened up a whole new array of edge use cases and applications.

A key aspect of both MobileNetV1 and MobileNetV2 is their use of depthwise separable convolutions, which significantly reduce the number of parameters compared to networks of the same depth built with regular convolutions. As a result, MobileNet models are lightweight neural networks that are memory-bound (rather than compute-bound), which is why the Neural Magic Inference Engine runs them so well.
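
To make the parameter savings concrete, here is a minimal Python sketch that counts the weights in a standard convolution versus a depthwise separable one. The layer shape (128 input and output channels, 3x3 kernel) and the function names are illustrative assumptions, not a specific MobileNetV2 layer, and biases are ignored.

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Hypothetical layer shape chosen only for illustration.
standard = conv_params(128, 128, 3)
separable = depthwise_separable_params(128, 128, 3)
print(standard, separable, round(standard / separable, 1))
# 147456 17536 8.4
```

For this example shape, the separable version needs roughly 8x fewer weights; the exact ratio depends on the channel counts and kernel size.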

How is MobileNetV2 Used?

MobileNetV2 is a small, low-latency, low-power model that is broadly applicable to many use cases. It’s fast, easy to use, and ideal for environments with constrained compute. As mentioned before, it can be used for image classification and object detection, and it runs exceptionally well on CPUs, without requiring costly and resource-intensive GPUs.

MobileNetV2 (as well as other MobileNets) excels at edge deployments because of its smaller size compared to other computer vision models.

Why Use Neural Magic with MobileNetV2?

The Neural Magic Inference Engine has optimizations that allow it to run MobileNets faster than anyone, by making much better use of both the CPU and the network architecture. Our proprietary engine algorithms deliver better MobileNet performance than the other CPU-based options we benchmarked.

Neural Magic’s Inference Engine is well suited to running MobileNetV2 models at high speeds, without impacting accuracy. On computationally dense models like ResNet-50, we use optimization techniques that allow us to sparsify the model for better performance. In the case of MobileNetV2, the model is already naturally sparse, so it does not require pruning for performance gains with our engine. That being said, MobileNetV2 can be pruned further if the accuracy trade-off is acceptable.

The Neural Magic Inference Engine works by optimizing how a neural network is executed across the available memory hierarchies in a CPU. The associated engine algorithms identify memory-bound processes within the network, such as depthwise convolutions, and apply optimization techniques to accelerate those components.
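
A rough way to see why depthwise convolutions tend to be memory-bound is to estimate their arithmetic intensity, that is, multiply-accumulates (MACs) per byte of data moved. The Python sketch below is not the engine's algorithm. The layer shape, the function names, and the assumptions of stride 1, same padding, FP32 values, and each tensor passing through memory exactly once are all illustrative.

```python
def depthwise_intensity(h: int, w: int, c: int, k: int = 3,
                        bytes_per_value: int = 4) -> float:
    """Approximate MACs per byte for a k x k depthwise convolution."""
    macs = h * w * c * k * k
    # Input feature map + output feature map + one k x k filter per channel.
    values_moved = 2 * h * w * c + k * k * c
    return macs / (values_moved * bytes_per_value)


def standard_intensity(h: int, w: int, c_in: int, c_out: int, k: int = 3,
                       bytes_per_value: int = 4) -> float:
    """Approximate MACs per byte for a standard k x k convolution."""
    macs = h * w * c_in * c_out * k * k
    # Input feature map + output feature map + full weight tensor.
    values_moved = h * w * c_in + h * w * c_out + k * k * c_in * c_out
    return macs / (values_moved * bytes_per_value)


# Hypothetical 56 x 56 feature map with 144 channels.
dw = depthwise_intensity(56, 56, 144)
std = standard_intensity(56, 56, 144, 144)
print(f"depthwise: {dw:.2f} MACs/byte, standard: {std:.2f} MACs/byte")
```

Under these assumptions the depthwise layer does only about one MAC per byte moved, while the standard convolution does over a hundred, so the depthwise layer's speed is limited mostly by how quickly data moves through the memory hierarchy.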

Benchmarking MobileNetV2 with the Neural Magic Inference Engine

The benchmarks below were run with the ImageNet dataset. We’ve made it easy to run benchmarks on your own data by using a simple transfer-learning API. Contact us to learn more.

Here are the numbers from a recent MobileNetV2 test, using the ImageNet dataset and running in the Neural Magic Inference Engine software on an Intel x86 CPU. It’s important to note that we maintained baseline accuracy.

The graph shows the maximum IPS (images per second) the Neural Magic Inference Engine was able to achieve with MobileNetV2 at batch size 1, FP32, on a 4-core CPU.

On a 4-core CPU, the Neural Magic Inference Engine achieves 12.7x better performance than a standalone 4-core CPU, 4.5x better than DNNL, and 1.2x better than OpenVINO.

Let’s increase the batch size to 64, while leaving other parameters the same:

At batch size 64, the Neural Magic Inference Engine achieves 14x better performance than a standalone CPU, 5.2x better than DNNL, and 1.8x better than OpenVINO.
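
If you want to take this kind of images-per-second measurement yourself, a simple timing harness is enough to get started. The Python sketch below is a generic example and not the harness used for the numbers above. `fake_inference` is a hypothetical stand-in for whatever call runs one batch through your runtime, and the warmup and iteration counts are arbitrary assumptions.

```python
import time
from typing import Callable


def images_per_second(run_batch: Callable[[], None], batch_size: int,
                      warmup: int = 5, iterations: int = 50) -> float:
    """Time repeated calls to run_batch and return throughput in images/sec."""
    for _ in range(warmup):          # let caches and thread pools settle
        run_batch()
    start = time.perf_counter()
    for _ in range(iterations):
        run_batch()
    elapsed = time.perf_counter() - start
    return batch_size * iterations / elapsed


def fake_inference() -> None:
    """Hypothetical placeholder workload; swap in a real inference call."""
    sum(i * i for i in range(10_000))


ips = images_per_second(fake_inference, batch_size=64)
print(f"{ips:.1f} images/sec")
```

Running the same harness at batch size 1 and batch size 64 gives a latency-oriented and a throughput-oriented number, mirroring the two scenarios above.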

Cost Savings

We measure cost savings by looking at cost per inference. To do so, we take into account the performance gained, CPU costs, and the Neural Magic Inference Engine fees. For batch size 1, the Neural Magic Inference Engine offers 92% savings over a standalone CPU, 78% over DNNL, and 18% over OpenVINO. Cost savings are even more significant for throughput (batch size 64): 93% savings over a standalone CPU, 81% over DNNL, and 42% over OpenVINO.
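
As a back-of-the-envelope check, the link between speedup and cost per inference can be sketched in a few lines of Python. This deliberately simplified version assumes identical hardware cost and ignores engine fees, so it is not the full calculation behind the figures above and its results need not match them exactly. The function name is hypothetical.

```python
def savings_from_speedup(speedup: float) -> float:
    """Fractional cost-per-inference savings for a given speedup.

    Assumes the same hardware cost per hour and no additional fees,
    so cost per inference scales as 1 / throughput.
    """
    return 1.0 - 1.0 / speedup


# Batch size 1 speedups quoted in the benchmark section.
for name, speedup in [("standalone CPU", 12.7), ("DNNL", 4.5), ("OpenVINO", 1.2)]:
    print(f"{name}: {savings_from_speedup(speedup):.0%}")
```

The simplified formula lands close to the reported savings, with the remaining differences plausibly explained by fees, CPU costs, and rounding in the quoted speedups.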

If you like what you see, and would like to benchmark MobileNetV2 or any other model in your own environment, check out our GitHub repos. Or contact us to discuss our approach.