If you work in deep learning, odds are you know about EfficientNets, a family of models developed by Google researchers that achieve better accuracy with much smaller models than previous convolutional neural networks (CNNs). More specifically, they are an image classification architecture that has set new records for accuracy on the ImageNet dataset.
EfficientNets are hailed for the high accuracy they achieve with very few parameters and FLOPS, meaning they can theoretically run at breakthrough speed, which is the compelling insight from the paper. The potential applications of EfficientNets are broad and include anything based on image classification, as well as other domains that borrow the ideas for their own architectures, such as EfficientDet, an object detection architecture.
EfficientNets are important because they have challenged conventional wisdom in the deep learning community. Before them, the route to the best accuracy was to use as many parameters and FLOPS as possible. This is most evident in one of the previous record holders: the GPipe algorithm applied to AmoebaNet. EfficientNets are an order of magnitude smaller with similar accuracy, and they have dramatically shifted the trade-off between compute and accuracy.
Scaling Differently: What EfficientNets Show Us
The big breakthrough of EfficientNets is the insight that you need to scale differently. The architecture is roughly the same as MobileNetV2 with Squeeze-and-Excitation and Swish added in. But instead of scaling only width as in MobileNetV2, EfficientNets scale width (number of channels), depth (number of layers), and input size (number of image pixels) together.
Intuitively, this makes sense. If we keep everything constant but width, we very quickly squeeze out all the information we can from each layer without adding anything new to extract from. Hence, we very quickly hit the power law curve the authors highlighted in the paper, where extra width buys less and less accuracy. Scaling the input size and depth alongside width, therefore, gives the network not only more capacity, but more information to learn from with that additional capacity.
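To make the idea of scaling all three dimensions together concrete, here is a minimal sketch of the compound scaling rule. It assumes the base coefficients reported in the EfficientNet paper (1.2 for depth, 1.1 for width, 1.15 for resolution); the function names are our own, and the published model variants round their sizes, so treat this as an illustration, not the reference implementation.

```python
# Compound scaling sketch. ALPHA, BETA and GAMMA are the base coefficients
# reported in the EfficientNet paper; function names are illustrative only.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # depth, width, resolution


def compound_scale(phi):
    """Return (depth, width, resolution) multipliers for coefficient phi."""
    return ALPHA ** phi, BETA ** phi, GAMMA ** phi


def flops_multiplier(phi):
    """Approximate FLOPS growth: linear in depth, quadratic in width
    and in resolution."""
    depth, width, resolution = compound_scale(phi)
    return depth * width ** 2 * resolution ** 2


if __name__ == "__main__":
    for phi in range(4):
        depth, width, resolution = compound_scale(phi)
        print(f"phi={phi}: depth x{depth:.2f}, width x{width:.2f}, "
              f"resolution x{resolution:.2f}, "
              f"FLOPS x{flops_multiplier(phi):.2f}")
```

Because the three coefficients are chosen so that depth times width squared times resolution squared is close to 2, each increment of phi roughly doubles the FLOPS budget while growing all three dimensions in a fixed ratio.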
For these reasons, the EfficientNets paper made a big splash when it was released last year. However, there has been little real-world impact to date, for a variety of reasons. Today, we want to take a deeper look at the challenges of EfficientNets and what it may take to solve them in the coming years.
The Reality of EfficientNets Today
Based on our conversations with data science teams, many companies are experimenting with EfficientNets and investing in this area. However, very few are using EfficientNets in production. We’ve spoken to a few customers who have tried EfficientNets out internally, but none had been able to deploy them successfully at the time of the conversation.
So why is this promising avenue for machine learning not making its way into production? One issue is the training time and finicky nature of EfficientNets. Specifically, the hyperparameters were tuned so tightly that even the choice of machine learning framework used to train the architecture noticeably affects top-line accuracy.
Additionally, today’s market lacks systems that can run EfficientNets in a performant way. Current hardware accelerators are engineered based on conventional wisdom and thus focus on larger traditional networks with more raw compute.
EfficientNets and The Lack of Compute Problem
The Google paper demonstrated that a limited FLOPS budget can still achieve state-of-the-art results. Everything comes with a trade-off, though, and for EfficientNets it came at the cost of data movement between the layers.
The main block for EfficientNets is a modified inverted bottleneck that builds on top of the depthwise convolution. This choice significantly reduces the FLOPS and parameters required by removing all cross-channel connections in the depthwise layer, a form of structured sparsity. It also reduces the complexity of the solution that the layer can represent, so the number of channels is increased to boost the overall capacity. The result is fewer parameters and FLOPS than other solutions, but more data movement due to the larger number of channels.
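The savings from dropping the cross-channel connections can be sketched with simple counting. The example below compares a dense convolution with a depthwise one on the same input; the layer shape (64 channels, 3x3 kernel, 56x56 feature map) is an arbitrary illustration, not a layer taken from EfficientNet, and biases are ignored.

```python
def dense_conv_cost(c_in, c_out, k, h, w):
    """Parameters and multiply-accumulates for a dense k x k convolution
    (stride 1, 'same' padding, no bias)."""
    params = k * k * c_in * c_out
    macs = params * h * w  # every weight is used once per output pixel
    return params, macs


def depthwise_conv_cost(channels, k, h, w):
    """Same counts for a depthwise convolution: one k x k filter per
    channel, with no connections between channels."""
    params = k * k * channels
    macs = params * h * w
    return params, macs


if __name__ == "__main__":
    # Hypothetical layer shape, chosen only for illustration.
    dense = dense_conv_cost(64, 64, 3, 56, 56)
    depthwise = depthwise_conv_cost(64, 3, 56, 56)
    print("dense     params, MACs:", dense)
    print("depthwise params, MACs:", depthwise)
    print("reduction factor:", dense[0] // depthwise[0])
```

For equal input and output channel counts, the reduction factor equals the number of channels, which is why the depthwise layer is so cheap in compute and why the network can afford more channels in return.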
Hardware accelerators like GPUs are designed for models with large amounts of compute and where data movement is a relatively small component of overall performance. In the case of EfficientNets, we have a neural network architecture that has significantly less compute and more data movement than comparable networks. As a result, EfficientNets perform poorly on hardware accelerators.
Because EfficientNets are already sparse, and therefore memory bound when it comes to performance, the best way to run them “efficiently” is to apply systems thinking and accelerate the memory bottlenecks associated with data movement.
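One rough way to see why such layers end up memory bound is to estimate arithmetic intensity, the number of FLOPs performed per byte moved. The sketch below assumes a deliberately simple model: 32-bit values, each tensor read or written exactly once, and no cache reuse. The layer shape is hypothetical and the function names are our own.

```python
BYTES_PER_VALUE = 4  # assumes 32-bit floating point tensors


def arithmetic_intensity(macs, in_values, out_values, weight_values):
    """FLOPs per byte, assuming each tensor crosses memory exactly once."""
    flops = 2 * macs  # one multiply plus one add per MAC
    bytes_moved = BYTES_PER_VALUE * (in_values + out_values + weight_values)
    return flops / bytes_moved


def dense_intensity(channels, k, h, w):
    """Dense k x k convolution with equal input and output channels."""
    weights = k * k * channels * channels
    activations = channels * h * w
    return arithmetic_intensity(weights * h * w, activations,
                                activations, weights)


def depthwise_intensity(channels, k, h, w):
    """Depthwise k x k convolution over the same activations."""
    weights = k * k * channels
    activations = channels * h * w
    return arithmetic_intensity(weights * h * w, activations,
                                activations, weights)


if __name__ == "__main__":
    # Hypothetical layer shape, for illustration only.
    print(f"dense:     {dense_intensity(64, 3, 56, 56):.1f} FLOPs/byte")
    print(f"depthwise: {depthwise_intensity(64, 3, 56, 56):.1f} FLOPs/byte")
```

Under these assumptions the depthwise layer moves almost as many bytes as the dense one while doing a small fraction of the arithmetic, so its runtime is governed by memory traffic, not by raw compute.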
How Can this Be Fixed? The Right Tool for the Job
The solution to running EfficientNets successfully, like so many things in life, is to find the right tool for the job. Using our Neural Magic Inference Engine, running on commodity Intel CPUs, data science teams can deploy EfficientNets for their image classification use cases with SOTA accuracy and at a fraction of the cost of hardware accelerators.
At the heart of our Inference Engine are proprietary algorithms that accelerate memory-bound processes within neural network architectures. In our benchmarking tests, we see that a four-core c5 CPU instance on AWS running the Neural Magic Inference Engine delivers more than a 2x speedup compared to a T4 GPU instance, and better than a 3x speedup versus a V100 GPU instance, both at batch size 1. When we compare relative price performance, measured in millions of items per dollar, the Neural Magic Inference Engine is between 10x and 30x more price performant than its GPU comparisons, again at batch size 1. At batch size 64, an 18-core c5 CPU instance on AWS running the Neural Magic Inference Engine processes 2x as many items per second as a T4 GPU instance. From a price performance perspective, four-core c5 instances deliver between 30% and 80% better price performance than their GPU counterparts.
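For readers who want to reproduce the price performance metric on their own hardware, one plausible way to compute millions of items per dollar is sketched below. The throughput and hourly price in the example are placeholders chosen for easy arithmetic; they are not measurements or real instance prices, and the function name is our own.

```python
SECONDS_PER_HOUR = 3600


def million_items_per_dollar(items_per_second, price_per_hour):
    """Items processed per dollar of instance time, in millions."""
    items_per_hour = items_per_second * SECONDS_PER_HOUR
    return items_per_hour / price_per_hour / 1e6


if __name__ == "__main__":
    # Placeholder inputs, NOT benchmark results or real prices.
    throughput = 100.0   # items per second (hypothetical)
    hourly_price = 1.00  # dollars per hour (hypothetical)
    value = million_items_per_dollar(throughput, hourly_price)
    print(f"{value:.2f} million items per dollar")
```

Comparing two systems then comes down to dividing one result by the other, using throughput measured at the same batch size.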
EfficientNets are an important stepping stone on the pathway to sparse neural network algorithms: they use wider and deeper networks to reduce compute without sacrificing accuracy. We expect many more deep learning innovations to follow this trend over the next several years.
The challenge will be in connecting those who want to take advantage of these research innovations with the right tools to make them a reality. The freedom of software innovation is that it shortens the time to value for new research breakthroughs. We’ve written before about why software will eat the machine learning world, and EfficientNets are just one more example of how true this continues to be.
To see how Neural Magic can make EfficientNets run faster at scale, easing the path to production, sign up for early access.