Neural Magic Launches High-Performance Inference Engine and Tool Suite

Jun 18, 2020

Author(s)

Sasa Zelenovic

Head of Developer Marketing, Neural Magic

Run computer vision models at lower cost with a suite of new tools that simplify model performance optimization.

Today, Neural Magic is announcing the release of its Inference Engine software, the NM Model Repo, and its ML Tooling. Data science teams can now run computer vision models in production on commodity CPUs, at a fraction of the cost of specialized hardware accelerators, without sacrificing performance or accuracy.

Until now, running computer vision models in production has required costly, specialized AI hardware such as GPUs. Organizations typically have many machine learning projects underway, with expensive cloud resources going unchecked even as they purchase new, specialized hardware for their data centers. As a result, overall computer vision application costs are high, and the bar for achieving positive results is both unclear and hard to reach. With Neural Magic’s new suite of inference tools and software, data science teams in industries including high tech, manufacturing, telecom, ecommerce, healthcare, and media can lower costs by taking advantage of commodity CPU resources and eliminating the need for specialized AI-specific hardware.

What’s under the hood?

The Neural Magic product consists of three main components:

Model Repo: Teams can choose from off-the-shelf, performance-optimized models to run in the Neural Magic Inference Engine. The Model Repo features models sparsified with the latest pruning techniques to deliver exceptional performance on CPUs, and it accelerates the process of deploying those models in production. Currently, teams can choose from a growing library of popular image classification models (such as ResNet, MobileNet, VGG, and EfficientNet) and object detection models (such as ResNet-SSD and MobileNet-SSD). More models are being added regularly. Unlike other repositories, the Model Repo offers models that Neural Magic has already built, pruned, and retrained for immediate use in production.
ML Tooling: These recalibration tools simplify the process of making models run fast on the Neural Magic Inference Engine with a collection of Jupyter Notebooks, libraries, scripts, and APIs that work with both TensorFlow and PyTorch (with Keras support coming soon). Only Neural Magic helps teams visualize the performance and accuracy tradeoffs often associated with model optimizations. Available recalibration APIs include:

Pruning API: Data scientists can bring their own custom models and data to be recalibrated for performance. This approach works for computer vision models and datasets.
Transfer Learning API: Existing models from the Neural Magic Repo can be transfer-learned with customers’ data, and then recalibrated. This approach makes it easier for teams to use Neural Magic pre-optimized models with their own data, in their own environments.

Neural Magic Inference Engine (NMIE): Once the models are optimized, they’re executed in production by the NMIE. As a software binary invoked via command line and/or APIs, NMIE simply snaps into existing model serving solutions and deployment pipelines. NMIE is managed like any modern software application via Kubernetes and runs in the user’s space: in the cloud, on premises, or at the edge (x86 processors required).
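For readers new to pruning, the idea behind the sparsified models and the Pruning API described above can be illustrated with a minimal sketch of magnitude pruning, one common pruning technique. This is not Neural Magic's API or its actual method, which this announcement does not detail; the function names `magnitude_prune` and `sparsity_of` and the sample weights are hypothetical, chosen only for illustration.

```python
def magnitude_prune(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude fraction zeroed.

    Hypothetical helper for illustration only; not part of any Neural Magic API.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    n_prune = int(len(weights) * sparsity)
    # Rank weight indices by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def sparsity_of(weights):
    """Fraction of weights that are exactly zero."""
    return sum(1 for w in weights if w == 0.0) / len(weights)


# Made-up weights for a single layer (illustrative values only).
weights = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02, 0.45, -0.1]
pruned = magnitude_prune(weights, sparsity=0.5)
print(pruned)
print(sparsity_of(pruned))
```

In practice, pruning is usually applied gradually during training and followed by retraining to recover accuracy, which is the "building, pruning, and re-training" work the Model Repo is described as having done in advance.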

Raising the Bar for CPU Performance

The Neural Magic Inference Engine achieves the following results on Intel x86 processors compared with NVIDIA T4 GPUs:

MobileNet-V2: 7x better performance (batch size 1, FP32)
EfficientNet-B0: 6x better performance (batch size 1, FP32)
ResNet-50: 2x better performance (batch size 1, FP32)
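The announcement does not say how these ratios were measured. As a rough sketch, assuming "performance" refers to latency or throughput at batch size 1, a comparison of this kind could be computed as follows. The helper names and the latency figures are hypothetical placeholders, not measurements of any Neural Magic or NVIDIA system.

```python
import statistics
import time


def median_latency_ms(fn, warmup=3, runs=20):
    """Median wall-clock time of calling `fn` once, in milliseconds."""
    for _ in range(warmup):
        fn()  # warm-up calls are not timed
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def speedup(baseline_ms, candidate_ms):
    """How many times faster the candidate is than the baseline."""
    if candidate_ms <= 0:
        raise ValueError("candidate latency must be positive")
    return baseline_ms / candidate_ms


def throughput_items_per_s(latency_ms, batch_size=1):
    """Items processed per second at the given per-batch latency."""
    return batch_size * 1000.0 / latency_ms


# Placeholder latencies (assumed values, not real benchmark data).
baseline_ms = 10.0
candidate_ms = 4.0
print(speedup(baseline_ms, candidate_ms))       # 2.5
print(throughput_items_per_s(candidate_ms))     # 250.0
```

To time a real model, `median_latency_ms` would be passed a zero-argument callable that runs one inference on a single input; at batch size 1, a latency ratio and a throughput ratio give the same speedup figure.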

Pricing & Packaging

Neural Magic offers a flexible subscription for production deployment based on compute capacity, as well as a developer subscription aimed at non-production usage. Partner options are also available.