Dec 08, 2021
The full technical release notes are always available in the GitHub release indexes linked from our Docs website or from the specific Neural Magic repository.
SparseZoo
The latest additions to sparsezoo.neuralmagic.com!
- Sparse BERT masked language modeling models with example recipes for transferring to other downstream datasets
- Pruned-Quantized BERT models on SQuAD (Question Answering)
- YOLACT models for image segmentation
DeepSparse Engine
Optimization Through Tensor Column Support
In the 0.8 release, we enabled initial support for proprietary Tensor Columns in the DeepSparse Engine. In the 0.9 release, we generalized and optimized Tensor Columns further to cover high-compute operations followed by memory-bound operations, such as a MatMul followed by a Softmax. Tensor Columns deliver performance improvements beyond those gained by reducing compute through model optimizations like compound sparsification: by breaking the activations of successive layers into sections of columns that fit into cache, memory-bound operations can be kept close to a CPU core until the results are finally written to memory. The graphs below show the performance impact that Tensor Columns deliver.
There are more performance improvements to come from Tensor Columns in future releases.
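As a purely conceptual illustration of the locality idea (the engine's actual scheduling, tile sizes, and fused operators are proprietary and more general), the NumPy sketch below produces the output of a matrix multiply one block of columns at a time and applies a memory-bound softmax to each block while it is still hot in cache, instead of materializing the full intermediate activation first:

```python
import numpy as np

def column_softmax(x):
    # Softmax taken down each column; this axis is chosen purely so that
    # each column tile's normalization is self-contained in this sketch.
    x = x - x.max(axis=0, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=0, keepdims=True)

def fused_matmul_softmax(a, b, tile_cols=64):
    """Illustrative only: compute column_softmax(a @ b) one block of output
    columns at a time, so the memory-bound softmax runs on data that is
    still in cache rather than re-reading a full intermediate from memory."""
    m, _ = a.shape
    _, n = b.shape
    out = np.empty((m, n), dtype=np.result_type(a, b))
    for start in range(0, n, tile_cols):
        stop = min(start + tile_cols, n)
        tile = a @ b[:, start:stop]                 # compute-bound step for this tile
        out[:, start:stop] = column_softmax(tile)   # memory-bound step, done in cache
    return out

# The tiled result matches the unfused reference computation.
a = np.random.rand(128, 256).astype(np.float32)
b = np.random.rand(256, 512).astype(np.float32)
assert np.allclose(fused_matmul_softmax(a, b), column_softmax(a @ b), atol=1e-5)
```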
DeepSparse Model Deployment
Examples: YOLACT, BERT; Integration: BERT
APIs Available
Use the new C++ API as the interface between your application and the Neural Magic DeepSparse Engine. A simple code demo is also provided that invokes the DeepSparse Engine through the C++ API. Once you have installed the DeepSparse Engine, you are ready to use the C++ API and take advantage of the libdeepsparse library.
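For orientation, a minimal sketch of the compile-and-run flow is shown below using the DeepSparse Python API rather than the new C++ API (see the bundled demo for the C++ code itself); the model path and input shape here are placeholders:

```python
import numpy as np
from deepsparse import compile_model

# Compile an ONNX model into a DeepSparse engine instance.
# "model.onnx" and the input shape below are placeholders.
engine = compile_model("model.onnx", batch_size=1)

# Inputs and outputs are lists of numpy arrays.
inputs = [np.random.rand(1, 3, 224, 224).astype(np.float32)]
outputs = engine.run(inputs)
print(outputs[0].shape)
```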
SparseML
New Transfer Learning Integrations, Recipes, and Tutorials
Neural Magic’s ML team creates pre-sparsified models in the SparseZoo so that anyone can plug in their own data and leverage them. Sparsifying involves removing redundant information from neural networks using algorithms such as pruning and quantization, among others. This sparsification process yields many benefits for deployment environments, including faster inference and smaller file sizes.
- NLP: Question Answering Use Case with BERT: This end-to-end guided experience will allow you to start from a Neural Magic pre-trained BERT model in the SparseZoo, apply a private dataset with a recipe using SparseML, and deploy on a CPU with the DeepSparse Engine.
Directly in our GitHub repo:
- YOLACT Tutorials: training integration or recipe application
- Hugging Face Transformers Training Integration and BERT: overview, installation, quick tour
- BERT - Apply a Recipe: As an alternative to the end-to-end guided question answering use case above, this tutorial focuses specifically on applying recipe workflows that simplify the sparsification process; a minimal code sketch of this flow follows the list below.
- Masked Language Modeling Transfer Learning: BERT tutorial
- PyTorch Image Classification: This PyTorch tutorial shows how Neural Magic's pre-sparsified models simplify the sparsification process by serving as a starting point for transfer learning onto other datasets.
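The sketch below illustrates what applying a recipe looks like in code. It is a minimal, hedged example: the tiny model, the dummy data, and the "recipe.yaml" path are placeholders, and the exact manager API can differ between SparseML versions, so consult the tutorials above for the authoritative workflow.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from sparseml.pytorch.optim import ScheduledModifierManager

# Stand-in model and data; "recipe.yaml" is a placeholder for a SparseZoo
# or local recipe containing pruning/quantization modifiers.
model = torch.nn.Sequential(torch.nn.Linear(32, 64), torch.nn.ReLU(), torch.nn.Linear(64, 10))
train_loader = DataLoader(
    TensorDataset(torch.randn(256, 32), torch.randint(0, 10, (256,))), batch_size=32
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = torch.nn.CrossEntropyLoss()

# Attach the recipe to a standard PyTorch training loop; the manager
# schedules pruning, quantization, and other modifiers as training runs.
manager = ScheduledModifierManager.from_yaml("recipe.yaml")
optimizer = manager.modify(model, optimizer, steps_per_epoch=len(train_loader))

for epoch in range(int(manager.max_epochs)):
    for inputs, labels in train_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), labels)
        loss.backward()
        optimizer.step()  # sparsification steps are applied alongside the update

manager.finalize(model)  # remove hooks once training completes
```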
For user help or questions about any of these highlights, sign up or log in: Deep Sparse Community Discourse Forum and/or Slack. We are growing the community member by member and are happy to see you there.