Dec 08, 2021
The full technical release notes are always available in the GitHub release indexes linked from our Docs website or from the specific Neural Magic repository.
SparseZoo
The latest additions to sparsezoo.neuralmagic.com!
- Sparse BERT masked language modeling models with example recipes for transferring to other downstream datasets
- Pruned-Quantized BERT models on SQuAD (Question Answering)
- YOLACT models for image segmentation
DeepSparse Engine
Optimization Through Tensor Column Support
In the 0.8 release, we enabled initial support for proprietary Tensor Columns in the DeepSparse Engine. In the 0.9 release, we generalized and optimized Tensor Columns further to cover high-compute operations followed by memory-bound operations, such as a MatMul followed by a Softmax. Tensor Columns deliver performance improvements beyond those gained by reducing compute through model optimizations like compound sparsification: by breaking the activations of successive layers into sections of columns that fit into cache, memory-bound operations can be kept close to a CPU core until the results are finally written to memory. The graphs below show the performance impact that Tensor Columns deliver.
There are more performance improvements to come from Tensor Columns in future releases.
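As a purely conceptual illustration of the locality idea (the engine's actual scheduling, tile sizes, and fused operators are proprietary and more general), the NumPy sketch below produces the output of a matrix multiply one block of columns at a time and applies a memory-bound softmax to each block while it is still hot in cache, instead of materializing the full intermediate activation first:

```python
import numpy as np

def column_softmax(x):
    # Softmax taken down each column; this axis is chosen purely so that
    # each column tile's normalization is self-contained in this sketch.
    x = x - x.max(axis=0, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=0, keepdims=True)

def fused_matmul_softmax(a, b, tile_cols=64):
    """Illustrative only: compute column_softmax(a @ b) one block of output
    columns at a time, so the memory-bound softmax runs on data that is
    still in cache rather than re-reading a full intermediate from memory."""
    m, _ = a.shape
    _, n = b.shape
    out = np.empty((m, n), dtype=np.result_type(a, b))
    for start in range(0, n, tile_cols):
        stop = min(start + tile_cols, n)
        tile = a @ b[:, start:stop]                 # compute-bound step for this tile
        out[:, start:stop] = column_softmax(tile)   # memory-bound step, done in cache
    return out

# The tiled result matches the unfused reference computation.
a = np.random.rand(128, 256).astype(np.float32)
b = np.random.rand(256, 512).astype(np.float32)
assert np.allclose(fused_matmul_softmax(a, b), column_softmax(a @ b), atol=1e-5)
```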
DeepSparse Model Deployment
Examples: YOLACT, BERT; Integration: BERT
APIs Available
Use the new C++ API as the interface between your application and the Neural Magic DeepSparse Engine. A simple code demo is also provided that invokes the DeepSparse Engine through the C++ API. Once you have installed the DeepSparse Engine, you are ready to use the C++ API and take advantage of the libdeepsparse library.
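For orientation, a minimal sketch of the compile-and-run flow is shown below using the DeepSparse Python API rather than the new C++ API (see the bundled demo for the C++ code itself); the model path and input shape here are placeholders:

```python
import numpy as np
from deepsparse import compile_model

# Compile an ONNX model into a DeepSparse engine instance.
# "model.onnx" and the input shape below are placeholders.
engine = compile_model("model.onnx", batch_size=1)

# Inputs and outputs are lists of numpy arrays.
inputs = [np.random.rand(1, 3, 224, 224).astype(np.float32)]
outputs = engine.run(inputs)
print(outputs[0].shape)
```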
SparseML
New Transfer Learning Integrations, Recipes, and Tutorials
Neural Magic’s ML team creates pre-sparsified models in the SparseZoo so that anyone can plug in their own data and leverage them. Sparsifying involves removing redundant information from neural networks using algorithms such as pruning and quantization, among others. This sparsification process yields many benefits for deployment environments, including faster inference and smaller file sizes.
- NLP: Question Answering Use Case with BERT: This end-to-end guided experience will allow you to start from a Neural Magic pre-trained BERT model in the SparseZoo, apply a private dataset with a recipe using SparseML, and deploy on a CPU with the DeepSparse Engine.
Directly in our GitHub repo:
- YOLACT Tutorials: training integration or recipe application
- Hugging Face Transformers Training Integration and BERT: overview, installation, quick tour
- BERT - Apply a Recipe: As an alternative to the end-to-end guided question answering use case above, this tutorial focuses specifically on applying recipe workflows that simplify the sparsification process; a minimal code sketch of this flow follows the list below.
- Masked Language Modeling Transfer Learning: BERT tutorial
- PyTorch Image Classification: This PyTorch tutorial shows how Neural Magic's pre-sparsified models simplify the sparsification process by serving as a starting point for transfer learning onto other datasets.
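The sketch below illustrates what applying a recipe looks like in code. It is a minimal, hedged example: the tiny model, the dummy data, and the "recipe.yaml" path are placeholders, and the exact manager API can differ between SparseML versions, so consult the tutorials above for the authoritative workflow.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from sparseml.pytorch.optim import ScheduledModifierManager

# Stand-in model and data; "recipe.yaml" is a placeholder for a SparseZoo
# or local recipe containing pruning/quantization modifiers.
model = torch.nn.Sequential(torch.nn.Linear(32, 64), torch.nn.ReLU(), torch.nn.Linear(64, 10))
train_loader = DataLoader(
    TensorDataset(torch.randn(256, 32), torch.randint(0, 10, (256,))), batch_size=32
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = torch.nn.CrossEntropyLoss()

# Attach the recipe to a standard PyTorch training loop; the manager
# schedules pruning, quantization, and other modifiers as training runs.
manager = ScheduledModifierManager.from_yaml("recipe.yaml")
optimizer = manager.modify(model, optimizer, steps_per_epoch=len(train_loader))

for epoch in range(int(manager.max_epochs)):
    for inputs, labels in train_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), labels)
        loss.backward()
        optimizer.step()  # sparsification steps are applied alongside the update

manager.finalize(model)  # remove hooks once training completes
```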
For user help or questions about any of these highlights, sign up or log in: Deep Sparse Community Discourse Forum and/or Slack. We are growing the community member by member and are happy to see you there.