Jan 04, 2021
Release 0.1.0 for the Community!
February 4, 2021
As of February 2021, our products have been renamed, most have been open sourced, and their release notes can be found on GitHub!
- Sparsify
- SparseML (formerly Neural Magic ML Tooling)
- SparseZoo (formerly Neural Magic Model Repo)
- DeepSparse Engine (formerly Neural Magic Inference Engine)
Release 1.4.0
January 4, 2021
Neural Magic Inference Engine
New Features:
- Benchmarking convenience APIs added to neuralmagic.Model class: benchmark, benchmark_batched.
- Quantized depthwise convolutions supported.
- Optimized support for linear Resize and older Resize versions available.
- Sparse kernel Stacked GEMM supported.
- Simultaneous requests to a single model supported.
- Optional timestamp added to logging messages.
Changes:
- Performance optimized for Resizes at small batch sizes.
- Stacked GEMM optimized.
- Function neuralmagic.benchmark_model deprecated and renamed to neuralmagic.analyze_model (intended to be used for model and performance profiling).
- Sparse activation convolutions enabled in more scenarios.
- Default logging level switched from error to warn.
- JSON library used by the engine at the C++ level updated.
Resolved Issues:
- Compilation errors for certain models no longer present when graph optimizations are disabled.
- Assertion error fixed when running Winograd or FFT algorithms on large models.
- Correctness bug with YOLOv3 pooling operators fixed.
- Compilation failure resolved when encountering a Reshape with multiple outputs to FullyConnected operators.
- Correctness for convolutions followed by horizontal adds addressed.
- The Python GIL is now released during compilation and execution so other Python threads can run.
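With simultaneous requests to a single model supported and the GIL released during execution, several Python threads can submit work at once. A minimal sketch of that calling pattern follows. Here `infer` is a hypothetical stand-in for whatever callable wraps a compiled model; it is not a Neural Magic API.

```python
from concurrent.futures import ThreadPoolExecutor


def run_concurrently(infer, batches, max_workers=4):
    """Submit each batch to `infer` from a pool of Python threads.

    `infer` is a placeholder for a callable that runs one request
    (hypothetical; substitute your own model wrapper).
    Results are returned in the same order as `batches`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(infer, batches))
```

Threads only overlap usefully when the callable releases the GIL while it works, which is what the fix above provides for the engine.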
Known Issues:
- None
neuralmagicML Tooling
New Features:
- PyTorch 1.7 support
- PyTorch quantization-aware training
- PyTorch quantized model export to ONNX
- PyTorch AMP training support
- PyTorch SetWeightDecay modifier
- Keras exporter for ONNX
- ONNX support for sparse tensors
- ORT benchmarking provider support (e.g., GPU)
Changes:
- References and associated code migrated from neuralmagic.benchmark_model to analyze_model.
- Improvements added for performance of ONNX quantized graphs export.
Resolved Issues:
- None
Known Issues:
- None
Neural Magic Model Repo
- None
Release 1.3.0
Neural Magic Inference Engine
New Features:
- Asymmetric quantization support is implemented for quantized convolutions. Support is added for asymmetric quantization of weights. (Asymmetric quantization of activations was already supported.)
- Support is added for Gelu operation.
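For readers unfamiliar with the distinction, the sketch below contrasts symmetric quantization (zero point fixed at 0) with asymmetric quantization (a nonzero zero point shifts the integer range to cover the observed minimum and maximum). It is a generic NumPy illustration of the two schemes, not the engine's implementation, and the function names are made up for this example.

```python
import numpy as np


def quantize_symmetric(x, num_bits=8):
    """Symmetric scheme: signed integers, zero point is always 0."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    scale = float(np.max(np.abs(x))) / qmax or 1.0
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, scale, 0


def quantize_asymmetric(x, num_bits=8):
    """Asymmetric scheme: unsigned integers plus a zero-point offset."""
    qmin, qmax = 0, 2 ** num_bits - 1  # 0..255 for 8 bits
    # The represented range must include 0.0 so that zero maps exactly.
    lo = min(float(x.min()), 0.0)
    hi = max(float(x.max()), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0
    zero_point = int(round(qmin - lo / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Map integers back to approximate real values."""
    return (q.astype(np.float32) - zero_point) * scale
```

The asymmetric form spends its integer range more efficiently when the values are skewed to one side of zero, at the cost of carrying the extra zero-point term through the arithmetic.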
Changes:
- Performance improvements made for:
- Quantized ResNet-50 for AVX-512 VNNI architectures for batch size < 16.
- Floating point ResNet-50 for AVX2 architectures for batch size < 16.
- Optimized implementations made for Softmax and Reduce operations.
- The engine accepts uint8, int8, int16, int32, float, int64 and double inputs. (The engine used to reject anything other than float, uint8, and int64.) Operation support remains unchanged.
- Network compilation time improved.
Resolved Issues:
- An assertion failure that occurred with certain large input sizes when running some networks with asymmetric quantized activations is fixed.
- Race condition conflicts on software license validation addressed.
Known Issues:
- None
neuralmagicML Tooling
New Features:
- Alpha version of Sparsify is released. Sparsify can improve model performance for deployment at scale using the latest model compression techniques with a visual interface. (Sparsify is not compatible with Ubuntu 16.04 with Python 3.5.)
- Jupyter Notebook for object detection models is available. Experience a single journey, end-to-end demonstration from installation to benchmarking.
- Model definitions and training support are added for PyTorch: YOLOv3, Darknet, and SSDLite.
- OpenVINO support is added for benchmarking.
- Object detection sensitivity scripts for pruning and learning rate are provided.
Changes:
- The TensorFlow classification train script tightly integrates with the TensorFlow Estimator API.
- Moving average for metrics collection is added.
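The note above does not say which kind of moving average is used. As an illustration only, a fixed-window average over the most recent metric values could look like the following; the class name and the window size are assumptions.

```python
from collections import deque


class MovingAverage:
    """Average of the last `window` values (illustrative sketch)."""

    def __init__(self, window=3):
        # deque with maxlen drops the oldest value automatically
        self._values = deque(maxlen=window)

    def update(self, value):
        """Record a new value and return the current windowed average."""
        self._values.append(float(value))
        return sum(self._values) / len(self._values)
```

Smoothing a noisy per-batch metric this way makes trends in loss or accuracy easier to read during training.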
Resolved Issues:
- ONNX Model Analysis now reports None for any unknown shapes (input or output) in the ONNX graph (used by ModelAnalyzer class under the neuralmagicML.onnx package and all scripts that use it, including Sparsify and the pruning config generation CLI).
Known Issues:
- Installing neuralmagicML on an AWS EC2 machine with RHEL 8 may generate an error. Adding --use-feature=2020-resolver to the pip install command will clear the error.
- PyTorch 1.7 is unsupported. Official support will be in Release 1.4.
Neural Magic Model Repo
Performant model additions:
- ResNet-50-SSD-300 (VOC, COCO)
- MobileNetV2-SSDLite (VOC, COCO)
- YOLOv3 (COCO)
Release 1.2.0
Neural Magic Inference Engine
New Features:
- ONNX operators are supported for:
- 3D MatMul with constant 2D weights
- ReduceMax, ReduceMin, and ReduceMean
- Pow and Sqrt
- MaxPool and AvgPool with ceil_mode enabled
- Compile-time bias fusion is implemented for 3D MatMul.
- Initial support for quantization targeting convolutional neural networks is implemented.
- ONNX operators QuantizeLinear, DequantizeLinear, and QLinearConv are now supported.
- Quantized convolutions with block-sparse kernels are supported for batch size greater than or equal to 16.
- Asymmetric quantization is supported for activations, while symmetric quantization is supported for weights.
- Ubuntu 20.04 with Python 3.8.2 is supported in the NMIE.
- Neural Magic software licensing is enabled to manage activations.
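The `ceil_mode` option for MaxPool and AvgPool mentioned above changes how the output size is rounded. The sketch below shows the basic one-dimensional size formula (dilation 1) as a rough guide; the helper name is made up for this example, and it ignores any extra boundary rules a particular framework may apply.

```python
import math


def pool_output_size(input_size, kernel, stride, pad_total=0, ceil_mode=False):
    """Output length of a 1-D pooling window sweep (dilation assumed 1).

    pad_total is the sum of padding at both ends.
    """
    rounding = math.ceil if ceil_mode else math.floor
    return rounding((input_size + pad_total - kernel) / stride) + 1
```

For example, an input of length 7 with kernel 2 and stride 2 yields 3 outputs with floor rounding and 4 with `ceil_mode` enabled, because the final partial window is kept.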
Changes:
- ONNX opset support has been updated to version 12.
- Improved error messages when unsupported types are passed to the engine.
Resolved Issues:
- Performance is improved for the ONNX operators Add, Sub, Mul, and Div.
- The ONNX operator Split with negative axis is supported.
- A potential wrong answer/crash is fixed in some networks with shuffles.
- When an ONNX model fails to load, a catchable Python exception is provided.
Known Issues:
- Ubuntu 20.04 users:
The end-to-end Jupyter Notebook examples will not include the TensorFlow Notebook. The neuralmagicML package only supports TensorFlow 1.x, which cannot be installed on the Python 3.8 that comes with Ubuntu 20.04. To use the TensorFlow Notebook, use Ubuntu 18.04.
- The Neural Magic license requires certain libraries that are provided with the Network Security Services (NSS) package. NSS comes with most operating systems, but on certain operating systems you may need to manually install some libraries before installing Neural Magic. For example:
- Ubuntu -- Install libnss3 library
- CentOS/RHEL/AmazonLinux -- Install nss or nss-3 library
- The Neural Magic Python package relaxed the ONNX version requirement to 1.5.0 or later to make it compatible with many of the ML packages. However, on Ubuntu 20.04 with Python 3.8, if you use an ONNX version earlier than 1.7, the serving feature may display an error, such as:
TypeError: Couldn't build proto file into descriptor pool!
Invalid proto descriptor for file "onnx-ml.proto"
In this case, the serving feature may not work properly and you will need to install ONNX 1.7.0.
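A quick way to guard against the ONNX issue above is to compare the installed version against 1.7.0 before using the serving feature. The helpers below are a sketch with made-up names; they only parse a version string and do not import anything from Neural Magic.

```python
import re


def parse_version(version):
    """Return (major, minor, patch) from a string such as '1.7.0'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    # A missing patch component is treated as 0.
    return tuple(int(group or 0) for group in match.groups())


def onnx_needs_upgrade(version, minimum=(1, 7, 0)):
    """True if `version` is older than the minimum needed for serving."""
    return parse_version(version) < minimum
```

In practice the argument would be `onnx.__version__` from the installed package.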
neuralmagicML Tooling
New Features:
- ONNX layerwise FLOPS calculations for model analyzer are provided.
- ONNX post-training quantization for models is supported.
- nmML server is added in anticipation of the Neural Magic Sparsify release.
- PyTorch single-shot detector (SSD) is added with example implementations for training and pruning as well as support for COCO and VOC datasets.
- PyTorch 1.6 compatibility is supported.
- TensorFlow estimator support is added for modifiers.
- Training and pruning example implementation is provided for TensorFlow image classification and TensorFlow object detection API.
Changes:
- To prevent unintentional overwriting of versions, the neuralmagicML package no longer installs PyTorch and TensorFlow. Users need to install PyTorch and TensorFlow separately from the neuralmagicML package.
- More thorough tests were added to the ONNX test suite.
- ONNX one shot sensitivity analysis was changed to write pruned masks in place in the graph instead of copying to a temporary model file.
Resolved Issues:
- ONNX DataLoader was fixed to deal with loading data from globs and non-float32 input types.
- Downloaded sample data can be untarred even if the Model Repo save directory is not specified.
- ONNX KL Divergence values between 0 and min_value are now properly clipped to min_value.
- ONNX check_load_* functions are fixed for sensitivity tests that were treating an input file path as an iterator instead of loading the file.
- ONNX SensitivityModelInfo loads the correct type for load_json.
Known Issues:
- Pre-trained models from the Neural Magic Model Repo are not loading correctly into the classification_train.py script for TensorFlow.
Neural Magic Model Repo
Performant model additions:
- ResNet-18 recal on ImageNet
- ResNet-34 recal on ImageNet
- PyTorch SSD ResNet-50 on VOC and COCO
Summary Highlights, Release 1.1.1 Hotfix
The following release notes include information about Neural Magic software 1.1 (build). Several defects discovered in Neural Magic ML Tooling were addressed, as follows, and new features were added to enhance the user experience.
Neural Magic ML Tooling
New Features:
- Float, int, and bool types are supported for neuralmagicML.onnx.DataLoader; previously only float32 inputs for ONNX models were available.
- Dynamically changing batch sizes is available for ONNX models running in ONNXRuntime through neuralmagicML.onnx.ORTModelRunner class and scripts/onnx/model_benchmark.py script.
- Shape detection for nodes in ONNX models using neuralmagicML.onnx.recal ModelAnalyzer is supported.
Changes:
- PyTorch export was upgraded to ONNX opset 11 for PyTorch > 1.3.0.
Resolved Issues:
- neuralmagic.benchmark_model returns the correct canonical names; code updates were made to neuralmagicML and to dependent code referencing benchmark_model: neuralmagicML.onnx.utils and correct_nm_benchmark_model_node_ids.
- approx_ks_loss_sensitivity in neuralmagicML.pytorch.recal and neuralmagicML.tensorflow.recal properly serializes to JSON.
- approx_ks_loss_sensitivity for neuralmagicML.onnx.recal, neuralmagicML.tensorflow.recal, and neuralmagicML.pytorch.recal works as expected for layers with few parameters.
- ONNX one-shot analysis no longer throws an improper error for sparse models/models containing layers with few parameters.
- Model download and code documentation scripts properly reference "base" for baseline models instead of "dense."
- All models with input sizes outside of 224 (Inception-v3) work as expected for the transfer learning notebooks.
- neuralmagicML.onnx.recal, neuralmagicML.tensorflow.recal, and neuralmagicML.pytorch.recal ScheduledModifierManager classes now correctly serialize to a YAML file, creating a parsable file for later loading.
Summary Highlights, Release 1.1.0
Neural Magic Inference Engine (NMIE)
- Support for the AVX2 instruction set and tested AMD AVX2 chipsets
- Ability to run U-Net convolutional network for image segmentation
- Addition of model performance diagnostics mode during runtime execution
- SPLIT operator support
- Simplified packaging to improve user journey from evaluation to test
- Improved open-source TF2ONNX converter in support of newer TensorFlow versions
Neural Magic ML Tooling
- Pruning command-line interface (CLI) for ease of use and rapid prototyping
- Transfer learning CLI for ease of use and rapid prototyping
- Pruning for Success best practices guide and getting started documentation
- ONNX API for model and pruning analysis, as well as model conversions
- PyTorch API improvements for pruning and transfer learning analysis
- New notebook experience to demo installation and benchmarking process
Neural Magic Model Repo
Performant model additions:
- Inception-v3
- ResNet-101 v1
- ResNet-152 v1
- VGG-11
- VGG-19
###
Neural Magic Inference Engine
New Features:
- neuralmagic.benchmark_model() includes individual iteration run times.
- Running the engine with numactl utility is supported.
- Split operator is supported.
- Most benchmark layers include “canonical_name” detail so the layers can be related to the ONNX model.
- Benchmark results include kernel_sparsity detail on a per-layer basis to provide confirmation of what layers the NMIE is recognizing as sparse.
- You can impose kernel sparsity on a per-layer basis by specifying layers and sparsity in a `.sparse` file.
- Resize operator is supported as an optimization for U-Net networks.
- The diagnostic graph includes information for debugging.
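Per-iteration run times, as now reported by the benchmark, are useful for spotting warm-up effects and outliers that an average hides. The sketch below shows the general idea with a generic timing loop; it is not the implementation of neuralmagic.benchmark_model, and `run` is a hypothetical callable standing in for one forward pass.

```python
import time


def time_iterations(run, num_iterations=10, num_warmup=2):
    """Call `run` repeatedly and return each timed iteration in seconds.

    Warm-up calls are executed first and are not included in the result.
    """
    for _ in range(num_warmup):
        run()
    times = []
    for _ in range(num_iterations):
        start = time.perf_counter()
        run()
        times.append(time.perf_counter() - start)
    return times
```

Keeping the raw list rather than only a mean lets you compute medians or percentiles afterwards.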
Changes:
- NMIE installation was optimized for a streamlined user experience.
- Reduced benchmark setup time.
- JIT improvements for the AVX2 and AVX-512 instruction sets yield a 5-20% performance increase for certain networks.
- Performance improvements were made for AVX2 support.
- Support was improved for AMD chips resulting in better performance.
- ONNX Runtime was upgraded to version 1.3.
- Environment variable name WAND_DUMP_ORT_DOT was changed to NM_DUMP_ORT_DOT to be consistent with the naming scheme.
- User-visible environment variables used for logging were changed to use the NM_ prefix instead of WAND_.
- Some models that use convolution and pooling layers have improved performance.
- Imposed kernel sparsity applies to magnitude pruning on original weights instead of generating random sparse weights.
- Imposed kernel sparsity directly relates to sparsity; e.g., KS of 0.85 means 85% of the weights are set to 0.
- Binding threads has been disabled by default; users can enable it as needed using NM_BIND_THREADS_TO_CORES=1.
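To illustrate what a kernel sparsity (KS) of 0.85 means under magnitude pruning, the sketch below zeroes the 85% of weights with the smallest absolute values and then measures the resulting sparsity. It is a generic NumPy illustration under that reading of the notes above, not the engine's code, and the function names are made up.

```python
import numpy as np


def magnitude_prune(weights, sparsity):
    """Zero the fraction `sparsity` of weights with the smallest magnitude."""
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    flat = np.abs(weights).ravel()
    num_zero = int(round(sparsity * flat.size))
    if num_zero == 0:
        return weights.copy()
    # Indices of the num_zero smallest magnitudes (order within is arbitrary).
    smallest = np.argpartition(flat, num_zero - 1)[:num_zero]
    mask = np.ones(flat.size, dtype=bool)
    mask[smallest] = False
    return weights * mask.reshape(weights.shape)


def kernel_sparsity(weights):
    """Fraction of weights that are exactly zero."""
    return float(np.mean(weights == 0))
```

Pruning the original weights by magnitude, rather than generating random sparse weights, keeps the largest weights in place.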
Resolved Issues:
- Thread pinning behaves as expected.
- When running with NM_BIND_THREADS_TO_CORES=1, better/more consistent performance is produced.
- NMIE no longer interferes with the way ONNX Runtime (ORT) handles denormal data in edge cases.
- Correctness is fixed on models where average pooling operators are applied to explicitly padded input.
- An out-of-bounds error is no longer present with imposed kernel sparsity pass.
- ORT overhead/subgraphs are properly timed during benchmarking.
- Previously, if parts of a model ran in ORT outside NMIE, the benchmark output reported numbers only for the layers NMIE ran, making the total time of the benchmarked execution incorrect. Timing for ORT overhead/subgraphs is now taken into account during benchmarking.
- Reshape bug was addressed where NMIE would fall back to ORT if the batch size was greater than 1 for ResNet-50 FPN SSD; the model compiles and execution is no longer slow.
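Since thread binding is now off by default, it has to be requested through the NM_BIND_THREADS_TO_CORES environment variable. One way to do that from Python is to build the environment for a child process, as sketched below. The helper name is made up, and the sketch assumes the engine reads the variable at process start.

```python
import os


def engine_env(bind_threads=True, base_env=None):
    """Return a copy of the environment with thread binding on or off.

    Binding is requested with NM_BIND_THREADS_TO_CORES=1; to leave it
    at the default (disabled), the variable is removed rather than
    set to some other value.
    """
    env = dict(os.environ if base_env is None else base_env)
    if bind_threads:
        env["NM_BIND_THREADS_TO_CORES"] = "1"
    else:
        env.pop("NM_BIND_THREADS_TO_CORES", None)
    return env
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` when launching the script that uses the engine.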
Known Issues:
- None
neuralmagicML Tooling
New Features:
- Jupyter Notebooks are available to provide a single-journey, end-to-end demonstration from installation to benchmarking for the Neural Magic Model Repo and PyTorch and TensorFlow models.
- The neuralmagicML.onnx package is available with ONNX API support for:
- Model analysis (e.g., for sparsity, node names, or attributes)
- Generic data loader to create random data or load from numpy files
- Model runner implementations for ONNX Runtime and Neural Magic
- Loss and performance pruning sensitivity analysis
- Ability to group layers/parameters in an ONNX model based on pruning sensitivity analysis
- Ability to create pruning sensitivity tables
- Generic helper APIs to analyze ONNX files and work with NumPy arrays
- Scripts support models converted to or trained in the ONNX framework:
- classification_validation.py - Run an image classification model over a selected dataset to measure validation metrics.
- model_analysis.py - Analyze a model to parse it into relevant information for each node/operation in the graph such as parameter counts, flops, is prunable, etc.
- model_benchmark.py - Benchmark the inference speed for a model in either the NMIE or ONNX Runtime.
- model_download.py - Download a model from the NM Model Repo.
- model_kernel_sparsity.py - Measure the sparsity of the weight parameters across a model (the result of pruning).
- model_pruning_config.py - Create a config.yaml file or a pruning information table to guide the creation of a config.yaml file for pruning a given model in the Python package.
- model_pruning_loss_sensitivity.py - Calculate the sensitivity for each prunable layer in a model towards the loss, where, for example, a higher score means the layer affects the loss more and therefore should be pruned less.
- model_pruning_perf_sensitivity.py - Calculate the sensitivity for each prunable layer in a model towards the performance, where, for example, a higher score means the layer did not give as much net speedup for pruning and therefore should be pruned less.
- PyTorch Inception-v3 model is supported in ML Tooling.
- Pre-trained, performant PyTorch models are included with the Model Repo:
- Inception-v3: recal, recal-perf
- ResNet-101 v1: recal-perf
- ResNet-152 v1: recal-perf
- VGG-11: recal-perf
- VGG-19: recal-perf
- Additional scripts support PyTorch:
- classification_export.py - Export an image classification model to a standard structure including an ONNX format, sample inputs, sample outputs, and sample labels.
- classification_lr_sensitivity.py - Calculate the learning rate sensitivity for an image classification model as compared with the loss. A higher sensitivity means a higher loss impact.
- classification_pruning_loss_sensitivity.py - Calculate the sensitivity for each prunable layer in an image classification model towards the loss, where, for example, a higher score means the layer affects the loss more and therefore should be pruned less.
- classification_train.py - Train an image classification model using a config.yaml file to modify the training process such as for pruning or sparse transfer learning.
- model_download.py - Download a model from NM Model Repo.
- Additional script support for TensorFlow:
- model_download.py - Download a model from NM Model Repo.
- Testing is supported for PyTorch 1.5.
- PyTorch improvements to API include:
- ImageFolderDataset class added for training over any classification dataset that matches ImageNet folder structure.
- Ability to create sparse transfer learning config from PyTorch model added.
- CIFAR-10 and CIFAR-100 dataset classes were added to API for TensorFlow.
- Structured pruning (blocks, channels, filters), regex matching of parameters, and the ability to pass in any mask function is supported in APIs for PyTorch and TensorFlow Gradual KS modifier classes.
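Structured pruning removes whole groups of weights rather than individual values. As a rough illustration of the filter case, the sketch below builds a mask that drops the convolution filters with the smallest total magnitude. This is a generic NumPy example with a made-up function name; it does not show the Gradual KS modifier API or its mask-function signature.

```python
import numpy as np


def filter_mask(weights, sparsity):
    """Boolean mask that removes whole filters of a conv weight tensor.

    weights has shape (out_channels, in_channels, kernel_h, kernel_w).
    The fraction `sparsity` of filters with the smallest L1 norm is masked.
    """
    out_channels = weights.shape[0]
    norms = np.abs(weights).reshape(out_channels, -1).sum(axis=1)
    num_pruned = int(round(sparsity * out_channels))
    keep = np.ones(out_channels, dtype=bool)
    if num_pruned > 0:
        keep[np.argsort(norms)[:num_pruned]] = False
    # Expand the per-filter decision to the full weight shape.
    return np.broadcast_to(keep[:, None, None, None], weights.shape)
```

Multiplying the weights by this mask zeroes entire filters, which is the pattern that channel- and block-level pruning follow at their own granularity.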
Changes:
- Demo Jupyter Notebooks on Image Classification, Object Detection, and Image Classification Serving were retired in favor of new end-to-end demo notebooks.
- NM ML Tooling installation is optimized for better user experience.
- The default for pruning notebooks uses global magnitude approximation instead of one-shot for sensitivity analysis.
- tf2onnx was upgraded to the latest version and is now installed from the tf2onnx GitHub repository.
- Enhanced performant PyTorch models were added to Model Repo:
- MobileNetV1 - recal, recal-perf
- ResNet-50 v1 - recal, recal-perf
- VGG-16 - recal-perf
- PyTorch improvements to API include:
- ModelRegistry class was simplified and upgraded.
- ScheduledOptimizer class was simplified to remove the need to call epoch_start, epoch_end in order to prevent more complicated use cases with the PyTorch optimizer.
- ModuleTester and ModuleTrainer classes were enhanced for usability.
- LR and KS modifier classes were enhanced for usability.
- PyTorch and TensorFlow Gradual KS modifier classes in API include:
- Pruning targets are now specified by parameters rather than by layers.
- There is enhanced support for multi-use layers/parameters that are reused in networks such as with feature pyramid networks (FPN).
Resolved Issues:
- None
ONNX Runtime (ORT) Server
New Features:
- None
Changes:
- None
Resolved Issues:
- When running multiple engines, bias accumulation optimization outputs correct data.
- The server no longer creates more worker threads than there are physical cores.
- Running the ORT Server from a Python script no longer hangs and now produces readiness feedback to the user.
Known Issues:
- None