Accelerate Customer Review Classification with Sparse Transformers


Classify Even Longer Customer Reviews Using Sparsity with DeepSparse

Customer review classification is crucial for customer-facing enterprises across industries such as retail, entertainment, food, and beverage. Knowing what your customers say about your product or solution helps you quickly address negative reviews, which in turn reduces churn and improves the customer experience. An AI-driven customer review initiative can therefore have a direct impact on the company's bottom line. You can build an intelligent system with NLP models that automatically assigns positive or negative sentiment to customer reviews so that customer issues are addressed immediately.

The ability to quickly classify customer sentiment is an added advantage for any business. Whichever solution you deploy for classifying customer reviews should therefore deliver results as quickly as possible: a faster solution processes more reviews on the same hardware, which lowers computational costs. Deploying a deep learning model is one way to tackle this problem, and for such a deployment, decreasing the model’s latency and increasing its throughput are critical. This is where sparsity comes in.

The advantages of using sparse models include:

  • Reducing the model’s size and increasing its speed with little or no loss of accuracy.
  • Easier deployment, since sparse models take up less space.
  • Reduced cost of inference by using commodity CPUs instead of expensive accelerators.

In this article, you will learn how to use a sparsified BERT-Base model to classify long customer reviews from the Amazon polarity dataset. Sparsification eliminates 90% of the BERT-Base model weights, making the model smaller and faster. We’ll show you how to deploy and run the sparse model using DeepSparse, which executes sparse models faster, enabling you to save resources while delivering accurate results. All the code for this blog can be found in this Google Colab notebook.

Customer Review Classification Using the Amazon Polarity Dataset

Large language models are widely used for text classification. However, these models limit the length of their inputs. For example, BERT accepts at most 512 tokens, while Longformer and Big Bird, which use sparse self-attention, can only process up to 4,096 tokens. This poses a challenge because real-world data, such as customer reviews, can be longer than these limits.

Various approaches have been proposed to solve the challenge of classifying long documents. They include: 

  • Truncating long documents to the first 512 tokens. 
  • Using sparse self-attention instead of full self-attention to process long documents. 
  • Dividing long documents into smaller chunks.
  • Choosing sentences that are more important in classifying the document.
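The first and third approaches are simple to sketch. The example below is a minimal illustration in plain Python that assumes the review has already been tokenized into a list of token IDs; the function names and the default 128-token overlap are hypothetical choices, not part of any library.

```python
from typing import List


def truncate_head(tokens: List[int], max_len: int = 512) -> List[int]:
    """Approach 1: keep only the first max_len tokens."""
    return tokens[:max_len]


def split_into_chunks(
    tokens: List[int], max_len: int = 512, overlap: int = 128
) -> List[List[int]]:
    """Approach 3: split tokens into windows of at most max_len tokens.

    Consecutive windows share `overlap` tokens so that context at chunk
    boundaries is not lost.
    """
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len")
    step = max_len - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break  # the last window already reaches the end of the review
    return chunks


if __name__ == "__main__":
    review_tokens = list(range(1200))  # stand-in for a 1,200-token review
    print(len(truncate_head(review_tokens)))  # 512
    print([len(c) for c in split_into_chunks(review_tokens)])  # [512, 512, 432]
```

Truncation discards everything past the limit, while chunking keeps the whole review but produces one prediction per chunk that must then be aggregated.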

Each of these approaches involves compromises, and several of them require building more complex ML pipelines.

Models that process longer sequences may be more complex and require more training resources. According to the paper Efficient Classification of Long Documents Using Transformers, complex models that can process longer sequences often fail to outperform simpler baselines such as BERT-based models. The additional complexity of these models can therefore be avoided by using simpler models that process shorter sequences.

This suggests that more complex language models don’t necessarily outperform smaller ones in terms of accuracy, and that models accepting up to 512 tokens are a valid solution to long document classification. It also suggests that the first 512 tokens often contain the information that is critical for classifying a document.

Models that accept longer inputs also take more time during training and inference. Since they offer no significant accuracy advantage, this overhead can be avoided by using simpler models. Hence, you can use models that process up to 512 tokens for real-world document classification without a significant loss in accuracy.

Document Classification with DeepSparse

Pre-trained language models are very accurate, but they are large and computationally expensive, which makes them difficult to deploy in the real world. Large language models can be made smaller and more efficient, but this often comes at the cost of accuracy. Enter sparsification, a technique that prunes and quantizes large models to reduce their size and accelerate inference while preserving accuracy.
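To make "prunes and quantizes" concrete, here is a toy sketch on a plain list of weights. It uses simple magnitude pruning and symmetric per-tensor INT8 quantization, chosen for illustration only; it is not the algorithm SparseML uses (oBERT, for instance, relies on second-order pruning), and the function names are hypothetical.

```python
from typing import List, Tuple


def magnitude_prune(weights: List[float], sparsity: float) -> List[float]:
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    num_to_prune = int(len(weights) * sparsity)
    # Indices sorted by absolute value, smallest first
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:num_to_prune]:
        pruned[i] = 0.0
    return pruned


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization to integers in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map quantized integers back to approximate float weights."""
    return [v * scale for v in q]


if __name__ == "__main__":
    w = [0.02, -0.5, 0.03, 0.9, -0.01, 0.04, -0.06, 0.07, 0.08, -0.7]
    pruned = magnitude_prune(w, sparsity=0.9)
    print(pruned)  # only the largest-magnitude weight (0.9) survives
    q, scale = quantize_int8(pruned)
    print(q, scale)
```

In practice, pruning is usually applied gradually during training so the remaining weights can compensate, and the zeros only speed up inference when the runtime, such as DeepSparse, can skip them.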

You can deploy sparse models on CPUs and get GPU-class performance using DeepSparse. CPUs are cheaper, more widely available, and consume less power. Deploying models with DeepSparse provides:

  • Low CPU latency.
  • High CPU throughput.
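Latency and throughput are related but distinct metrics. The generic sketch below times any batch-processing callable with time.perf_counter; `measure` and `fake_model` are hypothetical names, and the dummy workload only stands in for a real model call such as a DeepSparse pipeline.

```python
import time
from typing import Callable, List, Sequence, Tuple


def measure(
    predict: Callable[[Sequence[str]], List[int]],
    batch: Sequence[str],
    iterations: int = 20,
) -> Tuple[float, float]:
    """Return (mean latency per batch in ms, throughput in items per second)."""
    start = time.perf_counter()
    for _ in range(iterations):
        predict(batch)
    # Guard against a zero reading on very fast workloads
    elapsed = max(time.perf_counter() - start, 1e-9)
    latency_ms = elapsed / iterations * 1000
    throughput = len(batch) * iterations / elapsed
    return latency_ms, throughput


def fake_model(batch: Sequence[str]) -> List[int]:
    # Stand-in workload: label a review positive (1) if it contains "good"
    return [int("good" in text) for text in batch]


if __name__ == "__main__":
    reviews = ["good product", "arrived broken"] * 8  # batch of 16
    latency_ms, items_per_sec = measure(fake_model, reviews)
    print(f"{latency_ms:.3f} ms per batch, {items_per_sec:.0f} items/sec")
```

When batches run back to back, throughput equals batch size divided by per-batch latency, so larger batches often raise throughput at the cost of per-request latency.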

Long Sentence Classification with oBERT

Let’s start by demonstrating how to perform inference with DeepSparse using a dense oBERT-Base Uncased model. This model is available for free from SparseZoo. First, install DeepSparse:

pip install deepsparse

You can use the DeepSparse classification pipeline to run inference on documents immediately. This is useful when your task doesn’t require fine-tuning.

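from deepsparse import Pipeline

# model_path: SparseZoo stub or local path of the dense oBERT-Base Uncased model
classification_pipeline = Pipeline.create(
    task="text_classification",
    model_path="<dense oBERT-Base SparseZoo stub>",
)
inference = classification_pipeline(
    ["That movie was very long, but worth every minute"]
)
print(inference)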

In other cases, you may want to fine-tune a language model on your own data to improve accuracy, or sparsify it to increase throughput. The dense model above achieves a throughput of 12.7 items per second on 23 cores.


Let’s now sparsify the oBERT-Base dense model using SparseML and compare the throughput with the one obtained above. 

Transfer a Sparsified Model for Document Classification

The oBERT-Base Uncased 80% pruned model was fine-tuned on the IMDB dataset and achieves 93.26% accuracy on the IMDB test set. The steps below transfer a pre-sparsified oBERT-Base model to the Amazon polarity dataset.

Knowledge Distillation in Document Classification

Knowledge distillation trains a student model to reproduce the outputs of a larger or more accurate trained network, the teacher. Use the oBERT-Base masked language model from the SparseZoo as the starting point for the teacher and distill it into the sparsified oBERT-Base document classification student model. The SparseZoo contains other pre-sparsified NLP models that can also be used as students.
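The core idea can be sketched as a loss function. The example below implements the classic soft-target distillation loss in plain Python: a temperature-scaled KL divergence between teacher and student outputs, blended with the usual cross-entropy on the true label. The `temperature` and `alpha` names are illustrative; SparseML configures distillation through its recipes, and its exact formulation may differ.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    student_logits: List[float],
    teacher_logits: List[float],
    label: int,
    temperature: float = 2.0,
    alpha: float = 0.5,
) -> float:
    """alpha * T^2 * KL(teacher || student) + (1 - alpha) * cross-entropy."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student) if pt > 0)
    # Hard-label cross-entropy uses the unscaled (T = 1) student distribution
    ce = -math.log(softmax(student_logits)[label])
    # T^2 keeps soft-target gradients on a scale comparable to the hard loss
    return alpha * temperature**2 * kl + (1 - alpha) * ce


if __name__ == "__main__":
    teacher = [2.0, -1.0]  # teacher is confident the review is class 0
    print(distillation_loss([1.5, -0.5], teacher, label=0))
    print(distillation_loss([-1.0, 2.0], teacher, label=0))  # larger loss
```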

To use oBERT-Base Uncased as a distillation teacher, you need to do the following:

  • Install SparseML to start training.
  • Train the dense oBERT-Base model on the Amazon polarity dataset.
  • Distill the trained oBERT-Base model into a sparsified oBERT-Base student model.

pip install "sparseml[torch]"

The command below trains the oBERT-Base teacher model on the Amazon polarity dataset and saves it to the directory given by output_dir. Other parameters defined in this training run include:

  • The path to the model. In this case, it’s a dense oBERT-Base model from the SparseZoo. 
  • 512 as the maximum sequence length. 
  • The recipe that defines the training hyperparameters. This can be a SparseZoo stub or a local file.
  • A training batch size of 16. Reduce the batch size if you get CUDA out-of-memory errors.
  • The dataset name, which identifies the dataset used for training. Since SparseML’s training integration is built on top of Hugging Face, you can pass the name of any Hugging Face dataset. Alternatively, you can use a custom dataset stored in a local file and pass it with the train_file argument.
sparseml.transformers.train.text_classification \
    --output_dir models/teacher \
    --model_name_or_path "zoo:nlp/masked_language_modeling/obert-base/pytorch/huggingface/wikipedia_bookcorpus/base-none" \
    --recipe "zoo:nlp/masked_language_modeling/obert-base/pytorch/huggingface/wikipedia_bookcorpus/base-none?recipe_type=transfer-text_classification" \
    --recipe_args '{"init_lr":0.00003}' \
    --dataset_name amazon_polarity \
    --max_seq_length 512 \
    --per_device_train_batch_size 16 --per_device_eval_batch_size 16 \
    --validation_ratio 0.1 \
    --do_train --do_eval --evaluation_strategy epoch --fp16 \
    --save_strategy epoch --save_total_limit 1

The command below distills the trained teacher oBERT-Base model into the pruned-quantized oBERT-Base student model. The distill_teacher argument directs SparseML to perform distillation from the teacher model you just trained. When training is complete, the sparsified model is stored in the output_dir folder.

sparseml.transformers.train.text_classification \
  --output_dir sparse_bert_document_classification \
  --model_name_or_path "zoo:nlp/masked_language_modeling/obert-base/pytorch/huggingface/wikipedia_bookcorpus/pruned90-none" \
  --distill_teacher models/teacher \
  --recipe "zoo:nlp/document_classification/obert-base/pytorch/huggingface/imdb/pruned90_quant-none" \
  --dataset_name amazon_polarity \
  --max_seq_length 512 \
  --do_train \
  --do_eval \
  --validation_ratio 0.1 \
  --fp16 \
  --evaluation_strategy steps \
  --eval_steps 100 \
  --logging_steps 100 \
  --logging_first_step \
  --per_device_train_batch_size 8 \
  --per_device_eval_batch_size 32 \
  --gradient_accumulation_steps 6
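The student command combines a per-device batch size of 8 with 6 gradient accumulation steps, so each optimizer step sees 8 × 6 = 48 examples per device. The toy sketch below assumes a one-parameter linear model with squared-error loss (not the actual BERT training loop) and checks that averaging the gradients of 6 micro-batches of 8 gives the same update as a single batch of 48; the function names are hypothetical.

```python
import random
from typing import List, Tuple

Example = Tuple[float, float]  # (x, y)


def grad_mse(w: float, batch: List[Example]) -> float:
    """Gradient of mean((w * x - y)^2) with respect to w over a batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w: float, data: List[Example], micro_batch: int, steps: int) -> float:
    """Average the gradients of `steps` equally sized micro-batches."""
    total = 0.0
    for s in range(steps):
        micro = data[s * micro_batch:(s + 1) * micro_batch]
        total += grad_mse(w, micro)
    return total / steps


if __name__ == "__main__":
    random.seed(0)
    data = [(random.random(), random.random()) for _ in range(48)]
    w = 0.3
    full = grad_mse(w, data)  # one batch of 48
    accum = accumulated_grad(w, data, micro_batch=8, steps=6)  # 6 x 8
    print(full, accum, abs(full - accum) < 1e-12)
```

This equivalence holds because the micro-batches are equal in size; accumulation trades extra forward and backward passes for a smaller peak memory footprint.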

Evaluate Sparse Model

Run the following script to evaluate the performance of the student model on the entire test set.

sparseml.transformers.train.text_classification \
  --model_name_or_path sparse_bert_document_classification \
  --dataset_name amazon_polarity \
  --max_seq_length 512 \
  --per_device_eval_batch_size 32 \
  --preprocessing_num_workers 6 \
  --do_eval \
  --eval_on_test \
  --output_dir obert_test

Deploy Trained Sparse Model

To deploy the model, you first need to export it for inference. The sparseml.transformers.export_onnx command converts the training graph into one optimized for inference and writes an ONNX file that you can deploy on DeepSparse.

# Enter the full path to the trained model for --model_path
sparseml.transformers.export_onnx \
    --model_path sparse_bert_document_classification \
    --task 'text-classification'  \
    --sequence_length 512

Next, create a DeepSparse document classification pipeline using the exported model. The pipeline expects the type of task and the path to the model. 

from deepsparse import Pipeline

# Create a text classification pipeline from the exported model directory
classification_pipeline = Pipeline.create(
    task="text_classification",
    model_path="sparse_bert_document_classification",
)

# Triple quotes let the long review span multiple lines
review = """When American musician Saul Williams and actor-writer Anisia Uzeyman came up
with the concept for their directorial debut, Neptune Frost, they considered using
several mediums, including a studio album, a graphic novel, and a stage musical.
It's wrong to say they settled on a film, because Neptune Frost feels like an
amalgamation of all of the above. A powerful treatise on the destruction of
constructed binaries, the pair are iconoclastic in their approach: they take what
exists and reshuffle it into something original and fluid. The songs in Neptune
Frost look beyond our world to an imagined one, to outer space where Earthly
binaries do not exist. Some of the music feels reminiscent of the radically
transgressive cult queer musical Hedwig And The Angry Inch (2001), similarly
smashing gendered images together to create something new and beautiful. The
Afrofuturist aesthetic is sensational to watch, peppering the darkness with
fluorescent paints and found objects to create something alien but still
identifiable as human. Yet each time we come back to reality, something has shifted
in our perception."""

inference = classification_pipeline([review])
print(inference)
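To act on predictions, for example by routing confidently negative reviews to a support team, you can filter the pipeline's results. The sketch below is hypothetical: it assumes you pass in the parallel lists of predicted labels and scores from the pipeline output (exposed as labels and scores in recent DeepSparse versions; verify against your installed version), and that the negative class is named "LABEL_0", which you should check against the id2label mapping in your model's config.json.

```python
from typing import List, Sequence, Tuple


def flag_negative_reviews(
    reviews: Sequence[str],
    labels: Sequence[str],
    scores: Sequence[float],
    negative_label: str = "LABEL_0",  # assumption: check your model config
    min_score: float = 0.8,
) -> List[Tuple[str, float]]:
    """Return (review, score) pairs predicted negative with at least min_score confidence."""
    flagged = []
    for review, label, score in zip(reviews, labels, scores):
        if label == negative_label and score >= min_score:
            flagged.append((review, score))
    # Most confident negative reviews first
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    reviews = ["Broke after two days.", "Love it!", "Not great, not terrible."]
    # Hypothetical values standing in for the pipeline's labels and scores
    labels = ["LABEL_0", "LABEL_1", "LABEL_0"]
    scores = [0.97, 0.99, 0.62]
    print(flag_negative_reviews(reviews, labels, scores))
```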

Benchmark DeepSparse Sparsified Model 

DeepSparse ships with tools for benchmarking inference. The tools accept a path to a model or a SparseZoo stub. Benchmarking can be done using the CLI tool or Python API.

Let’s benchmark the student model on DeepSparse using the Python API.

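from deepsparse import compile_model
from deepsparse.utils import generate_random_inputs

batch_size = 16
# Path to the ONNX file written by sparseml.transformers.export_onnx
onnx_path = "sparse_bert_document_classification/model.onnx"

# Compile the model into an optimized executable for your machine
engine = compile_model(onnx_path, batch_size=batch_size)

# Random inputs matching the model's input shapes, used only for timing
inputs = generate_random_inputs(onnx_path, batch_size)

# Run the benchmark and report throughput
benchmarks = engine.benchmark(inputs)
print(f"{benchmarks.items_per_second:.1f} items/sec")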

From the above output, you can see that the model achieved a throughput of 61.1 items per second on 23 cores, a 4.8x increase over the dense oBERT baseline.
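The speedup follows directly from the two reported throughput figures. The short calculation below also converts them into reviews per hour, assuming the measured rate is sustained (a simplification that ignores I/O and preprocessing).

```python
dense_items_per_sec = 12.7   # dense oBERT-Base on 23 cores (from the text)
sparse_items_per_sec = 61.1  # sparse student on DeepSparse, 23 cores

speedup = sparse_items_per_sec / dense_items_per_sec
print(f"speedup: {speedup:.1f}x")  # speedup: 4.8x

# Reviews classified per hour at each sustained throughput
for name, tput in [("dense", dense_items_per_sec), ("sparse", sparse_items_per_sec)]:
    print(f"{name}: {tput * 3600:,.0f} reviews/hour")
```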

Benchmarking can also be done with the DeepSparse CLI:

deepsparse.benchmark sparse_bert_document_classification/model.onnx

Final Thoughts 

Classifying long text sequences is an important task, especially in the real world, where documents can be arbitrarily long. You have learned that you don’t need complex models to perform document classification. More specifically, you have covered:

  • How to classify long text sequences using DeepSparse on sparsified models.
  • Transferring a sparsified model for document classification. 
  • Deploying a trained sparse model using DeepSparse.

If you have any questions or comments, please reach out to our community Slack channel or submit a PR on GitHub.