Four Machine Learning Trends for Recommendation Systems

November 5, 2019

Recommendation systems (or recommender systems) were designed to understand and predict user preferences based on user behavior. 

In industries like e-commerce and retail, with an ever-present need to establish a deep understanding of consumer behavior, recommendation systems have become a crucial way to interact with users in real time and deliver fine-tuned personalization.

Here’s a simple example: estimating the response of a user for new product items based on historical information stored in the system, and suggesting to this user items for which the predicted response is high.

In fact, most consumers interact with recommendation systems every day, for example on retail marketplace sites like Amazon or streaming platforms like Netflix. Todd Yellin, Netflix’s Vice President of Product Innovation, provided context around the opportunity in a recent interview with MobileSyrup: “With more than 1,000 hours of just original content coming to this year and over 104 million subscribers around the world, it is important to surface the right content to the right person at the right time. Our personalization enables us to create more than 250 million tailored experiences to delight each user every single time they enter the platform.”

We will explore four key machine learning trends that illustrate how these systems have evolved over the last few years and enabled more pervasive use of real-time personalization and recommendation predictions.

1: Use of content and collaborative filtering

The initial recommendation systems employed two types of machine learning algorithms to generate predictions: content filtering and subsequently collaborative filtering. 

Content-based filtering works from the baseline of an existing user profile and generates new user attributes based on inputs. For example, a new subscriber to a monthly clothing service may indicate her style preferences, colors, and sizes at the beginning of the membership, and these details serve as the foundation for her monthly recommendations. The site’s algorithm may compare similar items in the inventory to those that are highly rated by the user in order to deliver future recommendations. 
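
To make the idea concrete, here is a minimal sketch of content-based filtering, assuming each item is described by a small attribute vector and the user profile is the rating-weighted average of the items she has rated. The item names, attributes, and function names are hypothetical and only serve the illustration.

```python
import math

# Hypothetical item attributes: [casual, formal, bright colors, neutral colors]
ITEMS = {
    "denim jacket": [1.0, 0.0, 0.0, 1.0],
    "silk blouse": [0.0, 1.0, 0.0, 1.0],
    "graphic tee": [1.0, 0.0, 1.0, 0.0],
    "linen shorts": [1.0, 0.0, 0.0, 1.0],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def build_profile(ratings):
    """Average the attribute vectors of rated items, weighted by rating."""
    total = sum(ratings.values())
    size = len(next(iter(ITEMS.values())))
    profile = [0.0] * size
    for name, rating in ratings.items():
        for i, value in enumerate(ITEMS[name]):
            profile[i] += rating * value / total
    return profile

def recommend_by_content(ratings, top_n=2):
    """Rank unrated items by similarity to the user's profile."""
    profile = build_profile(ratings)
    candidates = [name for name in ITEMS if name not in ratings]
    ranked = sorted(candidates, key=lambda n: cosine(profile, ITEMS[n]), reverse=True)
    return ranked[:top_n]

# The user rated the denim jacket highly and the silk blouse poorly.
suggestions = recommend_by_content({"denim jacket": 5.0, "silk blouse": 1.0})
```

Because the profile is built only from this user's own ratings, the approach works for a brand-new subscriber as soon as she states a few preferences.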

In contrast, collaborative filtering generates recommendations based on similar users, or on activity around similar groups of items. For example, if a shopper buys a beach towel and beach chair, he may get recommendations for a beach umbrella or tent based on similar buying patterns. Or, if similar users also bought things like stand-up paddle boards and ocean kayaks, he may get recommendations for further items that may seem slightly outside his pattern of shopping.
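
The beach example can be sketched as item-based collaborative filtering over co-purchase counts. This is a simplified illustration, assuming purchase baskets are the only signal; the baskets and function names are made up for the example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase baskets, one set of items per shopper.
BASKETS = [
    {"beach towel", "beach chair", "beach umbrella"},
    {"beach towel", "beach chair", "beach umbrella", "paddle board"},
    {"beach towel", "beach chair", "beach tent"},
    {"paddle board", "ocean kayak"},
]

def co_occurrence(baskets):
    """Count how often each ordered pair of items appears in the same basket."""
    counts = Counter()
    for basket in baskets:
        for a, b in combinations(sorted(basket), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts

def recommend_by_co_purchase(owned, baskets, top_n=2):
    """Score items the shopper lacks by how often they co-occur with owned items."""
    scores = Counter()
    for (a, b), n in co_occurrence(baskets).items():
        if a in owned and b not in owned:
            scores[b] += n
    # Sort by score (descending), then by name so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]

suggestions = recommend_by_co_purchase({"beach towel", "beach chair"}, BASKETS)
```

Note that no item attributes are needed here: the recommendations come entirely from what other shoppers bought together.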

These two techniques then opened the way for simpler, more efficient approaches, like neighborhood-based collaborative recommendations, which interwove both user-based and product-based correlations.

2: Shift from statistical modeling to deep learning-based modeling

Predictive systems have evolved as emerging technology and techniques have simplified how to build, deploy, and manage models that generate predictions. 

A great deal of predictive analytics stemmed from a form of statistical modeling that was aimed at inferring the relationship between variables. Often, classifications of the variables and data were made to predict the probability of an event happening, such as the propensity to buy a particular clothing item based on recent purchases. Linear and logistic regression models were two of the most common models used to generate this class of predictions. 
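
As a rough illustration of this class of model, the sketch below fits a logistic regression with plain stochastic gradient descent to estimate purchase propensity. The two features, the tiny data set, and the hyperparameters are all assumptions chosen for the example, not values from a real system.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, weights, bias):
    """Estimated probability that the event (a purchase) happens."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))

def train_logistic(rows, labels, lr=0.1, epochs=500):
    """Fit weights and bias by stochastic gradient descent on log loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            error = predict_proba(x, weights, bias) - y  # gradient w.r.t. the logit
            weights = [w - lr * error * v for w, v in zip(weights, x)]
            bias -= lr * error
    return weights, bias

# Hypothetical features: [purchases in the last 30 days, viewed the item (0 or 1)]
ROWS = [[0, 0], [1, 0], [0, 1], [3, 1], [4, 1], [2, 0], [5, 1], [0, 0]]
LABELS = [0, 0, 0, 1, 1, 0, 1, 0]  # 1 = bought the clothing item

weights, bias = train_logistic(ROWS, LABELS)
active_shopper = predict_proba([4, 1], weights, bias)
inactive_shopper = predict_proba([0, 0], weights, bias)
```

The learned weights relate each variable directly to the outcome, which is the kind of relationship between variables that statistical modeling aims to infer.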

Recommendation systems, in particular, require the ability to process vast amounts of categorical data, or variables that contain label values rather than numeric values. 

For example, an online t-shirt company may recommend shirts to a recurring visitor based on their profile data, previous purchases, and click-through rate data. By defining a “color” variable with the values “red,” “green,” and “blue,” the company could estimate which color the returning user is most likely to purchase based on the above data.
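
A minimal sketch of how such a categorical variable might be handled: one-hot encoding turns each label into a numeric vector a model can consume, and a smoothed frequency count gives a naive per-color likelihood. The helper names and the add-one smoothing are assumptions for illustration; a production system would combine many more signals.

```python
from collections import Counter

COLORS = ["red", "green", "blue"]

def one_hot(value, categories):
    """Encode a label as a vector with a single 1 at the label's position."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == c else 0 for c in categories]

def color_likelihood(purchased_colors, categories, smoothing=1.0):
    """Estimate purchase likelihood per color from past purchases (add-one smoothing)."""
    counts = Counter(purchased_colors)
    total = len(purchased_colors) + smoothing * len(categories)
    return {c: (counts[c] + smoothing) / total for c in categories}

encoded = one_hot("green", COLORS)
likelihood = color_likelihood(["red", "red", "blue"], COLORS)
```

One-hot vectors grow with the number of distinct labels, which is one reason large catalogs push toward the learned representations discussed next.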

The biggest reason for the shift to deep learning-based modeling?

Performance, especially in real-time, latency-sensitive workloads. Additionally, these networks allow for interactive learning between the features, which generates more accurate correlations and subsequent predictions for the business task at hand. This is because deep learning architectures offer three distinct benefits over their traditional counterparts:

  1. Better representation of the underlying model, algorithm, and their encodings across the network layers (e.g., a three-layer feedforward neural network as a representation)
  2. Better evaluations via fine-tuned functions that determine bounds and limits more accurately (e.g., a loss function)
  3. Better optimizations by learning what representations yield more successful evaluations

Lastly, because deep neural networks are able to scale effectively with large data sets, they become an ideal choice for recommendation systems that need to classify large amounts of categorical data such as product SKUs, location, age, sex, etc.

The most common deep learning network for this type of processing is the Multilayer Perceptron (MLP). It computes a prediction function by stacking a sequence of fully connected layers, each followed by an activation function, to capture the more complex interactions in the data at hand.
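
The forward pass of a small MLP can be sketched in a few lines. This is an untrained, three-layer illustration, assuming ReLU activations on the hidden layers and a sigmoid on the output; the layer sizes, the random weights, and the input vector are arbitrary choices for the example.

```python
import math
import random

def dense(x, weights, bias):
    """Fully connected layer: one weighted sum plus bias per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(values):
    """Activation function applied element-wise after each hidden layer."""
    return [max(0.0, v) for v in values]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (untrained, for illustration only)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def mlp_predict(x, layers):
    """Pass x through every hidden layer, then squash the output into (0, 1)."""
    for weights, bias in layers[:-1]:
        x = relu(dense(x, weights, bias))
    out_weights, out_bias = layers[-1]
    return sigmoid(dense(x, out_weights, out_bias)[0])

rng = random.Random(0)
layers = [init_layer(4, 8, rng), init_layer(8, 8, rng), init_layer(8, 1, rng)]

# Hypothetical input: a one-hot color (red, green, blue) plus one numeric feature.
score = mlp_predict([1.0, 0.0, 0.0, 0.3], layers)
```

In practice the weights would be learned by minimizing a loss function over historical interactions rather than drawn at random.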

3: Deep Learning Recommendation Models (DLRM)

Given the first two trends and the growing need for recommendation systems, Facebook built and open-sourced DLRM, an advanced deep learning recommendation model.

The model allows users to benchmark:

  • The speed at which the model (and associated operators) performs
  • How various numerical techniques affect network accuracy

Implementers of recommendation systems now have: 

  • An architecture to improve correlations between features and attributes to generate better predictions 
  • A way to test performance and accuracy of their production systems

Moreover, DLRM is available for both training and inference and is designed to work with public data sets. A sample data set is provided containing both continuous and categorical features to show the full benefits of the MLP network architecture described earlier.
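
To show how continuous and categorical features can meet in one network, here is a heavily simplified sketch in the spirit of the published DLRM design: a bottom MLP for continuous features, embedding tables for categorical ones, pairwise dot-product interactions, and a top MLP. It is not the DLRM code base; every name, size, and weight below is an assumption for illustration, and the model is untrained.

```python
import math
import random
from itertools import combinations

def dense(x, weights, bias):
    """Fully connected layer: one weighted sum plus bias per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(values):
    return [max(0.0, v) for v in values]

def init_dense(n_in, n_out, rng):
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def init_params(n_continuous, table_sizes, dim, rng):
    """One embedding table per categorical feature, plus bottom and top layers."""
    n_vectors = 1 + len(table_sizes)            # dense vector + one per table
    n_pairs = n_vectors * (n_vectors - 1) // 2  # pairwise interactions
    return {
        "bottom": init_dense(n_continuous, dim, rng),
        "embeddings": [
            [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(rows)]
            for rows in table_sizes
        ],
        "top_hidden": init_dense(dim + n_pairs, 8, rng),
        "top_out": init_dense(8, 1, rng),
    }

def interaction_features(continuous, categorical_ids, params):
    """Project continuous features, look up embeddings, take pairwise dot products."""
    dense_vec = relu(dense(continuous, *params["bottom"]))
    embedded = [table[i] for table, i in zip(params["embeddings"], categorical_ids)]
    vectors = [dense_vec] + embedded
    pair_dots = [sum(a * b for a, b in zip(u, v)) for u, v in combinations(vectors, 2)]
    return dense_vec + pair_dots

def predict_click(continuous, categorical_ids, params):
    """Top MLP over the interaction features, ending in a click probability."""
    features = interaction_features(continuous, categorical_ids, params)
    hidden = relu(dense(features, *params["top_hidden"]))
    logit = dense(hidden, *params["top_out"])[0]
    return 1.0 / (1.0 + math.exp(-logit))

rng = random.Random(0)
params = init_params(n_continuous=3, table_sizes=[10, 5], dim=4, rng=rng)
probability = predict_click([0.2, 1.0, 0.5], [7, 2], params)
```

Embedding lookups replace wide one-hot vectors with short dense ones, which is what lets this style of model handle categorical features with very many distinct values.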

4: Improving recommendation performance with Neural Magic

Neural Magic is redefining machine learning performance on a CPU, making it simpler for companies to realize the benefits of machine learning (such as the ones we outlined previously in the post) on broadly accessible, commodity hardware that, in many cases, they already own.

Today, when clients run recommendation networks on CPUs, they face three trade-offs to achieve the desired performance:

  1. Reduce model size 
  2. Reduce batch size 
  3. Reduce accuracy 

Moreover, all three sacrifices can also impact the quality of the predictions being made, creating undesirable business outcomes.

Neural Magic addresses these challenges, delivering GPU-class performance on a CPU by:

  1. Reducing the amount of computation required
  2. Accelerating memory-bound processes
  3. Running larger models with larger inputs to improve predictions and accuracy

As an example, using Neural Magic, a large online retailer was able to generate a 6x speedup on their production CPU architecture, with no change in deployment infrastructure, for their real-time ranking and product recommendation use case. Prior to using Neural Magic, the online retailer had to reduce batch size by 10x and model size by 40%, resulting in a 10+% drop in accuracy, in order to meet their production performance requirements. By taking advantage of Neural Magic’s Inference Engine, they can now achieve their performance targets, run larger models, and maintain the accuracy needed for their business outcomes.

To learn more about deploying Neural Magic in your production environment, sign up for early access today.
