How to Use Machine Learning in Visual Search for Retail

Photo by Szabo Viktor on Unsplash

Simply defined, machine learning-based visual search uses images rather than text to deliver results. In retail, for example, consumers can snap a photo of a jacket or purse they see in the real world and use that photo to locate and purchase the item online.

According to a recent Forbes article, visual search is growing in popularity, particularly among millennials: 62 percent of them want to use visual search for shopping more than any other technology. And Gartner research predicted that 30 percent of all searches would be completely queryless (focused on discovery rather than search) by 2020.

This post will give an overview of typical use cases for visual search, as well as model types, frameworks, and challenges data scientists must consider.

Use Cases for Visual Search

Although some critics say that the use of computer vision in visual search is still nascent, interesting new use cases are emerging every day among retailers, e-commerce marketplaces, and social media platforms. Here are three examples:

  • Determine which brand or item is pictured in social media: Some social media platforms, like Pinterest, were early adopters of the visual search trend. Its Pinterest Lens feature has been helping users identify and shop items within Pinterest posts since 2017.
  • Discover similar items to a known image: In perhaps the most popular use of visual search, some e-commerce marketplaces such as Wayfair use it to help customers who may not know exactly what they're looking for. For example, if a customer likes a chair at a friend's house, they can simply snap a photo and Wayfair's algorithms will pull up similar items for purchase.
  • Find the exact item from a specific brand: Some brands, such as Tommy Hilfiger, have used visual search technology to let customers shop the company's own runway styles instantly. With the Hilfiger app, users can take photos of models and immediately get a visual wish list of items to purchase.

Under the hood of most visual search implementations, convolutional neural networks (CNNs) turn images into numerical representations that can be matched against an image database to return the most relevant results. Let's take a look at how.

Machine Learning Models, Batch and Image Sizes for Visual Search

Convolutional neural networks (CNNs) are a type of neural network most often used for image recognition and classification. CNNs excel at these tasks because they are designed to automatically learn spatial hierarchies of features in an image, from edges and textures up to shapes and whole objects. Once trained, these networks can run inference, making predictions on new images for the task at hand.
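To make "spatial hierarchies" a little more concrete, here is a minimal pure-Python sketch of the three basic building blocks of a CNN layer: a convolution (implemented, as in most deep learning libraries, as cross-correlation), a ReLU activation, and max pooling. The 6x6 image and the vertical-edge kernel are hypothetical and hand-picked; in a real CNN the kernel weights are learned during training, and many such layers are stacked so later layers combine edges into larger patterns.

```python
from typing import List

Matrix = List[List[float]]


def conv2d(image: Matrix, kernel: Matrix) -> Matrix:
    """Stride-1, no-padding 2D cross-correlation (what CNN libraries call convolution)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(fmap: Matrix) -> Matrix:
    """Keep positive responses and zero out the rest."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool(fmap: Matrix, size: int = 2) -> Matrix:
    """Non-overlapping max pooling: keep the strongest response in each size x size window."""
    return [
        [
            max(fmap[i + di][j + dj] for di in range(size) for dj in range(size))
            for j in range(0, len(fmap[0]) - size + 1, size)
        ]
        for i in range(0, len(fmap) - size + 1, size)
    ]


# Hypothetical 6x6 grayscale image: dark on the left, bright on the right.
image = [[0.0, 0.0, 0.0, 0.0, 1.0, 1.0] for _ in range(6)]
# Hand-picked vertical-edge detector; a trained CNN would learn weights like these.
kernel = [[-1.0, 0.0, 1.0] for _ in range(3)]

edge_map = relu(conv2d(image, kernel))  # 4x4 map, strong where the edge is
pooled = max_pool(edge_map)             # 2x2 coarser summary of the same map
print(edge_map[0])  # [0.0, 0.0, 3.0, 3.0]
print(pooled)       # [[0.0, 3.0], [0.0, 3.0]]
```

The pooled map is smaller but still records that the edge sits on the right, which is what lets deeper layers reason about larger regions of the image.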

For visual search use cases in retail, a batch size of one is typical: each query processes a single image. During a visual search query, a CNN extracts a compact feature vector (an embedding) from the query image and compares it against the vectors of images in a database. Image size, on the other hand, is tied to the architecture you're using, so if you train with large images, you'll want to run large images in production as well. Typically, the larger the image size, the better the accuracy.
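The retrieval step can be sketched as a nearest-neighbor search over embeddings. This minimal example assumes the CNN has already turned each catalog image into a vector (the 3-D vectors and item names below are hypothetical; real embeddings have hundreds of dimensions) and ranks items by cosine similarity to a single query vector, i.e. at batch size one. The helper names `build_index` and `search` are illustrative, not part of any particular library, and production systems typically swap the brute-force loop for an approximate nearest-neighbor index.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def build_index(catalog: Dict[str, Vector]) -> Dict[str, Vector]:
    """Offline step: normalize every catalog embedding once, ahead of query time."""
    return {item_id: l2_normalize(vec) for item_id, vec in catalog.items()}


def search(query: Vector, index: Dict[str, Vector], k: int = 2) -> List[Tuple[str, float]]:
    """Online step (batch size one): score one query embedding against the index."""
    q = l2_normalize(query)
    scored = [
        (item_id, sum(a * b for a, b in zip(q, vec)))
        for item_id, vec in index.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical 3-D embeddings; real CNN embeddings are much larger.
index = build_index({
    "armchair-blue": [0.9, 0.1, 0.0],
    "armchair-grey": [0.8, 0.3, 0.1],
    "floor-lamp": [0.0, 0.2, 0.9],
})
query = [0.9, 0.15, 0.0]  # pretend a CNN produced this from a shopper's photo

for item_id, score in search(query, index):
    print(f"{item_id}: {score:.3f}")
```

Here the two armchairs rank above the lamp because their vectors point in nearly the same direction as the query. Normalizing the catalog offline keeps the per-query work to one CNN forward pass plus a dot product per item.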

GPUs vs. CPUs in Visual Search

In a batch-size-one scenario, you pay a big performance penalty for moving data from host memory over to a GPU. At batch size one, you also won't saturate all of the cores in a GPU, so you don't get the full benefit of its extra computing power. That's why many real-time visual search implementations run on CPUs.
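A toy cost model shows why batch size one is a poor fit for a GPU. The sketch assumes each GPU call pays a fixed launch cost, a per-image transfer cost, and compute time that processes up to `parallel_slots` images at once. All parameter values are made up for illustration and are not benchmark numbers; only the shape of the result matters.

```python
import math


def gpu_batch_ms(batch: int, launch_ms: float, transfer_ms_per_image: float,
                 step_ms: float, parallel_slots: int) -> float:
    """Toy model of one GPU inference call: fixed launch cost, per-image
    host-to-device transfer, and compute that runs `parallel_slots` images at once."""
    steps = math.ceil(batch / parallel_slots)
    return launch_ms + batch * transfer_ms_per_image + steps * step_ms


def utilization(batch: int, parallel_slots: int) -> float:
    """Fraction of parallel compute slots doing useful work."""
    steps = math.ceil(batch / parallel_slots)
    return batch / (steps * parallel_slots)


# Made-up parameters for illustration only; these are not measurements.
PARAMS = dict(launch_ms=5.0, transfer_ms_per_image=1.0, step_ms=10.0, parallel_slots=64)

for batch in (1, 64):
    total = gpu_batch_ms(batch, **PARAMS)
    print(f"batch={batch:>2}  per-image={total / batch:.2f} ms  "
          f"utilization={utilization(batch, PARAMS['parallel_slots']):.1%}")
```

With these hypothetical numbers, a single image costs 16 ms at about 1.6% utilization, while a batch of 64 costs about 1.23 ms per image at full utilization. A real-time search can't make shoppers wait while 64 queries accumulate, so it stays at batch size one and never reaches the efficient regime.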

Using a typical model architecture on a CPU, you'll get a search result in two to three seconds: interesting, but much too slow for most websites. High latency can raise website bounce rates when consumers grow impatient waiting for results. For example, an Akamai study found that every 100-millisecond delay in website load time can decrease conversion rates by 7%. As a compromise, you can reduce the image size to lower latency, but at the price of accuracy.
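The latency benefit of smaller images follows from how convolution cost scales: the number of multiply-accumulate operations (MACs) in a conv layer grows with the output height times width, so it scales roughly with the pixel count of the input. This sketch counts MACs for one layer, assuming a ResNet-style stem (7x7 kernel, stride 2, padding 3, 3 input channels, 64 output channels) as the example shape.

```python
def conv_output_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_macs(height: int, width: int, c_in: int, c_out: int,
              kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Multiply-accumulate operations for one conv layer (bias ignored)."""
    out_h = conv_output_size(height, kernel, stride, padding)
    out_w = conv_output_size(width, kernel, stride, padding)
    return out_h * out_w * c_in * c_out * kernel * kernel


# Assumed layer shape: a ResNet-style stem (7x7 kernel, stride 2, padding 3, 3 -> 64 channels).
for side in (224, 160):
    macs = conv_macs(side, side, c_in=3, c_out=64, kernel=7, stride=2, padding=3)
    print(f"{side}x{side} input: {macs:,} MACs")
```

For this layer, 224x224 input needs 118,013,952 MACs and 160x160 needs 60,211,200, a 1.96x reduction that matches the drop in output pixels (112x112 vs. 80x80). Measured latency also depends on memory bandwidth and the runtime, so real speedups won't track MAC counts exactly, and the smaller input carries less detail, which is where the accuracy cost comes from.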

How Neural Magic Processes Visual Search Models with Better Performance and Accuracy

Luckily, there's a new way to process visual search models on a CPU at lightning-fast speeds. Neural Magic is redefining machine learning performance on CPUs, making it simpler for companies to realize the benefits of machine learning on hardware they already own but aren't using to its full potential. CPUs using Neural Magic can process search queries on a CNN in 100 to 200 milliseconds, compared with two to three seconds on a CPU without Neural Magic.

For example, one large online retailer using Neural Magic achieved a 6x speedup for its real-time ranking and product recommendation use case, with no change in deployment infrastructure.

The Neural Magic Inference Engine (NMIE) provides state-of-the-art deep learning performance without sacrificing accuracy. It first automatically simplifies your trained model using Neural Magic Libraries and the Calibration UX, then runs the simplified model through the engine for speedup and deployment at scale.

See our GitHub repo to sparsify and quantize models for faster CPU inference.
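As a rough illustration of what sparsifying and quantizing mean, here is a generic pure-Python sketch of unstructured magnitude pruning followed by symmetric per-tensor int8 quantization. This is not Neural Magic's API, recipe, or algorithm; the function names and the example weights are hypothetical, and real tools prune gradually during fine-tuning to preserve accuracy.

```python
from typing import List, Tuple


def magnitude_prune(weights: List[float], sparsity: float) -> List[float]:
    """Unstructured magnitude pruning: zero the smallest-|w| fraction of weights."""
    n_prune = int(len(weights) * sparsity)
    smallest_first = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in smallest_first[:n_prune]:
        pruned[i] = 0.0
    return pruned


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor int8 quantization: w ~= scale * q, with q in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map int8 values back to approximate floats."""
    return [scale * v for v in q]


# Hypothetical layer weights, for illustration only.
weights = [0.5, -0.02, 0.3, 0.01, -0.8, 0.05, -0.45, 0.003]

pruned = magnitude_prune(weights, sparsity=0.5)  # half the weights become exactly 0
q, scale = quantize_int8(pruned)                 # 8-bit integers plus one float scale
print(pruned)  # [0.5, 0.0, 0.3, 0.0, -0.8, 0.0, -0.45, 0.0]
print(q)       # [79, 0, 48, 0, -127, 0, -71, 0]
```

Quantization keeps each weight within half a `scale` step of its pruned value while shrinking storage from 32-bit floats to 8-bit integers. Zeroed weights only translate into faster inference when the runtime is built to skip them.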