A Brief History of GPUs


Graphics processing units (GPUs) were popularized in the late 1990s and early 2000s by Nvidia and ATI (now a part of AMD), but according to Wikipedia, the technology has a longer and more complex history. Tracing back to arcade system boards in the 1970s, graphics processors have their roots in computer and video games.

GPUs began to be used for general-purpose computing between 2000 and 2010, with the introduction of parallel GPUs for applications that require complex, simultaneous calculations. Stanford University researchers Rajat Raina, Anand Madhavan, and Andrew Ng wrote a seminal paper in 2009 discussing the technology’s promise in machine learning applications. It wasn’t long before GPUs became the de facto standard for running deep learning algorithms, despite their limitations (…more on those later).

Let’s take a brief look at the history of GPUs before machine learning, and their current status in machine learning applications.

Photo Credit: Jonas Svidras (via Unsplash)

Graphics Processing in Gaming

According to the Computer History Museum, the earliest computer graphics emerged in the 1950s, with the first scanned computer image produced in 1957 (a photo of the son of the rotating drum scanner’s inventor, later named one of LIFE Magazine’s “100 Photographs that Changed the World”).

It wasn’t until the 1970s and 1980s that video gaming really took off, with the golden age of arcade games, Atari, and the Nintendo Entertainment System (NES). Graphics processing companies set their sights on this burgeoning industry, racing to produce 2D accelerators. In the early 1990s, the first 3D accelerators emerged on the scene, according to Wikipedia, which led to mass-market adoption of 3D accelerators in arcade and console systems including the Sony PlayStation, Sega Model 1 and Model 2, Nintendo 64, and others.

Nvidia’s GeForce series of chips (marketed as the first GPUs) pushed gaming graphics to the next level, followed closely by ATI’s Radeon 9700, which allowed GPUs to become more flexible for other computing applications beyond gaming alone.

GPUs in Machine Learning & Current Limitations

Since the 2009 Stanford research paper was published, GPUs have been more frequently adopted for training and inference in neural networks. In machine learning, these systems were seen as more effective at massively parallel processing of repeatable, identical computations than their CPU counterparts, which are built to handle a smaller number of more complex, varied computations at the same time. This is an important concept to understand: GPUs are specifically designed for SIMD (single instruction, multiple data) workloads, where one operation is applied to many data elements at once.
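To make the SIMD idea concrete, here is a minimal sketch in plain Python. It only simulates the concept and does not run anything in parallel; the function names (`scalar_saxpy`, `simd_saxpy`) and the `lanes` parameter are hypothetical, chosen for illustration. The point is that the same arithmetic is applied to every element, so a wide processor can cover many elements with each instruction it issues.

```python
# Toy simulation of SIMD: one instruction applied to many data elements.
# Pure Python, so nothing here actually runs in parallel; the "lanes"
# only model how a wide processor would group the work.


def scalar_saxpy(alpha, xs, ys):
    """One-element-at-a-time version: one step per element."""
    out = []
    for x, y in zip(xs, ys):
        out.append(alpha * x + y)
    return out


def simd_saxpy(alpha, xs, ys, lanes=4):
    """Grouped version: each group of `lanes` elements stands in for a
    single instruction issued across all lanes at once (an assumption
    of this sketch, not a model of any specific GPU)."""
    out = []
    instructions_issued = 0
    for start in range(0, len(xs), lanes):
        chunk_x = xs[start:start + lanes]
        chunk_y = ys[start:start + lanes]
        # The same operation is applied to every element in the chunk.
        out.extend(alpha * x + y for x, y in zip(chunk_x, chunk_y))
        instructions_issued += 1
    return out, instructions_issued


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [10.0] * 8

result, issued = simd_saxpy(2.0, xs, ys, lanes=4)
print(result)   # same values as scalar_saxpy(2.0, xs, ys)
print(issued)   # number of simulated wide instructions
```

With eight elements and four lanes, the grouped version covers the data in two simulated instructions instead of eight scalar steps. Real GPUs apply this idea with far wider hardware, which is why identical, repeatable computations suit them so well.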

Today, the race is on between Nvidia, Intel, AMD, and others to develop faster and more powerful domain-specific hardware chips. And GPUs are just the start of the alphabet soup that is the AI hardware ecosystem. Google introduced its own purpose-built hardware accelerator in 2016, the Tensor Processing Unit (TPU), for processing machine learning algorithms in TensorFlow. Other examples include Graphcore’s IPU, Apple’s Neural Engine in the iPhone X and XS, chips from Cerebras, and many more.

However, these new hardware accelerators, despite their significant advances, cannot improve fast enough to meet the basic needs of many data scientists. Data scientists have found that they need to use relatively small models, small images, or low batch sizes to fit their samples into the memory of a GPU or its brethren, much like the problem detailed in this article from data scientist Thomas Wolf. These limitations force complicated workarounds, or require access to more GPUs (which are often too costly for resource-limited organizations).

The fact of the matter is, larger batch, model, or image sizes are often required to get accurate results, yet as we detail in our Fallacy of the FLOPS blog post, accelerators often require shrinking image, model, or batch size. In some cases, data scientists won’t even bother to work with a model that is too big for a GPU (if they’re not in the mood for a time-consuming workaround).
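A rough back-of-the-envelope sketch shows why image and batch size run into memory limits so quickly. The helper names and the 1 GiB budget below are hypothetical, and the estimate counts only the raw input batch stored as 32-bit floats. Model weights, activations, and gradients are ignored here, and in practice they usually take up much more memory, so treat these numbers as an optimistic lower bound.

```python
# Rough estimate of the memory needed just to hold one input batch.
# Assumptions (not from the article): 3-channel images, 4 bytes per
# value (float32), and a hypothetical 1 GiB budget for inputs alone.

GIB = 1024 ** 3


def batch_input_bytes(batch_size, height, width, channels=3, bytes_per_value=4):
    """Bytes needed to store a batch of images as dense float32 values."""
    return batch_size * height * width * channels * bytes_per_value


def max_batch_size(memory_budget_bytes, height, width, channels=3, bytes_per_value=4):
    """Largest batch whose raw inputs fit in the given budget."""
    per_image = batch_input_bytes(1, height, width, channels, bytes_per_value)
    return memory_budget_bytes // per_image


small = batch_input_bytes(32, 224, 224)     # 32 images at 224x224
large = batch_input_bytes(32, 1024, 1024)   # 32 images at 1024x1024

print(small)                             # 19267584 bytes, about 18 MiB
print(large)                             # 402653184 bytes, exactly 384 MiB
print(max_batch_size(GIB, 1024, 1024))   # 85 images fit in 1 GiB
```

Input memory grows with the product of batch size, height, and width, so moving from 224x224 to 1024x1024 images multiplies the footprint by roughly 21 before the model itself is even counted.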

A Post-GPU World for Neural Networks

Since advances in domain-specific hardware accelerators like GPUs are expensive endeavors with very long lead times, the door is open for a new set of challengers to emerge with a different approach to the problem of processing deep neural networks (or wide, shallow ones, for that matter). These advances have the potential to create the next big machine learning unlock for data scientists worldwide.

Note: A version of this article previously ran in our Medium publication. Follow along @LimitlessAI.


Neural Magic is powering bigger inputs, bigger models, and better predictions. The company’s software lets machine learning teams run deep learning models at GPU speeds or better on commodity CPU hardware, at a fraction of the cost. To learn more, visit www.neuralmagic.com.