Challenging the hegemony of the Nvidia H100! IBM's brain-inspired analog AI chip is 14 times more energy efficient and tackles the power-consumption problem of AI models

Original source: Xinzhiyuan


Recently, IBM unveiled a brand-new 14 nm analog AI chip that is 14 times more energy efficient than the leading GPUs, enough to give the H100 a run for its money.


Right now, the biggest obstacle to the development of generative AI is its staggering power consumption. The computational resources AI demands cannot keep growing at this pace.

IBM, meanwhile, has been researching ways to reinvent AI computing. One result is analog in-memory computing, or analog AI, which cuts energy consumption by borrowing a key feature of how neural networks run in biological brains.

This approach minimizes the time and effort we spend on computation.

Is Nvidia's monopoly about to be subverted?

## IBM's latest blueprint for the future of AI: Analog AI chips are 14 times more energy efficient

According to a report by Insider, Dylan Patel, chief analyst at semiconductor research firm SemiAnalysis, estimates that ChatGPT costs more than $700,000 a day to operate.

ChatGPT requires a lot of computing power to generate answers based on user prompts. Most of the costs are incurred on expensive servers.

Going forward, the cost of training models and running the infrastructure will only climb higher.

In a paper published in Nature, IBM says the new chip could ease the burden of building and operating generative AI services such as Midjourney or GPT-4 by cutting their energy consumption.

These analog chips are built differently from digital chips: they manipulate analog signals and can represent values anywhere between 0 and 1, whereas digital chips work only with discrete binary signals.

Analog in-memory computing / analog AI

IBM's new approach is analog in-memory computing, or analog AI for short. It reduces energy consumption by exploiting a key feature of how neural networks operate in biological brains.

In the brains of humans and other animals, the strength (or "weight") of synapses determines the communication between neurons.

For analog AI systems, IBM stores these synaptic weights in the conductance values of nanoscale resistive memory devices such as phase-change memory (PCM), and uses basic circuit laws to perform multiply-accumulate (MAC) operations, the dominant operation in DNNs, directly in memory, greatly reducing the need to constantly shuttle data between memory and processor.
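As a rough illustration of the principle (not the paper's device model), the Python sketch below treats a crossbar of conductances as a weight matrix: input values are applied as read voltages along the rows, and by Ohm's law and Kirchhoff's current law the currents summed along each column yield one multiply-accumulate per column. The differential positive/negative conductance encoding, the `g_max` value and the read voltage are illustrative assumptions.

```python
import numpy as np

def program_crossbar(weights, g_max=25e-6):
    """Map a signed weight matrix onto two non-negative conductance arrays
    (a common differential encoding; illustrative, not necessarily IBM's exact scheme)."""
    scale = g_max / np.max(np.abs(weights))
    g_pos = np.clip(weights, 0, None) * scale       # positive part of each weight
    g_neg = np.clip(-weights, 0, None) * scale      # negative part of each weight
    return g_pos, g_neg, scale

def crossbar_mvm(g_pos, g_neg, scale, x, v_read=0.2):
    """Analog matrix-vector multiply: per-column currents I = G^T V
    (Ohm's law + Kirchhoff's current law), rescaled back to weight units."""
    v = x * v_read                                   # encode inputs as read voltages
    i_pos = g_pos.T @ v                              # column currents, positive array
    i_neg = g_neg.T @ v                              # column currents, negative array
    return (i_pos - i_neg) / (scale * v_read)        # recover W^T x

# Quick check against an ordinary digital matrix-vector product
rng = np.random.default_rng(0)
W = rng.normal(size=(256, 64))                       # 256 inputs -> 64 outputs
x = rng.normal(size=256)
g_pos, g_neg, scale = program_crossbar(W)
print(np.allclose(crossbar_mvm(g_pos, g_neg, scale, x), W.T @ x))   # True (ideal devices)
```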

Now powering many generative AI platforms are Nvidia's H100 and A100.

However, if IBM iterates on the chip prototype and successfully brings it to the mass market, the new chip may well challenge Nvidia's position as the mainstay of AI hardware.

This 14 nm analog AI chip packs 35 million phase-change memory devices per chip and can encode models with up to 17 million parameters.

The chip mimics the way the human brain works, performing calculations directly in memory.

The chip's system can achieve efficient speech recognition and transcription, with an accuracy close to that of digital hardware.

The chip is roughly 14 times more energy efficient, and earlier simulations suggest that this kind of hardware could be 40 to 140 times more energy efficient than today's leading GPUs.

Figure: PCM crossbar array, programming, and digital signal processing

The generative AI revolution has only just begun. Deep neural networks (DNNs) have revolutionized the field of AI, gaining further prominence with the rise of foundation models and generative AI.

However, running these models on conventional digital computing architectures limits their performance and energy efficiency.

While progress has been made in developing hardware for AI inferencing, many of these architectures physically separate memory and processing units.

This means that AI models are typically stored in discrete memory locations, and computing tasks require constant shuffling of data between memory and processing units. This process can significantly slow down calculations, limiting the maximum energy efficiency that can be achieved.

Figure: Performance characteristics of PCM devices, which store analog synaptic weights in their phase configuration and conductance

IBM's phase-change memory (PCM)-based artificial intelligence acceleration chip gets rid of this limitation.

Phase-change memory (PCM) integrates computation and storage: matrix-vector multiplications are performed directly inside the memory, avoiding the data-transfer bottleneck.

IBM's analog AI chip thus delivers efficient AI inference acceleration through this hardware-level fusion of compute and storage, an important advance for the field.

Two key challenges for analog AI

To bring the concept of analog AI to life, two key challenges need to be overcome:

  1. The computational precision of the memory array must be comparable to that of existing digital systems (a sketch of this precision question follows the list)

  2. The memory array must interface seamlessly with the other digital compute units and the digital communication fabric on the analog AI chip
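To make the first challenge concrete, the sketch below perturbs a weight matrix with Gaussian "programming noise" and quantizes the outputs with an 8-bit converter, then reports how far the result drifts from an exact digital matrix-vector product. Both non-idealities and their magnitudes are illustrative assumptions, not the chip's measured characteristics.

```python
import numpy as np

rng = np.random.default_rng(1)

def noisy_analog_mvm(W, x, noise_std=0.02, adc_bits=8):
    """Ideal MVM perturbed by two illustrative non-idealities:
    Gaussian conductance-programming noise and a uniform output ADC.
    Noise level and ADC resolution are assumptions, not measured values."""
    W_prog = W + noise_std * np.abs(W).max() * rng.standard_normal(W.shape)
    y = W_prog.T @ x                                   # "analog" accumulation
    y_max = np.abs(y).max() + 1e-12
    levels = 2 ** (adc_bits - 1) - 1
    return np.round(y / y_max * levels) / levels * y_max   # quantized read-out

W = rng.normal(size=(256, 64))
x = rng.normal(size=256)
exact = W.T @ x
approx = noisy_analog_mvm(W, x)
rel_err = np.linalg.norm(approx - exact) / np.linalg.norm(exact)
print(f"relative error vs. digital MVM: {rel_err:.3%}")
```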

IBM fabricated the phase-change-memory-based AI accelerator chip at its Albany NanoTech technology center.

The chip consists of 64 analog in-memory compute cores, each containing a 256×256 crossbar array of synaptic unit cells.

Compact time-based analog-to-digital converters are also integrated on the chip to convert between the analog and digital worlds.
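As a toy illustration of what "time-based" means here (this is a generic model, not the converter circuit the chip actually uses, and every parameter below is a placeholder): the analog quantity is turned into a pulse whose length is proportional to its magnitude, and a digital counter measures that length in clock ticks.

```python
import numpy as np

def time_based_adc(i_analog, i_full_scale, clock_hz=1e9, t_max=1e-6):
    """Toy time-based ADC: the analog current is encoded as a pulse whose
    duration is proportional to its magnitude, and a counter measures that
    duration in clock ticks. All parameters are illustrative placeholders."""
    duration = np.clip(i_analog / i_full_scale, 0.0, 1.0) * t_max  # current -> time
    ticks = int(duration * clock_hz)                               # time -> digital count
    full_scale_ticks = int(t_max * clock_hz)
    return ticks, ticks / full_scale_ticks                         # raw count, normalized

ticks, code = time_based_adc(i_analog=3.2e-6, i_full_scale=10e-6)
print(ticks, f"{code:.3f}")   # 320 ticks, 0.320 of full scale
```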

The lightweight digital processing unit in the chip can also perform simple nonlinear neuron activation functions and scaling operations.

Each core can be thought of as a tile that can perform matrix-vector multiplication and other operations associated with a layer (such as a convolutional layer) of a deep neural network (DNN) model.

The weight matrices are encoded as the analog conductance values of the PCM devices and stored on-chip.
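Because each core's crossbar is 256×256, a layer whose weight matrix is larger than that has to be split across several cores, with the partial results recombined digitally. The sketch below shows one straightforward tiling scheme; the splitting and summing policy is my own illustration of the idea, not the mapping described in the paper.

```python
import numpy as np

TILE = 256  # crossbar size of one analog core (256x256 synaptic unit cells)

def tile_matrix(W, tile=TILE):
    """Split a (rows x cols) weight matrix into tile x tile blocks,
    zero-padding the edges, mimicking the assignment of blocks to cores."""
    rows = -(-W.shape[0] // tile)           # ceiling division
    cols = -(-W.shape[1] // tile)
    padded = np.zeros((rows * tile, cols * tile))
    padded[:W.shape[0], :W.shape[1]] = W
    return [[padded[r*tile:(r+1)*tile, c*tile:(c+1)*tile]
             for c in range(cols)] for r in range(rows)]

def tiled_mvm(tiles, x, tile=TILE):
    """Each block performs its own MVM (what a core would do in analog);
    partial outputs along the input dimension are then summed digitally."""
    x_pad = np.zeros(len(tiles) * tile)
    x_pad[:len(x)] = x
    out = np.zeros(len(tiles[0]) * tile)
    for r, row_of_tiles in enumerate(tiles):
        xr = x_pad[r*tile:(r+1)*tile]
        for c, block in enumerate(row_of_tiles):
            out[c*tile:(c+1)*tile] += block.T @ xr     # per-core partial product
    return out

W = np.random.default_rng(2).normal(size=(600, 300))   # needs 3 x 2 = 6 cores
x = np.random.default_rng(3).normal(size=600)
print(np.allclose(tiled_mvm(tile_matrix(W), x)[:300], W.T @ x))  # True
```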

A global digital processing unit sits in the middle of the core array to handle operations more complex than matrix-vector multiplication, which is critical for executing certain types of neural network, such as LSTMs.
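For an LSTM, the large input-to-gate and hidden-to-gate matrix multiplications are the part that maps naturally onto the analog cores, while the elementwise sigmoid, tanh and gating products are exactly the kind of "more complex than a matrix-vector multiply" work a global digital unit would take on. Below is a minimal sketch of that division of labour, using the standard LSTM equations; the `analog_mvm` helper is a hypothetical stand-in for whatever the cores compute.

```python
import numpy as np

def analog_mvm(W, x):
    """Stand-in for a matrix-vector product executed on the analog PCM cores."""
    return W.T @ x

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W_x, W_h, b):
    """One LSTM time step. The two large MVMs would map onto analog tiles;
    the elementwise nonlinearities and gating are the kind of work a global
    digital processing unit handles."""
    z = analog_mvm(W_x, x) + analog_mvm(W_h, h) + b   # analog part: 4*H pre-activations
    i, f, g, o = np.split(z, 4)                       # input, forget, cell, output gates
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)      # digital part: activations
    g = np.tanh(g)
    c_new = f * c + i * g                             # digital part: elementwise gating
    h_new = o * np.tanh(c_new)
    return h_new, c_new

rng = np.random.default_rng(4)
D, H = 128, 64
x, h, c = rng.normal(size=D), np.zeros(H), np.zeros(H)
W_x = 0.1 * rng.normal(size=(D, 4 * H))
W_h = 0.1 * rng.normal(size=(H, 4 * H))
b = np.zeros(4 * H)
h, c = lstm_step(x, h, c, W_x, W_h, b)
print(h.shape, c.shape)   # (64,) (64,)
```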

Digital communication paths are integrated on-chip to transfer data between the cores, and between the cores and the global digital processing units.

Figure: (a) Electronic design automation snapshot and chip micrograph, showing the 64 cores and 5,616 pads; (b) schematic of the chip's components, including 64 cores, 8 global digital processing units, and the data links between cores; (c) structure of a single PCM-based in-memory computing core; (d) structure of the global digital processing unit used for LSTM-related computations.

Using the chip, IBM carried out a comprehensive study of the computational accuracy of analog in-memory computing, achieving 92.81% accuracy on the CIFAR-10 image dataset.

Figure: (a) ResNet-9 network structure used for CIFAR-10; (b) how the network is mapped onto the chip; (c) CIFAR-10 test accuracy measured on hardware.

This is the highest accuracy reported so far for a chip using similar technology.

IBM also seamlessly combines analog in-memory computing with multiple digital processing units and digital communication structures.

The chip's 8-bit input/output matrix multiplication achieves a per-unit-area throughput of 400 GOPS/mm², more than 15 times higher than previous multi-core in-memory computing chips based on resistive memory, while delivering considerable energy efficiency.
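For a sense of scale: a single pass through one 256×256 crossbar amounts to 256 × 256 ≈ 65 thousand multiply-accumulates, or roughly 131 thousand operations if multiplies and adds are counted separately. The snippet below simply shows how an operation count, an assumed pass rate and an assumed core area combine into a GOPS/mm² figure; none of the placeholder numbers are the chip's measured values.

```python
# Back-of-the-envelope view of the GOPS/mm^2 metric. Every number below is a
# placeholder chosen for illustration; none are taken from the paper.
TILE = 256
ops_per_pass = 2 * TILE * TILE        # multiply + add counted separately: ~131k ops
passes_per_second = 1.0e6             # assumed MVM rate per core (placeholder)
core_area_mm2 = 0.5                   # assumed area of one core in mm^2 (placeholder)

gops_per_mm2 = ops_per_pass * passes_per_second / core_area_mm2 / 1e9
print(f"{gops_per_mm2:.0f} GOPS/mm^2 under these assumptions")
```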

For a character-prediction task and an image-caption-generation task, IBM compared results measured on the hardware with other approaches, and reported the network structures, weight programming, and measurement results for these tasks running on the analog AI chip.

Figure: LSTM measurements for character prediction

Figure: LSTM network measurements for image caption generation

Figure: Weight programming process

**Is Nvidia's moat unassailable?**

Is Nvidia's monopoly so easy to break?

Naveen Rao is a neuroscientist-turned-tech entrepreneur who has tried to compete with Nvidia, the world's leading maker of AI chips.

"Everyone is developing on Nvidia," Rao said. "If you want to launch new hardware, you have to catch up and compete with Nvidia."

Rao worked on chips designed to replace Nvidia’s GPUs at a start-up acquired by Intel, but after leaving Intel, he used Nvidia’s chips in MosaicML, a software startup he led.

Rao said that Nvidia has not only opened up a huge lead over rival products at the chip level, but has also differentiated itself beyond the chip by building a large community of AI programmers who have long innovated on the company's technology.

For more than a decade, Nvidia has built an almost unassailable lead in producing chips that can perform complex AI tasks such as image, facial and speech recognition, as well as generate text for chatbots such as ChatGPT.

The once-industry upstart was able to achieve dominance in AI chipmaking because it recognized trends in AI early on, custom-built chips for those tasks, and developed critical software that facilitated AI development.

Since then, Nvidia co-founder and CEO Jensen Huang has been raising the bar for Nvidia.

This makes Nvidia a one-stop supplier for AI development.

While Google, Amazon, Meta, IBM and others also make AI chips, Nvidia currently accounts for more than 70% of AI chip sales, according to research firm Omdia.

In June of this year, Nvidia's market value exceeded $1 trillion, making it the world's most valuable chip maker.

"Customers will wait 18 months to buy Nvidia systems instead of buying off-the-shelf chips from startups or other competitors. It's incredible," FuturumGroup analysts said.

Nvidia: reshaping how computing is done

Jensen Huang co-founded Nvidia in 1993, making chips that render images in video games. Standard microprocessors at the time were good at performing complex calculations in sequence, but Nvidia produced GPUs that could handle multiple simple tasks simultaneously.

In 2006, Jensen Huang took things a step further. He released a software platform called CUDA that allows GPUs to be programmed for new tasks, transforming them from single-purpose chips into more general-purpose processors that can take on jobs in fields like physics and chemistry simulation.

In 2012, researchers used GPUs to achieve human-like accuracy in tasks such as identifying cats in images, a major breakthrough and a precursor to recent developments such as generating images from text prompts.

The effort, which Nvidia estimates cost more than $30 billion over a decade, makes Nvidia more than just a parts supplier. In addition to collaborating with top scientists and start-ups, the company has assembled a team that is directly involved in AI activities such as creating and training language models.

In addition, the needs of practitioners led Nvidia to develop multiple layers of key software beyond CUDA, which also included libraries of hundreds of lines of pre-built code.

On the hardware side, Nvidia has earned a reputation for consistently delivering faster chips every two or three years. In 2017, Nvidia began tuning GPUs to handle specific AI calculations.

Last September, Nvidia announced it was producing a new chip called the H100, which had been improved to handle so-called Transformer operations. Such calculations are proving to be the basis of services such as ChatGPT, which Huang called generative artificial intelligence’s “iPhone moment.”

Today, only if products from other manufacturers can compete head-on with Nvidia's GPUs will it be possible to break Nvidia's current grip on AI computing power.

Could IBM's analog AI chip be the one to do it?

