Automated driving systems at or above level 3 depend on applications developed in the area of artificial intelligence. The sustainable, real-world use of such applications depends, in turn, on exceptionally fast and energy-efficient computing platforms that include novel hardware solutions. Various enterprises are currently vying to develop chips capable of unprecedented processing speeds. So far, the so-called tensor processing units seem to hold the most promise.

Semiconductors for Driverless Vehicles

The current benchmark is 19 kDMIPS. Performance values short of this benchmark are insufficient for automated driving at or above level 2 premium (or L2+). A MIPS value indicates how many millions of instructions per second a computing system can process; a DMIPS is a MIPS value ascertained using the Dhrystone benchmark. However, the 19-kDMIPS benchmark is only the beginning: according to Rob Csongor, Vice President of Autonomous Machines at Nvidia, car manufacturers will soon need computing systems capable of more than 270 kDMIPS for level-3 and level-4 automation. In an interview with ATZelectronics [1], Csongor explains that the sheer complexity of the computational problem of autonomous driving can only be managed using high-performance, energy-efficient computing systems that "deploy various types of processors." In this connection, Csongor refers in particular to Graphics Processing Units (GPUs) and deep-learning accelerators that need to be used in addition to Central Processing Units (CPUs). GPUs are also the preferred choice for applications involving Artificial Intelligence (AI). For good reason.
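In absolute terms, 19 kDMIPS corresponds to 19,000 · 10^6, or 1.9 · 10^10, Dhrystone instructions per second; the 270 kDMIPS demanded for levels 3 and 4 correspond to 2.7 · 10^11.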

GPUs Replace Classical CPUs

While CPUs are configured for maximum flexibility and sequential processing, GPUs come with highly parallel architectures. This not only predestines GPUs for graphics tasks, in which the same operation is executed thousands of times in parallel; it also makes them well suited to applications in the area of AI, which often involve multiplying very large matrices.
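To see why, consider a minimal sketch, assuming PyTorch is installed, that times the same large matrix multiplication once on the CPU and once on a GPU (if one is present):

```python
import time
import torch

n = 4096
a = torch.randn(n, n)
b = torch.randn(n, n)

# CPU: the multiply-accumulate operations are processed largely sequentially
t0 = time.perf_counter()
torch.matmul(a, b)
print(f"CPU: {time.perf_counter() - t0:.3f} s")

# GPU: the same operations are spread across thousands of parallel cores
if torch.cuda.is_available():
    a_gpu, b_gpu = a.cuda(), b.cuda()
    torch.cuda.synchronize()  # make sure the transfers have finished
    t0 = time.perf_counter()
    torch.matmul(a_gpu, b_gpu)
    torch.cuda.synchronize()  # wait for the asynchronous kernel to complete
    print(f"GPU: {time.perf_counter() - t0:.3f} s")
```

On typical hardware the GPU variant finishes the multiplication one to two orders of magnitude faster, which is exactly the effect the chip vendors are competing over.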

Figure 1: The Intelligence Processing Unit (IPU) from the British company Graphcore Limited represents a radical approach to AI chips (© Graphcore)

The volumes of data that are necessary for the training of properly functioning deep-learning systems are so extensive that using GPUs "has proven to be the most cost-effective and energy-efficient way of training neural networks," says Csongor. This leaves him convinced that GPUs will soon replace conventional CPUs when it comes to autonomous driving at levels 3 to 5. Likely aware of the important market ramifications, Csongor's boss, Jen-Hsun Huang, presented the Rapids open-source GPU acceleration platform at the GPU Technology Conference GTC 2018, held last November in Munich (Germany).

Energy-efficient Vector-matrix Multiplications

According to Huang, the platform will enable both the high-quality and the rapid data processing necessary to support complex autonomous-driving functions. This is why Huang instructed his team to equip the platform with the latest GPU generation. This optimized GPU is capable of much higher performance than conventional CPUs, with internal benchmarks from Nvidia indicating a 50-fold acceleration compared to CPU-only systems. CPU computing has thus reached its limits, and AI demands that developers take a new approach to hardware.
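A rough illustration of the idea behind Rapids, assuming its cuDF dataframe library is installed: cuDF mirrors the familiar pandas API while executing the actual work on the GPU.

```python
import cudf  # GPU dataframe library from the Rapids platform

# Aggregate simulated sensor readings on the GPU instead of the CPU
df = cudf.DataFrame({
    "sensor": [0, 0, 1, 1],
    "reading": [0.2, 0.4, 0.1, 0.9],
})
print(df.groupby("sensor").mean())
```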

In the interview, Matthias Zöhrer, founder of the Graz-based startup Evolve, offers a succinct specifications profile for semiconductors that are to support AI applications. There are essentially two prerequisites for a general-purpose AI chip. "On the one hand, it needs to be very fast at executing very many parallel vector-matrix multiplications. On the other hand, the chip needs to be able to do this in a very energy-efficient manner." As Zöhrer points out, however, the main challenge is that precisely this kind of affine transformation represents the most resource-intensive computing operation in current deep-learning systems.
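Mathematically, the operation Zöhrer describes is the affine transformation y = W · x + b, in which an m × n weight matrix W is multiplied by an input vector x and offset by a bias vector b. This costs m · n multiply-accumulate operations per input vector: a single layer with a 4096 × 4096 weight matrix alone requires roughly 16.8 million such operations.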

The Era of the Intelligent Algorithm on a Chip

Zöhrer has spent years researching AI systems. One of his aims is to improve cockpit communication with the help of deep learning. Together with his team, Zöhrer has developed a kind of hearing aid that relays verbal commands to a smartphone or voice assistant in the cockpit, using an innovative acoustic channel separator to filter out vehicle noise. This was accomplished by integrating the AI application on a single small chip. A first step towards hardware-based AI.

This is not at all a strange proposition for Zhang Zhenning, Vice President of Marketing at ThinkForce, a Shanghai-based startup founded in 2017 that specializes in semiconductor chips for AI. In a statement, Zhang refers to the AI era as "the era of the intelligent algorithm on a chip." [2] Is hardware-based AI set to succeed software-driven AI? Zöhrer cautions against such speculations: Given the current boom in AI research, we can expect "the half-life for the deployed processes and algorithms to be very short."

Software is Still the Deciding Factor

A look at the research landscape supports Zöhrer's assertion: the release of new models and benchmarks is almost a daily occurrence. According to Zöhrer, this continuous change underscores the need for universal hardware architectures. As it stands, general-purpose hardware architectures (processors, GPUs) and special-purpose semiconductors are being used to parallelize or accelerate simple mathematical operations, which makes it possible to simulate a wide variety of AI software models.

It follows that the decision made by an AI system is still based in the software and not in the hardware. However, Zöhrer points out that once AI-software processes have been refined to the point of satisfying all human specifications, "they can then be integrated into hardware for reasons of energy efficiency and cost effectiveness." This essentially gives rise to a separate chip for AI. With its first Asic, called NovuTensor, the Silicon-Valley-based AI-chip startup NovuMind has presented an example of a hardware accelerator for AI applications. The 15-teraflop chip, which can perform an impressive 15 · 10^12 floating-point operations per second, requires less than 5 W.
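That works out to a computing efficiency of 15 · 10^12 FLOPS / 5 W = 3 · 10^12 floating-point operations per second per watt, a figure worth keeping in mind for the comparison further below.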

Figure 2: The market volume for AI chipsets is expected to increase to more than 60 billion US dollars by 2025 (source: Tractica; © ATZ)

Deep-learning Accelerator

However, Zöhrer reminds us that even if the market for deep-learning accelerators for edge devices is lucrative and very competitive, such developments remain rare, not least on account of ever greater demands on energy and computing speed. While Zöhrer regards the basic strategy as sensible for hearing-aid and smartphone applications, he hastens to add that automobile manufacturers need ultrafast and extremely powerful systems to accommodate autonomous driving functions. At the 2019 CES in Las Vegas (USA), ZF met this requirement with the 600-Tera-OPS ProAI RoboThink AI platform it introduced there.

According to a company representative, the ProAI RoboThink platform accommodates the full range of automotive requirements. In particular, the control box is capable of processing internal and external sensor data, analyzing cloud-based input and managing V2X communication - all of it in real time. Performance of this sort should indeed suffice for safe driverless vehicles at and beyond level 4.

Figure 3: The parallel architecture of a GPU is currently favored for applications in the field of artificial intelligence (© Andreas Burkert)

AI Computing Requires New Approaches

A close look at ZF's computing platform shows that it deploys Nvidia's Turing GPU architecture together with Nvidia's Xavier. The latter is a System-on-Chip (SoC), also developed by Nvidia and used to create a level 2+ automated driving system presented at the 2019 CES. Various AI applications are integrated in the Drive AutoPilot system, which is expected to enable the series production of driverless vehicles in 2020, a topic we reported on in connection with the GTC 2018 [3].

Despite these advances in performance, however, the question arises as to the extent to which existing GPU-based approaches will be capable of accommodating the future market for AI semiconductors. Every overhead in the chip design manifests itself in the form of non-essential, performance-diminishing intermediate steps, higher energy demand and greater chip size. Given that four connected units of the scalable, modular RoboThink are necessary to achieve the ZF platform's 600-Tera-OPS performance, the power consumption rises to as much as 1000 W.
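Simple division puts this in perspective: 600 · 10^12 operations per second at 1000 W is 0.6 · 10^12 operations per second per watt, well below the 3 · 10^12 floating-point operations per second per watt of the NovuTensor edge accelerator mentioned above - although the two figures are not directly comparable, since Tera-OPS typically counts low-precision integer operations and teraflops floating-point operations.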

Of Asics and Tensors

In the past, similar leaps in requirements tended to usher in changes in computer architecture. For instance, the performance and energy demands of the Pentium 4 in the mid-2000s led the chip giant Intel to conclude that it would make more sense to combine multiple cores of a simpler architecture on a single die than to stubbornly scale clock speeds up to 5 GHz. The architecture change triggered by the introduction of the smartphone proceeded even more rapidly and sustainably, with energy-saving ARM architectures being used exclusively, while x86-based architectures of the sort offered by Intel and AMD turned out to be too inefficient for mobile deployment.

The major tech corporations and the startups have so far taken different approaches to developing new architectures. Google's Tensor Processing Unit (TPU) is essentially an existing Google architecture that has been optimized for inference and training. From a technological point of view, the TPU is based on Application-specific Integrated Circuits (Asics) assembled in TPU pods. Clemens Wasner, CEO at EnliteAi, expands upon this assessment: "In addition to the TPU architecture itself, the fast data exchange between the TPUs or TPU pods is absolutely essential, which is why so much attention is paid to data-transfer capacity."
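What driving a TPU from software typically looks like can be sketched with JAX, whose XLA compiler targets TPUs as well as GPUs and CPUs; the example below is a generic illustration, not Google code, and runs unchanged on whichever backend is available:

```python
import jax
import jax.numpy as jnp

@jax.jit  # XLA compiles this function for the available backend (TPU, GPU or CPU)
def dense_layer(w, x, b):
    # The affine transformation discussed above, plus a ReLU nonlinearity
    return jnp.maximum(w @ x + b, 0.0)

key_w, key_x = jax.random.split(jax.random.PRNGKey(0))
w = jax.random.normal(key_w, (1024, 1024))
x = jax.random.normal(key_x, (1024,))
b = jnp.zeros(1024)
print(dense_layer(w, x, b).shape)  # (1024,)
```

This leaves one wondering what a more advanced solution might look like.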

Figure 4: The four connected units in ZF's scalable, modular RoboThink platform offer 600-Tera-OPS performance (© ZF)

More Radical Approach

An initial answer is provided by the British semiconductor company Graphcore, which is pursuing a more radical approach in the form of Intelligence Processing Units (IPUs). A look at the data sheet shows that the architecture was conceived from the outset to store and train the entire model on the chip itself - a task for which the British company's Colossus chip is ideally suited. With its 1216 processors on a chip the size of a postage stamp, it is claimed to be ideal for AI applications. In this regard, the British company is a special kind of startup.

Graphcore's two founders, Simon Knowles and Nigel Toon, are both over 50 years old, and both were handsomely remunerated when they sold the semiconductor company Icera to Nvidia. They also have a famous promoter in Hermann Hauser, the AI specialist and cofounder of ARM. In his position as sponsor of promising AI companies, he supported the Graphcore project. According to Hauser, "it is only the third time in computer history that we're confronted by a need for new microprocessors." [4] The ARM architecture marked the beginning of high-performance, low-power chips, followed by GPUs for compute-intensive video processing. Now, with the IPU in the starting blocks, the third generation is upon us. According to experts, the first IPU generation promises a speed increase by a factor of 100 compared to today's GPUs.

References

[1] Burkert, A.: "Graphics processors are replacing CPUs in automated vehicles". Interview with Rob Csongor. In: ATZelectronics worldwide 1-2/2019, pp. 22-24

[2] ThinkForce: ThinkForce raises 450 million RMB (press release in German). Online: www.presseportal.de/pm/129091/3820114, accessed: January 11, 2019

[3] Burkert, A.: The automotive world appears more realistic digitally (in German). Online: https://www.springerprofessional.de/en/link/16190758, accessed: January 11, 2019

[4] Murgia, M.: UK start-up Graphcore aims to dominate AI chip industry. Online: https://www.ft.com/content/1a0ed6c8-18ee-11e9-b93e-f4351a53f1c3, accessed: January 17, 2019