If you are at Siggraph, there’s a good chance you will be attending Intel’s presentation today on Larrabee. If you’re not at Siggraph, there’s a good chance that you’ll still be covering it, or that you have already been briefed. With so much information and opinion currently out there, we felt it appropriate to offer a viewpoint on what has already been claimed. In the spirit of furthering the discussion, we’d also like to throw out some questions that we think everyone would benefit from having answered.
A lot of the current press releases and statements focus on instruction sets and "new languages" as the solution to parallel computing. Intel claims the X86 instruction set makes parallel computing easier to accomplish but as any HPC developer will tell you, this hasn’t proven true with multi‐core CPUs as applications struggle to scale from 2 to 4 cores. Now with even more cores, this same technology is claimed to solve parallel computing ‐ we’d like to know what changed. After all, if it’ll be easy to program 32 cores with 16‐wide SIMD, why aren’t more developers using quad cores with 4‐wide SIMD? And if Ct is the answer, then why not use it on their CPUs?
The real challenge in parallel computing, in our opinion, lies in a different place. Developers have to decide how to divide a problem into parallel pieces and then design software to use a parallel processor. GPUs have been used to solve one class of parallel computing problem ‐ graphics processing ‐ with a highly successful architecture. As graphics has evolved, developers have come to write very sophisticated programs that do everything from graphics processing to physics within the standard graphics pipeline.
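As a rough illustration of what "dividing a problem" means in code, here is a minimal sketch in plain C. It is not taken from any NVIDIA or Intel toolkit; the names `sum_chunk` and `chunked_sum` are hypothetical, and the loop over chunks runs serially as a stand-in for dispatching each chunk to its own core.

```c
#include <stddef.h>

/* Hypothetical helper: sum one contiguous chunk of the input.
   Each call reads only its own slice, so separate chunks could be
   handed to separate cores without any coordination between them. */
static double sum_chunk(const double *data, size_t begin, size_t end)
{
    double acc = 0.0;
    for (size_t i = begin; i < end; ++i)
        acc += data[i];
    return acc;
}

/* Divide n elements into `workers` near-equal chunks, compute one
   partial sum per chunk, then combine the partial sums.
   Assumption: the outer loop runs serially here; on real hardware it
   would be replaced by whatever parallel dispatch the platform offers. */
double chunked_sum(const double *data, size_t n, size_t workers)
{
    if (workers == 0)
        workers = 1;

    double total = 0.0;
    for (size_t w = 0; w < workers; ++w) {
        size_t begin = n * w / workers;       /* first index of chunk w */
        size_t end   = n * (w + 1) / workers; /* one past its last index */
        total += sum_chunk(data, begin, end);
    }
    return total;
}
```

The design decision sits in how the index range is split and how the partial results are combined. Neither depends on the instruction set of the processor that ends up running each chunk.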
The next evolution of the GPU took place in 2006. The computer architecture group at NVIDIA added instruction sets and new architectural concepts to the GPU to make the computing architecture even more general. We call this the CUDA computing architecture.
CUDA is a C‐language compiler that is based on the PathScale C compiler. This open source compiler was originally developed for the X86 architecture. The NVIDIA computing architecture was specifically designed to support the C language ‐ like any other processor architecture. Comments that the GPU is only partially programmable are incorrect ‐ all the processors in the NVIDIA GPU are programmable in the C language. Given this, why is Intel calling the CUDA C‐compiler a "new language"?
Intel claims that the X86 base of Larrabee makes it seamless for developers. But with conflicting statements coming from Intel itself on whether there will be a new programming model, several important questions remain.
‐ Will apps written for today’s Intel CPUs run unmodified on Larrabee?
‐ Will apps written for Larrabee run unmodified on today’s Intel multi‐core CPUs?
‐ The SIMD part of Larrabee is different from Intel’s CPUs ‐ so won’t that create compatibility problems?
NVIDIA’s approach to parallel computing has already proven to scale from 8 to 240 GPU cores. This allows the developer to write an application once and run it across multiple platforms. Developers now have the choice to write only for the GPU or to write and compile for the multi‐core CPU as well. In fact, NVIDIA demonstrated CUDA for both GPU and CPU at our annual financial analyst day, running an astrophysics simulation on an 8‐core GPU inside a chipset, a G80‐class GPU and a quad‐core CPU. Exactly the same binary program was used across the range of GPUs, and exactly the same source code was used for the CPU and GPU.
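One way to picture how a single source can serve processors with very different core counts is to write the work per element and leave the mapping onto cores to a launcher. The following is a minimal sketch in plain C under that assumption. It is not CUDA code and not the mechanism used in the demonstration described above; `saxpy_element` and `launch_saxpy` are hypothetical names, and the launcher here simply walks the blocks in order on one CPU thread.

```c
#include <stddef.h>

/* Per-element "kernel": computes one output from its global index.
   It contains no knowledge of how many cores will execute it. */
static void saxpy_element(size_t i, float a, const float *x, float *y)
{
    y[i] = a * x[i] + y[i];
}

/* Hypothetical launcher: covers n elements with blocks of
   `block_size` elements each. On a parallel device the blocks could
   be spread over however many cores exist; in this sketch they run
   one after another. */
void launch_saxpy(size_t n, size_t block_size, float a,
                  const float *x, float *y)
{
    if (block_size == 0)
        block_size = 1;

    size_t blocks = (n + block_size - 1) / block_size; /* round up */
    for (size_t b = 0; b < blocks; ++b) {
        for (size_t t = 0; t < block_size; ++t) {
            size_t i = b * block_size + t; /* global element index */
            if (i < n)                     /* guard the ragged last block */
                saxpy_element(i, a, x, y);
        }
    }
}
```

Calling `launch_saxpy` with a block size of 2 or of 4 on the same input produces the same output, because the result depends only on the per-element function and not on how the elements are grouped.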
CUDA has been described as "hard for developers". Visit www.nvidia.com/cuda for a sampling of applications written in the C language for the GPU across a great number of fields. Virtually all of these applications were written by developers without any assistance from NVIDIA. They just downloaded the compiler, documentation and examples. Since CUDA runs across all GPUs NVIDIA has introduced over the last 2 years or so, developing a parallel application is simple and inexpensive. Plus, with an installed base of more than 90 million C‐language enabled GPUs, developers can already target a large base of consumer, workstation and now HPC customers.
To date, Intel has not described Larrabee’s development environment. It has focused on one aspect of the architecture ‐ the X86 instruction set ‐ while any differences or new challenges on top of the existing problems with multi‐threading have yet to be revealed. With a new SSE architecture, new software layers to manage threads and perhaps another new language in Ct, developers are not simply using the X86 instruction set; they need to learn their way around a different computing architecture. Parallel computing problems are not solved with device‐level instruction sets. They are solved in computing languages with a computing architecture that is quick to learn and easy to use.
Computing on the GPU now has a critical mass of languages, developers and interest from the leading providers of operating systems.
Other Things that make us go hmmmm………
Intel has spent a lot of energy telling the world that the GPU is dying or that it is not a growing market ‐ why then are they investing so heavily in Larrabee and talking so much about it? Larrabee, like NVIDIA GPUs, has enormous floating point processing power. Wouldn’t this encourage all the supercomputing clusters in the world to adopt GPU/Larrabee‐style architectures and subsequently hurt Intel’s CPU market?
Intel is positioning Larrabee as a great GPU. Yet users and developers alike have expressed frustration and disappointment with Intel’s IGP technology for many years. Why hasn’t Intel used some of its investment and expertise to fix some of these problems for its 200M+ IGP customers? Also, will Intel be able to achieve the fine balance between power and cost in graphics?
Ray Tracing (RT). NVIDIA’s CUDA‐enabled GPUs can do raytracing, and more. Even if Intel can do raytracing on Larrabee, how would developers achieve that on the hundreds of millions of Intel IGPs shipped each year? Is Intel abandoning its IGP customers?
In summary, Intel knows that moving to powerful, floating point rich parallel architectures is the future. In making that move, Intel will inevitably encourage more developers to develop on GPUs: they too will see it as a major industry shift and will want to target the hardware where their software has the greatest chance of success. NVIDIA will have shipped over 150 million CUDA capable parallel processors by the time Larrabee ships. Intel knows it will hurt its CPU business by making this transition, but this is truly the era of visual computing and the shift is a necessary move.