A description of modern processor types, and why ARM CPUs and GPGPUs are advancing faster than x86.
Disclaimer: I currently work for NVIDIA, which sells ARM CPUs & GPGPUs. But this is my honest personal opinion about computing.
It used to be that CPUs could be clearly defined as either RISC or CISC:
- RISC CPUs were simple "reduced" processors, such as Acorn's ARM or MIPS CPUs, that were very small & cheap & low-power. Each instruction could either access memory, or process data within CPU registers, or access I/O, but couldn't do more than one of those at a time (the classic "load/store" design).
- CISC CPUs were "complex" processors, of which Intel x86 was basically the defining example, because a single instruction could do several things at once, such as load data from memory & perform a calculation at the same time.
But these were all designed just to handle whole "integer" numbers like 5 or 6, not 5.5. So if you wanted to use "real" floating-point numbers like 5.5, the maths had to be done in software, which was around 500x slower than processing whole numbers. So FPUs were introduced: hardware floating-point units within the CPU chip that perform maths operations on numbers with fractions directly in hardware. FPUs would process floating-point numbers only about 20x slower than the CPU could process whole numbers (still a lot faster than the software libraries that were 500x slower), so FPUs became used in pretty much every CPU since the 1990s.
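To make that gap concrete, here is a minimal sketch (plain C, which also compiles as CUDA host code, like all the examples in this article): on a CPU without an FPU, the float addition below turns into a call to a soft-float library routine (e.g. __aeabi_fadd on ARM EABI), whereas with an FPU it becomes a single instruction.

```
/* A minimal sketch contrasting integer & floating-point maths. Without an
 * FPU, the float addition compiles into a call to a software emulation
 * routine; with an FPU it is a single hardware instruction. */
#include <stdio.h>

int main(void)
{
    int   a = 5,    b = 6;      /* whole numbers: one ALU instruction */
    float x = 5.5f, y = 6.5f;   /* fractions: FPU instruction, or soft-float call */

    printf("int sum = %d, float sum = %f\n", a + b, x + y);
    return 0;
}
```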
There were also SIMD features added to x86 CPUs from around the Pentium MMX & Pentium 3 era onwards (ie: MMX & SSE) for multimedia processing, since SIMD can do things like add 4 pairs of numbers in the same time it normally takes to add 1 pair. But SIMD is hard to program for, so it is rarely used except for things like the inside of popular video codecs, where it's worth the programmer spending lots of time making the code run faster.
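Here is a minimal sketch of that "4 additions at once" idea using SSE intrinsics (x86-specific; the array values are just made up for the demo):

```
/* A minimal SIMD sketch: one SSE instruction adds 4 pairs of floats in
 * roughly the time a scalar add handles 1 pair. x86-only; build with SSE
 * enabled (e.g. gcc -msse). */
#include <xmmintrin.h>   /* SSE intrinsics */
#include <stdio.h>

int main(void)
{
    float a[4] = { 1.0f,  2.0f,  3.0f,  4.0f};
    float b[4] = {10.0f, 20.0f, 30.0f, 40.0f};
    float c[4];

    __m128 va = _mm_loadu_ps(a);     /* load 4 floats into one 128-bit register */
    __m128 vb = _mm_loadu_ps(b);
    __m128 vc = _mm_add_ps(va, vb);  /* 4 additions in a single instruction */
    _mm_storeu_ps(c, vc);

    printf("%.1f %.1f %.1f %.1f\n", c[0], c[1], c[2], c[3]);
    return 0;
}
```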
For specialized highly mathematical projects, there were DSPs: basically co-processors that can do multiplication & addition operations on floating-point numbers roughly as fast as a CPU can process whole numbers, which is great, except that DSPs can't really do anything else besides multiply & add numbers. They are sort of like SIMD units but several times wider, so if you needed a huge amount of repeated calculations of certain maths formulas and not much else (eg: audio processing, modems or telecommunications), a DSP could process say 20 pairs of numbers at the same time. Just like SIMD, DSPs are a lot harder to program than regular CPUs or FPUs, but they could give much higher performance; so for a big commercial telecomms project DSPs were great, but for something like a home project they were too hard & expensive to use.
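The classic DSP workload is exactly this kind of repeated multiply & add: for example a FIR filter, sketched below in plain C (the tap count & coefficients are arbitrary demo values). A DSP performs each multiply-accumulate in a single cycle, across many pairs of numbers at once:

```
/* A minimal FIR filter sketch: the multiply-accumulate (MAC) loop below is
 * the operation DSPs are built around, executing many of them per cycle. */
#include <stdio.h>

#define NUM_TAPS 4

/* One output sample = sum over (coefficient * recent input sample). */
static float fir_sample(const float *history, const float *coeffs)
{
    float acc = 0.0f;
    for (int i = 0; i < NUM_TAPS; i++)
        acc += coeffs[i] * history[i];   /* the MAC a DSP does in 1 cycle */
    return acc;
}

int main(void)
{
    float coeffs[NUM_TAPS]  = {0.25f, 0.25f, 0.25f, 0.25f};  /* simple averager */
    float history[NUM_TAPS] = {1.0f, 2.0f, 3.0f, 4.0f};      /* last 4 inputs */
    printf("filtered sample = %f\n", fir_sample(history, coeffs));
    return 0;
}
```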
So all of those types of CPUs and co-processors were very clearly defined. A CPU was either RISC (eg: ARM & microcontrollers) or CISC (ie: x86), it might have had an FPU too, perhaps a GPU (just for graphics), and certain embedded projects could use a DSP for number crunching. Intel & AMD x86 CPUs were by far the leaders, but they kept getting more & more complex and drawing exponentially more power. By the Pentium 4, the chips used so much power & generated so much heat that Intel had reached the limit of what it could reasonably do to make its CPUs faster! After bringing out faster & more complex chips every year or 2 for about 30 years, Intel suddenly hit a brick wall: it couldn't raise clock speeds any further because the Pentium 4 was already using too much power. So Intel & AMD had to stop chasing clock speed and go back to their older, more efficient designs (Intel's Pentium M, for example, revived the Pentium 3-era architecture), selling CPUs that were clocked slower than the Pentium 4 but much lower in power & heat. This started the dual-core / quad-core / multi-core craze: if a single core couldn't run any faster, multiple CPU cores could at least be combined to work in parallel, as sketched below.
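A minimal sketch of what that multi-core shift means for programmers: the same loop split across however many cores are available, here using OpenMP (loop bounds are arbitrary; build with OpenMP enabled, e.g. gcc -fopenmp):

```
/* A minimal multi-core sketch: instead of one faster core, the work is
 * split across several cores. OpenMP divides the loop among the cores
 * and combines the partial sums at the end. */
#include <stdio.h>
#include <omp.h>

int main(void)
{
    double sum = 0.0;

    #pragma omp parallel for reduction(+:sum)  /* each core sums a chunk */
    for (int i = 0; i < 100000000; i++)
        sum += (double)i;

    printf("sum = %.0f (up to %d cores)\n", sum, omp_get_max_threads());
    return 0;
}
```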
But while Intel & AMD were struggling with power & heat problems, ARM's low-power "RISC" CPUs were being used in huge numbers of embedded devices & Nokia-era mobile feature-phones, and from around the year 2000 onwards, ARM started adding lots of highly advanced features to its CPUs. So suddenly there were ARM CPUs that are considered simple "RISC" CPUs yet have many features of the latest x86 CISC CPUs, whilst still being very low in power draw & heat. The distinction between RISC & CISC isn't so clear anymore, because ARM is considered "RISC" but is still highly complex. Recent ARM & x86 CPUs also have very good FPU & SIMD units, so processors such as the Intel Core-i7 and the ARM Cortex-A15 have many DSP-like features, making the distinction between a DSP and a CPU+FPU+SIMD less clear too.
Meanwhile, GPU vendor NVIDIA (and eventually ATI/AMD and PowerVR) designed their GPUs to support not just 2D/3D graphics but also general-purpose maths calculations, referred to as GPGPU. In some ways, GPGPUs can be thought of as chips containing hundreds or thousands of extremely simplified "RISC"-style CPUs. GPUs (and to some extent DSPs & SIMD units) have the advantage that they can basically keep becoming wider & wider (100 parallel cores, or 1000, or 10,000, etc), getting a linear increase in total performance for just a linear increase in power draw & heat. So GPUs scale to continuous expansion far better than CPUs. Looking 10 or 20 years into the future, it would be relatively straightforward for GPUs to have 10 million parallel cores, whereas it's unlikely that CPUs could run at 100x their current clock rates, because power draw & heat would increase exponentially and CPUs are already near their power & heat limits even now! Just like GPUs, it is also possible to make DSPs & SIMD units significantly wider, but they are much harder to program than GPGPUs and less suited to massive expansion, so GPGPUs really do seem like the ideal parallel-processing platform for the future. Of course, CPUs will always be important, since only some tasks are suited to parallel processing. But CPUs don't have an obvious way to improve in the future, whereas parallel architectures do.
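To show what that "thousands of tiny cores" model looks like in practice, here is a minimal CUDA sketch: a vector addition where every element gets its own GPU thread (the array size & launch configuration are arbitrary demo choices):

```
/* A minimal GPGPU sketch: one million additions, one GPU thread each.
 * The kernel body is what every tiny "core" runs in parallel. */
#include <stdio.h>

__global__ void vector_add(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  /* this thread's element */
    if (i < n)
        c[i] = a[i] + b[i];
}

int main(void)
{
    const int n = 1 << 20;            /* ~1M elements (arbitrary demo size) */
    const size_t bytes = n * sizeof(float);
    float *a, *b, *c;
    cudaMallocManaged(&a, bytes);     /* unified memory: visible to CPU & GPU */
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; i++) { a[i] = 1.0f; b[i] = 2.0f; }

    const int threads = 256;
    const int blocks  = (n + threads - 1) / threads;
    vector_add<<<blocks, threads>>>(a, b, c, n);  /* launch ~1M threads */
    cudaDeviceSynchronize();

    printf("c[0] = %f (expect 3.0)\n", c[0]);
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

Note that the same kernel body runs unchanged whether the GPU has hundreds of cores or tens of thousands, which is exactly the width-scaling property described above.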