[CS-FSLUG] cpu speed comparison

Fri Oct 28 14:01:06 CDT 2005

On 10/28/05, Frank Bax <fbax at sympatico.ca> wrote:
> cpu0: Intel(R) Pentium(R) 4 CPU 2.80GHz ("GenuineIntel" 686-class) 2.81 GHz
> cpu0: AMD Athlon(tm) XP 2600+ ("AuthenticAMD" 686-class) 1.91 GHz
>
> I ran my script on each system three times and averaged run times.  I'm not
> sure if this calculation is valid, but adjusting runtime for cpu speed:
>         Intel   -> 122.5 secs * 2.81 GHz        = 344.2
>         Athlon  -> 150.5 secs * 1.91 GHz        = 287.5
>
> With 2.81G clock cycles per second, Intel used 344 billion clock cycles to
> do the work, Athlon did the work in 287 billion clock cycles.  Does that
> mean Athlon accomplished the task more efficiently?  Is my logic flawed?

Well, it's possible that other parts of the system account for some
of the difference (IDE controller, hard drive, RAM, CPU cache size,
etc.).

However, to answer your question - did the Athlon accomplish the
task more efficiently? - almost definitely yes, at least in terms of
work done per clock cycle.  I'm trying to think back to some of my CPU
design courses - but basically the Pentium 4 does a much higher than
"normal" amount of what is known as pipelining (splitting each
instruction into many small stages, which allows the CPU's clock
speed to be cranked up).
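
For what it's worth, here is your arithmetic as a small Python sketch.
It assumes the script was CPU-bound and that each chip ran at its
reported clock speed for the whole run, neither of which I can confirm;
the function name is just something I made up for illustration.

```python
def billions_of_cycles(runtime_secs, clock_ghz):
    """Approximate clock cycles used, in billions (seconds * GHz)."""
    return runtime_secs * clock_ghz

# Averaged run times and reported clock speeds from the original post.
intel = billions_of_cycles(122.5, 2.81)
athlon = billions_of_cycles(150.5, 1.91)

# How many cycles the Intel chip spent for each cycle the Athlon spent.
ratio = intel / athlon

print(f"Intel:  {intel:.1f} billion cycles")
print(f"Athlon: {athlon:.1f} billion cycles")
print(f"Intel used {ratio:.2f}x the cycles of the Athlon")
```

By this rough measure the P4 needed about 20% more clock cycles than
the Athlon for the same work.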

Basically, what pipelining means is that the CPU can start executing
one instruction per clock cycle, but that instruction may not
actually finish executing until several cycles later, and only then
(or at the end of the pipeline - I can't remember) can the result of
the instruction be inspected.  Anyways, this means that when there's
any branching possibility (an if statement or the like), the CPU does
not know for a while how the test turned out.  Thus, modern CPUs like
the P4 have what are known as branch prediction engines, which
essentially try to guess what the result of the branch instruction
will be, based on previous times that they've seen this bit of code.
If the guess is incorrect, then at the end of the pipeline the CPU
essentially has to discard all the instructions that it started
executing after the branch, and go back and do the correct ones.
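
To make the cost of a wrong guess concrete, here is a toy Python model.
It is only a sketch: the "guess the same as last time" predictor is far
simpler than what real chips do, and the flush penalties of 10 and 20
cycles are numbers I picked for illustration, not measured figures for
the Athlon or the P4.

```python
def simulate(outcomes, flush_penalty):
    """Count cycles for a run of branches with a 'same as last time' predictor.

    outcomes: list of booleans, True if the branch was actually taken.
    flush_penalty: cycles wasted refilling the pipeline after a wrong
                   guess (hypothetical; grows with pipeline depth).
    Returns (total_cycles, mispredictions).
    """
    prediction = True      # initial guess: branch is taken
    cycles = 0
    mispredicts = 0
    for taken in outcomes:
        cycles += 1        # one branch instruction started per cycle
        if prediction != taken:
            mispredicts += 1
            cycles += flush_penalty   # discard in-flight work, start over
        prediction = taken  # remember the outcome for next time
    return cycles, mispredicts

# A loop branch that is taken 9 times, then falls through once,
# with the whole loop run 100 times.
loop_branch = ([True] * 9 + [False]) * 100

short_pipe = simulate(loop_branch, flush_penalty=10)
long_pipe = simulate(loop_branch, flush_penalty=20)

print("short pipeline (cycles, wrong guesses):", short_pipe)
print("long pipeline  (cycles, wrong guesses):", long_pipe)
```

Both runs make the same number of wrong guesses, but each one costs
more cycles in the model with the longer pipeline.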

Anyways, the above is part of the reason for the P4's apparent
slowness, but there's more to it.  CPUs can have multiple execution
pipelines - e.g. some for integer operations, and some for floating
point operations - and the number of each can affect a CPU's
suitability to particular types of applications.  In addition, there
are instruction set extensions such as MMX, SSE2, ... (I don't really
keep track of them) and I'm not sure whether the two chips support
entirely the same ones.
The Altivec units on PowerPC chips, for example, can give quite a
speed boost to calculations involving vectors and matrices.  I'm also
not sure about the number of registers - places to store stuff in the
CPU - and whether their numbers are the same for both the Athlon and
the P4, but that might also affect things as it takes time to load
stuff from memory.
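
As a rough illustration of why the mix of execution units matters, here
is another toy Python sketch.  The two CPUs and their unit counts are
entirely hypothetical (they are not the Athlon or the P4), and the
model assumes every unit finishes one operation per cycle with no
dependencies between operations, which real code never quite manages.

```python
import math

def min_cycles(int_ops, fp_ops, int_units, fp_units):
    """Lower bound on cycles if each unit completes one op per cycle."""
    return max(math.ceil(int_ops / int_units),
               math.ceil(fp_ops / fp_units))

# Two hypothetical CPUs with different mixes of execution units.
cpu_a = {"int_units": 3, "fp_units": 1}
cpu_b = {"int_units": 2, "fp_units": 2}

# Two hypothetical workloads with different instruction mixes.
int_heavy = {"int_ops": 900, "fp_ops": 100}
fp_heavy = {"int_ops": 100, "fp_ops": 900}

for name, work in (("integer-heavy", int_heavy), ("float-heavy", fp_heavy)):
    a = min_cycles(**work, **cpu_a)
    b = min_cycles(**work, **cpu_b)
    print(f"{name}: CPU A >= {a} cycles, CPU B >= {b} cycles")
```

In this model CPU A comes out ahead on the integer-heavy work and CPU B
on the float-heavy work, even though both have four units in total.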

It also depends on what architecture the program was compiled for, and
how the program was optimized.

It's been a few years since I've had to deal with any of this stuff,
so my memory of this is a bit rusty.  Hopefully it's given you a rough
idea of how CPUs these days work and some things, other than
clockspeed, that differentiate one CPU from another.  If you're
interested in even more information, perhaps the following articles
will clear things up somewhat:
http://www.hardwaresecrets.com/article/209 (CPUs in general)
http://www.hardwaresecrets.com/article/235 (the P4 specifically - a
followup to the first article)

David