The arrow of time

Ivan Voras' blog

Optimizing for Atom

Since the Atom CPU is understandably (and expectedly) slow, it's tempting to look for the perfect compiler optimization, the magic bullet that will bring out the best in applications. One common piece of advice, easily found with Google, is to treat it as if it were a Pentium 4. This can be rationalized by the CPU having HTT (not that it matters for single-threaded optimizations), or by its microinstructions being simpler, trading per-clock performance for high clock rates (semi-true), or by other architectural reasons, but as it happens, none of this is even close to being relevant to gcc.

The Atom is more like the original Pentium than the Pentium 4. It doesn't do out-of-order execution and has a very simple pipeline. As it turns out, to gcc it looks more like a Pentium with bolted-on HTT, MMX and SSE instructions that is fond of unrolled loops and likes aligned data.

Here are some runs of the C version of scimark2 (a floating-point math heavy benchmark - YMMV, of course) with some gcc flags (no intention of being consistent or complete):

As it turns out, gcc appears to do a very decent job with -mtune=native, and -mtune=generic is more than acceptable. The biggest gains (in this math-heavy benchmark) come from using SSE for math, but even those are wiped out by tuning for pentium4.
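A flag sweep like the one described above could be scripted roughly as follows. This is only a sketch, not the procedure actually used for these runs: the `build_commands` helper, the `bench.c` source name, the output naming scheme and the inclusion of `-O2` are all assumptions made for illustration. The `-mtune=`, `-msse2` and `-mfpmath=sse` options are standard gcc flags. The script only builds the command lines; running them and collecting scores is left out.

```python
import itertools
import shlex

# Tuning targets mentioned in the post.
TUNE_TARGETS = ["native", "generic", "pentium4", "k6", "k8"]

# -O3 is mentioned in the post; -O2 is added here as an assumption.
OPT_LEVELS = ["-O2", "-O3"]

# Floating-point code generation: default x87, or SSE for scalar math.
FP_MODES = {
    "x87": [],
    "sse": ["-msse2", "-mfpmath=sse"],
}


def build_commands(sources, compiler="gcc"):
    """Return (label, argv) pairs, one per flag combination.

    `sources` is a list of C files; the names are supplied by the caller
    and are hypothetical in the example below.
    """
    commands = []
    for opt, tune, (fp_name, fp_flags) in itertools.product(
        OPT_LEVELS, TUNE_TARGETS, FP_MODES.items()
    ):
        label = f"{opt.lstrip('-')}-{tune}-{fp_name}"
        argv = [
            compiler, opt, f"-mtune={tune}", *fp_flags,
            "-o", f"bench-{label}", *sources, "-lm",
        ]
        commands.append((label, argv))
    return commands


if __name__ == "__main__":
    # "bench.c" is a placeholder, not the real scimark2 file list.
    for label, argv in build_commands(["bench.c"]):
        print(shlex.join(argv))
```

Swapping `-mtune=` for `-march=` in the helper would give the other variant compared in this post, with the caveat that `-march` also changes which instructions gcc may emit, so binaries built that way may not run on other CPUs.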

The difference between the fastest and the slowest set of flags is 21%. The impact of using -march instead of -mtune is negligible (the difference is too small to tell whether it helps or not).
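For anyone repeating the comparison, a spread like this can be computed from a set of scores along the following lines. This is a minimal sketch: the `percent_spread` helper is hypothetical, the scores are placeholders rather than the measurements from this post, and it assumes the percentage is taken relative to the slowest result (higher scimark2 scores being better).

```python
def percent_spread(scores):
    """Return how much higher the best score is than the worst, in percent.

    `scores` maps a flag-set label to its benchmark score (higher is better).
    The result is relative to the worst score, which is an assumption about
    how such a figure would be quoted.
    """
    best = max(scores.values())
    worst = min(scores.values())
    return (best - worst) / worst * 100.0


# Placeholder numbers for illustration only, NOT the results from the post.
example_scores = {"flags_a": 150.0, "flags_b": 165.0, "flags_c": 180.0}
print(f"{percent_spread(example_scores):.1f}%")
```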

(I've included k6 just for reference. I know the Atom doesn't have 3DNow!)

Late update: Tuning for k8 (with SSE and -O3) yields a slightly higher best score of 182.

#1 Re: Optimizing for Atom

Added on 2009-03-12T19:35 by gonzo

I have a dual-core Atom 330 running FreeBSD 7.1R amd64. (I also set CPUTYPE=nocona).

Have you tried any optimizations on an amd64 FreeBSD installation?

#2 Re: Optimizing for Atom

Added on 2009-03-12T21:05 by Ivan Voras

The netbook Atoms (270, 280) don't support x64.

#3 Re: Optimizing for Atom

Added on 2009-10-29T09:02 by

According to the processor family ID, Atom is "Core2/NewerXeon". So maybe a good solution would be: -march="core2" + -mtune="native OR generic"?

#4 Re: Optimizing for Atom

Added on 2009-10-29T15:23 by Ivan Voras

No, Atom is as different from Core 2 as it can be while still being manufactured by the same company. Meaning it is *very* *very* different from Core 2. By its low-level characteristics it is most similar to early non-superscalar CPUs. From my results, -mtune=generic works fairly well, but optimizing for it as if it were a Core 2 would probably nuke its performance.

Comments are subject to moderation and will be deleted if deemed inappropriate. All content is © Ivan Voras. Comments are owned by their authors... who agree to basically surrender all rights by publishing them here :)