Physical Register Files to Save Power

The original x86 instruction set has a very limited number of registers (8). In order to maintain backwards compatibility with legacy x86 code, the ISA and associated registers were preserved. To scale performance with wide out of order architectures however, we needed larger register files. The solution was to enable register renaming, where the hardware could have additional registers not defined in the x86 spec and rename them on the fly.

Register renaming is done in all modern day x86 processors. There are two approaches to register renaming. The current Phenom II/Opteron approach actually carries the data from renamed registers along with the instruction as it moves through queues before it gets executed. You effectively create very wide instructions, which is horribly power inefficient (moving data on a chip takes a lot of power) although it gets the job done from a performance standpoint.

The alternative is something that we don’t see used in any current generation microprocessors. Instead of carrying data along with the instructions, you simply carry pointers to the data with those instructions. There’s added management complexity but you don’t have to worry about moving lots of data around, and therefore avoid much of the power penalty.

Bobcat (as well as Bulldozer) uses physical register files to save power. Intel actually did this in the Pentium 4 but hasn’t used PRFs since. AMD argues that with power as a major driver of design, PRFs will be necessary in future architectures.

Bobcat’s Performance Expectations

With nearly the same pipeline depth as Atom (15 vs. 16 stages), nearly the same cache latencies, the same instruction issue width and presumably competitive clock speeds (~1.5GHz), Bobcat based microprocessors should inherently outperform Atom thanks to its out of order architecture.

Atom does hold an advantage in that each core is multithreaded, so heavily threaded apps may have an advantage on Intel’s architecture. That being said, by far the biggest issue we have with Atom based netbooks is their single threaded performance that contributes to an overall slow user experience. Bobcat should hopefully address that.

On the threaded side, AMD does have another solution. As I mentioned before, Bobcat won’t be used in a microprocessor by itself - Ontario will feature two of them. AMD said that future designs are expected to integrate 2 or 4 Bobcat cores, while there are no plans to produce a single core version it’s always possible.

I believe a dual core Ontario based on Bobcat, if clocked high enough, could deliver a good enough balance of single and multithreaded performance to really challenge Atom in the netbook space. The assumption is that graphics performance will be much better than Atom with Ontario integrating an AMD GPU.

AMD’s official line is that Ontario will be able to deliver 90% of the performance of a mainstream notebook in less than half the die area. AMD isn’t just looking to compete with Atom, but go after even the CULV market with Ontario. Only time will tell if the latter is over zealous.

Power Concerns

AMD calls Bobcat sub-1W capable, which seems to imply that short of a smartphone Bobcat could go anywhere Atom could go. Technically, if AMD wanted to, even getting one into a smartphone wouldn’t be impossible - it would just require a healthy investment in chipsets.

It remains to be seen how good TSMC’s 40nm process will be compared to Atom’s Intel-manufactured 45nm transistors in terms of power consumption. Presumably the out of order aspect of the design will guarantee higher power consumption than Atom, but for the netbook/CULV notebook market the added performance may be worth the added power consumption.

It’s an Out of Order Atom Bulldozer
Comments Locked

76 Comments

View All Comments

  • ROad86 - Thursday, August 26, 2010 - link

    I think without being a pc expert that amd was trying to maximaze the multi-thread perfomance in less die size and being more efficient at power consumption. But i believe that they are still developing Bulldozer in order to maximaze single thread perfomance too. In desktop not much applications are threaded well in enough so they have to be competive in single thread perfomance too. Thats why I believe they dont announce release date yet. Among side the new manufactaring procces at 32 nm and I think the waiting for the release of sandy-bridge in order to see how better are intel new processors, the release date will be probably Q4 2011. But these are just speculations.
  • Vallwesture - Thursday, August 26, 2010 - link

    It has been over two years...
  • ROad86 - Thursday, August 26, 2010 - link

    New architecture, completly new design, maybe softaware too needs too be optimazed(windows 7 for example), in the end lets hope to bring something truly amazing. On paper it does but lets wait for reviews!
  • KonradK - Thursday, August 26, 2010 - link

    "The basic building block is the Bulldozer module. AMD calls this a dual-core module because it has two independent integer cores and a single shared floating point core that can service instructions from two independent threads"

    I'm curious whether CPU shedulers can distinguish between cores located in the same module from cores located in other modules of Bulldozer .
    Because two cores located in the same module share one FPU unit , running two FPU heavy threads on two cores located in the same module and leaving cores in other modules idle would be at least unoptimal.
  • Simen1 - Tuesday, August 31, 2010 - link

    From page 6: "Aggressive prefetching usually means there’s a good amount of memory bandwidth available so I’m wondering if we’ll see Bulldozer adopt a 3 - 4 channel DDR3 memory controller in high end configurations similar to what we have today with Gulftown."

    AMD already have a 4 channel DDR3 design. Its in the Opteron 6100-line of processors on the G34 socket (LGA1974). AMD have promised it will be compatible with future bulldozer-based processors.
  • liem107 - Monday, September 6, 2010 - link

    I wonder how bobcat would fare against the VIA Nano. Considering VIA s portfolio, it would be a good aquisition for Nvidia for example to get their hands on a fairly good x86 core and license.

Log in

Don't have an account? Sign up now