Some Barcelona (K10) details

Posted: 2007-04-20 02:29pm
by Ace Pace
DailyTech
Giuseppe Amato gives another overview on the high points of AMD's next-generation CPU architecture

This article was first published in German on K-Hardware.de.

Yesterday, AMD held a press presentation in Munich, Germany to update journalists about its upcoming K10 processor. AMD's Giuseppe Amato, Technical Director Sales and Marketing EMEA, had a few minutes to talk about the architecture at length. This architecture, previously dubbed K8L by Henri Richard -- now publicly called K10 -- is scheduled to be AMD's first monolithic quad-core design.

The integrated memory controller (IMC) will get a few new features in the K10 core. When utilizing multiple memory modules, along with proper BIOS implementation and mainboard routing, the IMC can access memory as independent 64-bit channels (72-bit if you use ECC). This way it is possible to read and write data simultaneously, or improve efficiency for irregular access patterns, which increasingly occur in a quad-core environment. This feature is available on AM2+ and F+ boards; on "old" Socket AM2 and F boards the usual 128-bit dual-channel mode is available.
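
A quick arithmetic sketch of the channel widths mentioned above. It assumes DDR2-800 memory (800 MT/s), which the article does not state, and the function and variable names are made up for illustration. The point is that splitting the 128-bit interface into two 64-bit channels leaves peak bandwidth unchanged; any gain would come from the two channels working on different requests at once. With ECC, 8 of the 72 bits are check bits, so the data width per channel is still 64 bits.

```python
def peak_bandwidth_gbs(bus_bits, mega_transfers_per_s):
    """Peak bandwidth in GB/s (decimal) for a bus of the given data width."""
    bytes_per_transfer = bus_bits / 8
    return bytes_per_transfer * mega_transfers_per_s / 1000


DDR2_800 = 800  # MT/s; assumed speed grade, not taken from the article

# One combined 128-bit channel (the mode on older AM2 / F boards).
combined = peak_bandwidth_gbs(128, DDR2_800)

# Two independent 64-bit channels (the new K10 mode on AM2+ / F+ boards).
split = 2 * peak_bandwidth_gbs(64, DDR2_800)

print(f"combined: {combined:.1f} GB/s, split: {split:.1f} GB/s")
```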


Due to split power planes, the IMC can be clocked down independently of the CPU cores, along with reduced voltage. This also enables CPU overclocking without touching the memory frequency, something that may appeal to enthusiasts. These features are again dependent on Socket AM2+ and F+ platforms.

Amato explained how the quad-core design benefits from the internal crossbar switch, the backbone of communication inside the K10 CPU. With Intel's current quad-core design there are cases where data needs to travel over the FSB -- in AMD's case all inter-CPU communication takes place on-die.

The crossbar switch of the K10 core is already prepared for up to 8 cores, Amato boasted. Amato wouldn't give even a vague timeframe for market availability of such a CPU, though he indicated the company is prepared for whatever the market demands. Amato made clear that octo-core is far away in the future – Shanghai will not get 8 cores.

K10 will introduce a shared L3 cache while the individual cores have dedicated L1 and L2 caches. As long as requested data lies in L1, it can be directly loaded. This also works if the data lies in the L1 cache of another core, in which case the communication works via the crossbar switch. In case requested data resides in the L2 cache, it will be loaded to L1 and then invalidated in L2 as AMD has an exclusive cache design. The L3 Cache, however, is not exclusive, but allows for a shared bit to be set. If a core loads data marked as shared, it will reside in the L3 cache and can be fetched by other cores as well.
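
The lookup order in the paragraph above can be sketched as a toy model. This is only an illustration of the article's description, not AMD's actual protocol: the class name `ToyK10Caches` and its methods are hypothetical, there is no eviction or coherence state, and the handling of L3 lines that are not marked as shared (they are removed from L3 on load) is an assumption the article only implies.

```python
class ToyK10Caches:
    """Toy model of the lookup order described in the article."""

    def __init__(self, num_cores=4):
        self.l1 = [set() for _ in range(num_cores)]  # private per core
        self.l2 = [set() for _ in range(num_cores)]  # private, exclusive of L1
        self.l3 = {}  # shared by all cores: address -> "shared" bit

    def load(self, core, addr):
        """Return where the line was found and update the cache contents."""
        if addr in self.l1[core]:
            return "L1"
        # A line in another core's L1 is fetched over the crossbar.
        # What happens to the line afterwards is not modelled here.
        for other, lines in enumerate(self.l1):
            if other != core and addr in lines:
                return f"remote L1 (core {other})"
        if addr in self.l2[core]:
            # Exclusive design: the line moves to L1 and leaves L2.
            self.l2[core].remove(addr)
            self.l1[core].add(addr)
            return "L2"
        if addr in self.l3:
            if not self.l3[addr]:
                # Assumption: a line without the shared bit leaves L3.
                del self.l3[addr]
            self.l1[core].add(addr)
            return "L3"
        self.l1[core].add(addr)
        return "memory"


caches = ToyK10Caches()
caches.l2[0].add(0x40)
print(caches.load(0, 0x40))  # found in L2, then moved to L1
print(caches.load(0, 0x40))  # now served from L1
print(caches.load(1, 0x40))  # core 1 gets it from core 0's L1
```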

Amato also mentioned an array of power saving measures which, in sum, allow AMD to deliver a quad-core CPU in the same thermal envelope as today’s dual-core CPUs.

K10 adds the capability of independently clocking all the CPU cores. In current K8 processors (and Intel's Core 2 generation), all cores are clocked at the same level all the time -- the P-state can only be changed synchronously. In case of a compute-intensive single-threaded process, all cores must run on the highest level P-state. On K10-based CPUs, the idle cores could be switched to the lowest P-state, while others are in different states, depending on load.
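
A minimal sketch of the difference, under stated assumptions: the number of P-states (P0 fastest, P4 slowest), the load-to-P-state mapping, and all function names are invented for illustration and are not taken from the article or from AMD documentation.

```python
LOWEST_PSTATE = 4  # hypothetical: P0 is the fastest state, P4 the slowest


def wanted_pstate(load_pct):
    """Map a core's load (0-100 %) to the P-state it would pick on its own."""
    return (100 - load_pct) * LOWEST_PSTATE // 100


def k8_pstates(loads):
    """Synchronous switching: every core runs the fastest state any core needs."""
    fastest = min(wanted_pstate(load) for load in loads)
    return [fastest] * len(loads)


def k10_pstates(loads):
    """Independent switching: each core follows its own load."""
    return [wanted_pstate(load) for load in loads]


# One busy single-threaded process on core 0, three idle cores.
loads = [100, 0, 0, 0]
print("K8: ", k8_pstates(loads))   # all four cores held at P0
print("K10:", k10_pstates(loads))  # only core 0 at P0, the rest at P4
```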

This feature could possibly be abused by overclockers to overclock a single core above the specified levels. Amato clarified that AMD doesn't endorse overclocking, but acknowledges there are people interested in it. In a warranty case, AMD could detect out-of-spec PLL programming, which would void the warranty. The new cores, however, have new thermal sensors to improve overheating protection.

Amato closed the session by mentioning Shanghai as a successor to Barcelona in the server space for 2008. Shanghai will be an improved quad-core architecture, which is supposed to be socket-compatible with current Socket F platforms. Roadmaps available to DailyTech revealed Shanghai is a 45nm quad-core CPU featuring 6MB of L3 Cache.

Posted: 2007-04-20 05:14pm
by Starglider
The big problem with the K10 is likely to be the clockspeed. The architecture looks solid in most respects, and I was looking forward to upgrading to one (or indeed several, for the company). But Intel has been storming ahead with clockspeed bumps on the Core processors and overclocking results indicate plenty of frequency headroom if they can keep the thermals down. K10 looks likely to have a marginal IPC advantage over Penryn but based on leaked reports, AMD's past performance and K8's frequency scaling issues it looks like AMD are going to be lagging so seriously on clockspeed that they'll end up a distant second. I wish they weren't, because I much prefer the K8 platform design. Intel will throw in more cache too, but that's less of an issue with AMD's on-die memory controller (I prefer the lower mainmem latency over larger cachable working set for my work anyway).

Posted: 2007-04-20 06:21pm
by phongn
Barcelona probably has a moderate FPU IPC advantage over Penryn ... but integer? I'm not so sure.

Any particular reason you like K8, Starglider? NGMA/Core is a pretty good design.

Posted: 2007-04-20 06:35pm
by Starglider
phongn wrote:Any particular reason you like K8, Starglider? NGMA/Core is a pretty good design.
Low main memory latency, cheap fast SMP, provided affordable 64-bit a long way in advance of Intel (dog knows when we'd have had affordable 64-bit if it was up to Intel alone - AMD had to force them to hack it in). Intel's 4-way machines suck and they basically can't do 8-way x86, a legacy of the whole Itanium fuck-up (it was a reasonable research project, turned out it didn't work, but they put it into production anyway due to rampant delusions - but I suppose it did frighten some of their high-end competitors into throwing in the towel prematurely).

Posted: 2007-04-21 04:22am
by Yokel on an Island
phongn wrote:Barcelona probably has a moderate FPU IPC advantage over Penryn ... but integer? I'm not so sure...
AMD's silence on the integer front says it all, really. As for the supposed FP scaling, the current speculation is that the single core perf suffers as a result clock-wise, but we'll see how it goes. If true, the 45nm node better not hit any hiccups or the consequences will be dire.

Posted: 2007-04-21 04:28am
by Yokel on an Island
Starglider wrote:Low main memory latency, cheap fast SMP, provided affordable 64-bit a long way in advance of Intel (dog knows when we'd have had affordable 64-bit if it was up to Intel alone - AMD had to force them to hack it in). Intel's 4-way machines suck and they basically can't do 8-way x86, a legacy of the whole Itanium fuck-up (it was a reasonable research project, turned out it didn't work, but they put it into production anyway due to rampant delusions - but I suppose it did frighten some of their high-end competitors into throwing in the towel prematurely).
I hate to break this to you, but Opteron scaling to 8 sockets is pretty dire as well, and the revenue from that area is close to zero, as is traditionally the case for the x86 commercial server sector, as Intel can attest. As for Itanium, it's rolling pretty well at the moment and may eclipse AMD's Opterons by revenue (if not socket count) in the next 1-2 quarters.