Ponte Vecchio: The Old Bridge in the land of Gelato

With the disclosures about Xe-HPC and the new compute unit and memory fabric in mind, Intel went on to the big-money announcement: the first Xe-HPC GPU will be a design called Ponte Vecchio (PVC).

If that looks like a weird GPU to you, then I agree. What we have are two segments, each with eight compute chiplets, a number of XeMF chips, and HBM memory. Before going into what we see here, I want to step back for a second to talk about some of the technologies around PVC.

Earlier in 2019, I was offered the chance to speak to Intel’s Ramune Nagisetty. She’s in charge of Intel’s plans regarding chiplet and connectivity technologies: she’s the person to speak to about connecting chiplets with technologies like the Embedded Multi-Die Interconnect Bridge (EMIB), and also about new stacking technologies like Foveros. We’ve already seen EMIB in a number of products, such as Intel’s FPGA family and consumer hardware like Kaby Lake-G, and Foveros is getting its first limelight in the new Lakefield chip for small form factor client notebooks, which stacks a 14nm IO die with a 10nm compute die, all with a DRAM PoP on top.

In that discussion, I got confirmation of what Raja had said at Architecture Day a few months prior: both EMIB and Foveros will be coming to Intel’s new family of GPUs. Of course, EMIB and Foveros don’t come cheap: aligning vias between pieces of silicon is best done sober and with a clear head, and even then there are always optimizations to be made. In that regard, we assumed EMIB and Foveros would be going on the high-end discrete graphics solutions, what we now know to be Xe-HP. But it appears that these technologies are likely to come to Xe-HPC as a priority, and Ponte Vecchio seems to be the first chip that will have both.

It is worth noting that technologies like EMIB and Foveros are going to define Intel’s future, not only in GPUs but also in CPUs, FPGAs, AI, and everything else.

This slide shows that both technologies will be on Ponte Vecchio, but it actually says a lot more than it first appears to.

One of the questions with EMIB and Foveros is what form they will take. There are many possibilities for both, and this slide might have exposed a few. Note that this slide only shows half of a Ponte Vecchio, so I’ve made a custom diagram to show you what I’m seeing here.

Here I’ve expanded out to a full PVC, with 16 Xe-HPC chiplets, 8 XeMF chips, and each half of the PVC has access to four HBM stacks. Each of the HBM stacks is connected via EMIB, along with what I think is an IO chip (perhaps built on 14nm).
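To make the component counts in my expanded diagram explicit, here is a small Python sketch that tallies a full PVC from two identical halves. The per-half numbers (8 Xe-HPC chiplets, 4 XeMF chips, 4 HBM stacks) come from my reading of Intel’s slide rather than a confirmed specification, and the `Half` class and `full_pvc` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Half:
    """One half of Ponte Vecchio, as drawn in my speculative diagram (not an Intel spec)."""
    xe_hpc_chiplets: int = 8  # Foveros-stacked compute chiplets
    xemf_chips: int = 4       # memory-fabric dies sitting between pairs of chiplets
    hbm_stacks: int = 4       # HBM stacks, attached via EMIB


def full_pvc(halves: int = 2) -> dict[str, int]:
    """Sum component counts across identical halves (two for the PVC shown)."""
    h = Half()
    return {
        "xe_hpc_chiplets": h.xe_hpc_chiplets * halves,
        "xemf_chips": h.xemf_chips * halves,
        "hbm_stacks": h.hbm_stacks * halves,
    }


print(full_pvc())
```

With the default of two halves this gives 16 Xe-HPC chiplets, 8 XeMF chips, and 8 HBM stacks in total. The possible IO chip is left out, since its presence is itself a guess.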

From the official slide above, you will notice that the group of 8 Xe-HPC chiplets and four XeMF chips seems to be on a separate package altogether, with a unified BGA onto another tier (the mid-tier) of packaging below. To be honest, the slide also makes it look like there are some chips on that mid package: on this mid-package we also get the EMIB connections and what I believe is an XeMF scaling chip for off-GPU connections. The HBM2 and what I think is the IO chip seem to be on a bottom tier of packaging, which means that we have three tiers of packaging for this individual GPU: four if you include the Foveros on each of the Xe-HPC chiplets!
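The tier structure I’m describing can be written down as a simple top-to-bottom list. This is a hypothetical Python model of my own interpretation of the slide: the tier labels are my descriptions rather than Intel terminology, and where each part sits is as uncertain as the reading above.

```python
# Hypothetical top-to-bottom model of my reading of the PVC packaging.
# Tier names are descriptive labels, not Intel terminology.
PACKAGE_TIERS = [
    ("top package", ["8x Xe-HPC chiplets", "4x XeMF chips"]),
    ("mid-tier package", ["EMIB connections", "XeMF scaling chip (off-GPU links)"]),
    ("bottom-tier package", ["HBM2 stacks", "IO chip (guess)"]),
]


def packaging_levels(count_foveros: bool) -> int:
    """Count packaging levels; Foveros stacking inside each Xe-HPC chiplet adds one more."""
    return len(PACKAGE_TIERS) + (1 if count_foveros else 0)


for name, parts in PACKAGE_TIERS:
    print(f"{name}: {', '.join(parts)}")
print(packaging_levels(count_foveros=False), packaging_levels(count_foveros=True))
```

The final line prints `3 4`: three package tiers, or four once the Foveros stack within each chiplet is counted.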

Normally, when we think of tiered packaging like this, we expect it to be BGA on BGA, but there could be a part of this design that uses an interposer. If Intel were making this at TSMC, I might use the word ‘CoWoS’ to describe the stacking technology. At the minute it’s all a lot of guesswork, and that’s assuming the diagram is accurate.

I’ve labelled each of the Xe-HPC chiplets as having Foveros, as we believe each one to be two stacked dies. A lot of commentary around this stacking asks what exactly is on each of the two dies: is one IO and the other compute, like in Lakefield, or is it something else? A semi-reasonable conclusion is that one die might be cache and the other compute. However, it’s important to note that GPUs tend to have local cache right next to the compute, so I actually expect both dies in the Foveros stack to be compute dies. It may be that one is for SIMD and the other is for SIMT, but if the two were identical and modular, that would make them much easier to manufacture.

For those of you wondering what the metal brackets around each of the Xe-HPC dies are, they are likely to be shims to help correct for different chip heights. I should point out that the die inside the shim, the stacked Xe-HPC die, is actually drawn smaller in the diagram than the XeMF die between each pair of chiplets. This could be down to all the cache on the XeMF dies, or it could be because they are built on a different process node.

It is also worth noting that with Ponte Vecchio, we have two ‘slices’. From the diagram, it looks like each slice has additional XeMF connection points, which could be used to go off-GPU, or so that more slices (up to four) could be connected together. We might address some of that when we look at the Aurora system and Intel’s proposed layout for one of its servers.

Intel promises a couple of other PVC features. Not only will the hardware support ECC and parity checks across all memory and caches, but there will also be Xeon-class RAS (reliability, availability, serviceability) features and something called ‘in-field repair’. More details are to come.
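Intel hasn’t said which error-correcting codes PVC uses, so as a generic illustration of why ECC and parity are listed as different features, here is a textbook sketch in Python. A single parity bit can detect that one bit flipped but not which one, while a Hamming(7,4) code can locate and correct a single flipped bit. Real memory ECC uses wider codes over larger words, and the function names here are hypothetical.

```python
def parity_bit(bits: list[int]) -> int:
    """Even parity: returns 1 when the count of set bits is odd."""
    return sum(bits) % 2


def hamming74_encode(d: list[int]) -> list[int]:
    """Encode 4 data bits into 7 bits laid out as positions 1..7 = p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(c: list[int]) -> tuple[list[int], int]:
    """Correct up to one flipped bit; return (data bits, 1-based error position or 0)."""
    c = list(c)   # copy so the caller's list is not modified
    b = [0] + c   # pad so b[i] is bit position i
    s1 = b[1] ^ b[3] ^ b[5] ^ b[7]
    s2 = b[2] ^ b[3] ^ b[6] ^ b[7]
    s3 = b[4] ^ b[5] ^ b[6] ^ b[7]
    pos = s1 + 2 * s2 + 4 * s3  # the syndrome spells out the bad position
    if pos:
        c[pos - 1] ^= 1  # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], pos


data = [1, 0, 1, 1]

# Parity: detects a single flip, but cannot say where it happened.
stored = data + [parity_bit(data)]  # [1, 0, 1, 1, 1]
stored[0] ^= 1
print(parity_bit(stored) != 0)  # True

# Hamming(7,4): locates and repairs a single flip.
corrupted = hamming74_encode(data)
corrupted[4] ^= 1  # simulate an upset at bit position 5
print(hamming74_decode(corrupted))  # ([1, 0, 1, 1], 5)
```

Detection alone lets hardware flag bad data, while correction lets it keep running, which is why the two show up alongside RAS features.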

Also, a final word on the name: Ponte Vecchio. Intel has plenty of Lakes, Coves, Rapids, and other things. Apparently, a codename that refers to a physical place helps avoid potential legal issues without the due diligence a proper marketing name requires. Raja Koduri said that he chose it because the bridge is in Florence, which has good gelato. He likes gelato, and should Intel decide to hold a launch event with Ponte Vecchio in mind, Raja gets his gelato. I went to Florence a couple of years ago; the gelato was good.

47 Comments

  • Spunjji - Friday, December 27, 2019 - link

    They were even worse on the notebook side of things. They were happy to sling dual-core + HT CPUs as "i7" processors until AMD announced the 2500U / 2700U; suddenly Intel came up with a "new" 15W Kaby Lake R CPU that looked suspiciously like a TDP-limited Kaby Lake, which itself was just a voltage-tweaked Skylake on an improved 14nm process.

    Any interested hobbyists can observe the extent of the truth in this by taking a notebook with a 45W quad-core Skylake CPU, undervolting it by 100-125mV (the vast majority will do this) and dropping in a TDP limit. The barely perceptible change in performance that results is truly something to behold. My own tweaked 6700HQ averages a 22W TDP under load just from the undervolt.
  • extide - Monday, December 30, 2019 - link

    Yeah, you just described what a mobile CPU is. Why are you surprised?
  • JayNor - Wednesday, January 1, 2020 - link

    I think if you asked Intel, they'd say ADAS, FPGAs and Optane are still very exciting programs for them, and probably already making money on ADAS and FPGAs.

    Intel shipped 88 million LTE modems for iPhones this year. How many LTE modems did AMD deliver?
  • Korguz - Thursday, January 2, 2020 - link

    sources ????
  • Spunjji - Friday, December 27, 2019 - link

    Depends whether you're focusing entirely on the CPU side of their business or making a more general assessment. Generally speaking, they've burned money on all sorts of unsuccessful projects (see: Atom cores in phones and tablets, their 4G modem projects).

    Even on the CPU side they had their fair share of struggles prior to the 10nm-induced disasters. Ivy Bridge was a weak and somewhat cheapened follow-up to the absolute blockbuster that was Sandy, while Broadwell arrived late and barely showed its face at all on the desktop due to early struggles with 14nm. These things didn't have larger effects because AMD and their foundry competitors were performing so terribly at the time, but they were missteps nonetheless.
  • Spunjji - Friday, December 27, 2019 - link

    Agreed re: worryingly light on technical details. I'm sure it all feels very real to the engineers working on it, but from an end-user perspective this is still very much a marketing exercise.
  • Sychonut - Wednesday, December 25, 2019 - link

    Raja "The Wood Elf" Koduri better not let his mouth write a check his ass can't cash.
  • Spunjji - Friday, December 27, 2019 - link

    Those who observed the Vega launch know he has no such compunctions :D
  • GreenReaper - Thursday, December 26, 2019 - link

    Hmm. That HPC block diagram . . . where have I seen it before . . .
    https://www.resetera.com/threads/how-would-a-ryzen...
  • peevee - Monday, December 30, 2019 - link

    Can it deal with compressed representations of sparse matrices?
