Ponte Vecchio: The Old Bridge in the Land of Gelato

With the disclosures about Xe-HPC and the new compute unit and memory fabric in mind, Intel went on to the big money announcement: the first Xe-HPC GPU will be a design called Ponte Vecchio (PVC).

If that looks like a weird GPU to you, then I agree. What we have are two segments, each with eight compute chiplets, a number of XeMF chips, and HBM memory. I do want to step back a second to talk about some of the technologies around PVC before going into what we see here.

Earlier in 2019, I was offered a chance to speak to Intel’s Ramune Nagisetty. She’s in charge of Intel’s plans regarding chiplet and connectivity technologies: she’s the person to speak to when it comes to connecting chiplets with technologies like the Embedded Multi-Die Interconnect Bridge (EMIB), but also when it comes to new stacking technologies like Foveros. We’ve already seen EMIB in a number of products, such as Intel’s FPGA family and consumer hardware like Kaby Lake-G, and Foveros is getting its first outing in the new Lakefield chip for small form factor client notebooks, which stacks a 10nm compute die on top of a 22FFL base IO die, all with a DRAM PoP on top.

That discussion confirmed what Raja had said at the Architecture Day a few months prior: both EMIB and Foveros will be coming to Intel’s new family of GPUs. Of course, EMIB and Foveros don’t come cheap: trying to align vias between silicon is best done when sober and with a clear head, and even then there are always optimizations to be made. In that regard, we had assumed EMIB and Foveros would be going on the high-end discrete graphics solutions, what we now know to be Xe-HP. But it would appear that these technologies are a priority for Xe-HPC instead, and Ponte Vecchio seems to be the first chip that will have both.

It is worth noting that technologies like EMIB and Foveros are going to define Intel’s future. Not only in GPUs, but also CPUs, FPGAs, AI, and everything else.

This slide shows that both technologies will be on Ponte Vecchio, but it actually says a lot more than it first appears.

One of the questions with EMIB and Foveros is what form they will take. There are many possibilities for both, and this slide might have exposed a few. Note that this slide only shows half of a Ponte Vecchio, and I want to do a custom diagram for you to show what I’m seeing here.

Here I’ve expanded the diagram out to a full PVC, with 16 Xe-HPC chiplets and 8 XeMF chips, and each half of the PVC has access to four HBM stacks. Each of the HBM stacks is connected via EMIB, along with what I think is an IO chip (perhaps built on 14nm).

From the official slide above, you will notice that the group of 8 Xe-HPC chiplets and four XeMF chips seems to be on a separate package altogether, with a unified BGA onto another tier (the mid-tier) of packaging below. To be honest, that slide also looks like there are some chips on that mid-tier package, but on this mid-tier we also get the EMIB connections and what I believe is an XeMF scaling chip for off-GPU connections. The HBM2 and what I believe to be an IO chip seem to sit on a bottom tier of packaging, which means that we have three tiers of packaging for this individual GPU: four if you include the Foveros on each of the Xe-HPC chiplets!

Now, normally when we think of tiered packaging like this, we expect it to be BGA on BGA. There could be a part of this design that uses an interposer. If Intel were making this at TSMC, I might use the word ‘CoWoS’ to describe the stacking technology in use. At the minute it’s all a lot of guesswork, assuming the diagram is even accurate.

I’ve labelled each of the Xe-HPC chiplets as having Foveros, as we believe these to be two dies stacked. A lot of the commentary around this stacking asks what exactly is on each of the two stacked dies: is one IO and the other compute, like in Lakefield, or is it something else? A semi-reasonable conclusion is that one die might be cache and the other die might be compute. However, it’s important to note that GPUs tend to have local cache right next to the compute, so in actual fact I expect both of the dies in the Foveros stack to be compute dies. It may be that one is for SIMD and the other is for SIMT, but if the two were identical and modular then that would make them much easier to manufacture.

For those of you wondering what those metal brackets are around each of the Xe-HPC dies, those are likely to be shims to help correct for different chip heights. I should point out that the die inside the shim, the stacked Xe-HPC die, is actually smaller in the diagram than the XeMF die between each pair of chiplets. This could be down to all the cache on the XeMF dies, or because they are built on a different process node.

Also, it is worth noting that with Ponte Vecchio, we have two ‘slices’. From the diagram, it looks like each slice has additional XeMF connection points, which could be used to go off-GPU, or so that more slices (up to four) could be connected together. Some of that we might address when we look at the Aurora system and Intel’s proposed layout of one of the servers.
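
To make my reading of the diagram concrete, here is a minimal sketch of the topology as I interpret it. Every number in it (chiplets and XeMF dies per slice, HBM stacks, the scale-up link) comes from my reading of the slide, not from any confirmed Intel figure.

    # A rough model of the Ponte Vecchio layout as I read it from the slide.
    # All of these counts are my interpretation of the diagram, not confirmed
    # figures from Intel.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Slice:
        compute_chiplets: int = 8   # Xe-HPC chiplets, each believed to be a two-die Foveros stack
        xemf_dies: int = 4          # XeMF memory fabric dies sitting between chiplet pairs
        hbm_stacks: int = 4         # HBM stacks attached over EMIB on the bottom tier
        scale_up_links: int = 1     # assumed XeMF scaling chip for off-GPU connections

    @dataclass
    class PonteVecchio:
        slices: List[Slice] = field(default_factory=lambda: [Slice(), Slice()])

        def totals(self) -> dict:
            return {
                "Xe-HPC chiplets": sum(s.compute_chiplets for s in self.slices),
                "XeMF dies": sum(s.xemf_dies for s in self.slices),
                "HBM stacks": sum(s.hbm_stacks for s in self.slices),
            }

    print(PonteVecchio().totals())
    # {'Xe-HPC chiplets': 16, 'XeMF dies': 8, 'HBM stacks': 8}

If the slices really are this modular, then the rumored larger configurations are simply a longer list of slices joined through those extra XeMF connection points.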

Intel promises a couple of other PVC features. Not only will the hardware support ECC and parity checks across all memory and caches, but there will be Xeon-class RAS features and something called ‘in-field repair’. More details to come.
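
Intel has not said how the ECC or parity protection is actually implemented on PVC, so purely as a generic illustration of what a parity check buys you, here is a minimal sketch: one extra bit stored per word and recomputed on read, which is enough to detect (though not correct) a single-bit flip.

    # Generic illustration only: PVC's actual ECC/parity scheme is not disclosed.
    def parity(word: int) -> int:
        """Even parity over the bits of a 64-bit word."""
        return bin(word & (2**64 - 1)).count("1") & 1

    stored_word = 0xDEADBEEFCAFEF00D
    stored_parity = parity(stored_word)          # written alongside the data

    corrupted = stored_word ^ (1 << 17)          # simulate a single-bit upset
    assert parity(corrupted) != stored_parity    # the flip is detected on read

Full ECC, as promised here across all memory and caches, goes further by storing enough extra bits per word (SECDED-style codes, for example) to correct single-bit errors and detect double-bit errors, rather than just flagging them.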

Also, a final word on the name: Ponte Vecchio. Intel has plenty of Lakes, Coves, Rapids, and other things. Apparently having a codename that involves a physical place actually helps address any potential legal issues without having to do due diligence for a proper marketing name. Raja Koduri said that he chose it because the bridge is in Florence, which has good gelato. He likes gelato, and should Intel decide to do a launch event with Ponte Vecchio in mind, Raja gets his gelato. I went to Florence a couple of years ago – the gelato was good.

Comments

  • MenhirMike - Tuesday, December 24, 2019 - link

    This is way above the stuff I work with, but now I want RAMBO Cache on all my stuff.
  • Batmeat - Tuesday, December 24, 2019 - link

    If you’ve skipped to this final page....


    How did he know I would do that?
  • Duncan Macdonald - Tuesday, December 24, 2019 - link

    Given Intel's brilliant(!!!) success in getting its 10nm process to work, I would take the dates with a few megatons of salt!!!
  • repoman27 - Wednesday, December 25, 2019 - link

    Ian, I think your block diagram is a little off. Although the Intel illustrations clearly involve a certain amount of artistic license, I think we can agree that there's an organic package substrate with 8 HBM stacks and 2 transceiver tiles which are connected via EMIB to two larger modules. The modules appear to be a stack with two interposers sandwiched together. The bottom interposer has 8 large chips which are most likely the XeMF dies, as well as several color coded regions representing EMIB zones along with a bunch of vias. The top interposer has the 8 XeHPC chiplets and 4 additional chips which are almost certainly the RAMBO caches, seeing as they look exactly like the depiction of said caches in the other slide. Then there is one giant ball grid connecting the top and bottom layers of the sandwich.

    That looks an awful lot like Co-EMIB to me. The 7nm compute chiplets and SDRAM caches (built on whatever process is the best fit) are bonded directly (Foveros) to a wafer with the memory fabric dies (probably on 14nm) and riddled with TSVs. Those modules then get singulated and plunked onto a substrate with a bunch of EMIBs inserted into it which connect them to each other as well as to the HBM stacks and transceiver tiles.

    Also, this point seems a little harsh: "Transition through DDR3 to DDR4 (and DDR5?) in that time frame". Intel may be way behind on their roadmap, but they made the transition to DDR4 several years ago with Skylake.
  • repoman27 - Wednesday, December 25, 2019 - link

    In fact, Intel may have already shown off a prototype wafer of the modules themselves: https://pbs.twimg.com/media/D_C-9b3U0AAeyv7.jpg

    via Anshel Sag on Twitter: https://twitter.com/anshelsag/status/1148627973882...
  • thetrashcanisfull - Wednesday, December 25, 2019 - link

    This seems worryingly light on technical details with a lot of bold performance claims. Particularly the architectural stuff.

    If Intel really has managed to execute a proper chiplet-style GPU with EMIB / chip stacking, that would certainly open the door to major performance uplifts, but they are staying super vague on the underlying architecture and topology. Honestly, this slideware feels reminiscent of 3D XPoint, which, while still a solid technology, was years late and never delivered on the sort of hype it was announced with.

    I'll remain skeptical until we get more details - the advances in packaging and interconnects that Intel is touting could certainly enable improvements on this scale, but Intel's execution over the last decade leaves a lot of room for doubt.
  • smilingcrow - Wednesday, December 25, 2019 - link

    'Intel's execution over the last decade leaves a lot of room for doubt.'

    Decade! I thought they were ahead of the pack generally until Zen 2 was released 18 months ago!
    They have had a terrible 2 years but if you want to look at the last decade the real underachievers surely were AMD.
    The next few years are crucial so we will have to see how things pan out.
  • thetrashcanisfull - Wednesday, December 25, 2019 - link

    Decade may be an exaggeration, but not by much. Look at all of Intel's attempts to break into new markets: mobile/cellular, Larrabee/MIC, FPGAs (Altera), 3D XPoint...

    Intel has shown that it can be fairly successful as an incumbent in the server/desktop/laptop CPU market (or at least it could until the 10nm problems) but outside of that Intel has consistently struggled to deliver on anything over the last 8+ years.
  • jabber - Wednesday, December 25, 2019 - link

    Maybe it could be said that, with AMD struggling, they let off the gas pedal a bit and coasted for a while.
  • thetrashcanisfull - Wednesday, December 25, 2019 - link

    I think that's certainly true. Intergenerational improvements post Sandy Bridge were pretty anemic in the consumer market, largely since Intel refused to put out more than 4 cores on a mainstream platform until Coffee Lake. In the server/HEDT space Intel was doing pretty well for a while by virtue of increasing core counts, but the 10nm woes have halted any progress on that front.
