One of the more esoteric elements of Intel’s Architecture Day 2020 came very near the end, where Intel spent a few minutes discussing what it believes is the future of some of its products. Brijesh Tripathi, VP and CTO of Intel’s Client Computing group, laid out a vision for its client products in the 2024+ timeframe. Centered around Intel’s 7+ manufacturing process, the goal was to enable ‘Client 2.0’ – a new way to deliver and enable immersive experiences through a more optimized silicon development strategy.

Chiplets aren’t new, especially with recent launches from Intel’s competitors, and as we move into more complex process node development, the era of chiplets enables faster time-to-market as well as better binning and yields for a given product. The key is enabling how those chiplets fit together, and at which points it makes sense to mix and match the relevant ones. Intel has spoken about this before in a more generalized context, at its Technology and Manufacturing Day 2017, as shown in the carousel image at the top.
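
To put a rough number on the yield argument, here is a minimal sketch using the textbook Poisson defect model, where yield is exp(-area × defect density). The die area and defect density are illustrative assumptions, not Intel figures, and the model ignores packaging cost, die-to-die interconnect area, and assembly losses.

```python
import math

def poisson_yield(area_cm2: float, defects_per_cm2: float) -> float:
    """Fraction of dies expected to be defect-free under a Poisson defect model."""
    return math.exp(-area_cm2 * defects_per_cm2)

def silicon_per_good_product(total_area_cm2: float, n_dies: int,
                             defects_per_cm2: float) -> float:
    """Wafer area (cm^2) consumed, on average, per working product.

    The product is split into n_dies equal dies. Each die is assumed to be
    tested on its own, so a defect scraps only that one die.
    """
    die_area = total_area_cm2 / n_dies
    die_yield = poisson_yield(die_area, defects_per_cm2)
    return n_dies * die_area / die_yield

if __name__ == "__main__":
    AREA = 4.0     # hypothetical 4 cm^2 of total silicon per product
    DEFECTS = 0.2  # hypothetical defect density, defects per cm^2
    for n in (1, 4, 16):
        used = silicon_per_good_product(AREA, n, DEFECTS)
        print(f"{n:2d} die(s): {used:.2f} cm^2 of wafer per good product")
```

Under these assumptions, the silicon consumed per working product falls as the design is split into more dies, which is the yield half of the chiplet argument. The other half, the cost of stitching the dies back together, is not modelled here.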

The goal here is to mix and match which process nodes work best for different parts of the chip. Intel seems set to realize this vision starting with its 7nm platform. At Architecture Day 2020, Brijesh Tripathi showed this slide:

On the left is a typical chip design – monolithic with everything it requires. For Intel’s leading edge products, these take 3-4 years to develop, and bugs are found in silicon first by Intel and then later by Intel’s partners, who can ramp up the silicon-on time by a few orders of magnitude.

In the middle is a basic chiplet layout, similar to that slide from 2017, where different functions of the die are split into their own modules. Assuming a consistent interconnect, there is some reuse of the silicon elements, such as AMD using the same core compute dies in client and server. For some semiconductor companies (though not Intel), this is where things stand today.

On the right is where Intel sees its future. Instead of having a single digit number of chiplets in a product, it envisions a world where each IP can be split into multiple chiplets, enabling products to be built with different configurations of what works for the market. In this instance, a chiplet might be a PCIe 4.0 x16 link – if the product needs more, it simply adds in more of these chiplets. Same with memory channels, cores, media accelerators, AI accelerators, Ray Tracing engines, crypto accelerators, graphics, or even as far down as SRAM and caching blocks. The idea is that each IP can be split and then scaled. This means that the chiplets are tiny, can be built relatively quickly, and bugs should be ironed out very quickly.

In this diagram, we are treated to Intel’s long term vision for the client – a base interposer with an in-package memory (something like an L3 or L4) that can act as the main SRAM cache for the whole die, and then on top of this we get 24 different chiplets. Chiplets can be graphics, cores, AI, Media, IO, or anything else, but they can be mixed and matched based on what is needed. A content creator might want a balance between some good graphics acceleration and compute, while a gamer might want to focus purely on the graphics. A corporate client or workstation might need less graphics and more for compute and AI, whereas a mobile version of the chip will be heavily invested in IO.
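
The mix-and-match idea can be sketched as a small data structure. Everything here is hypothetical: the chiplet kinds come from the list above, the 24-slot limit is read off the slide and treated as a hard cap for illustration, and the per-product counts are invented examples rather than Intel configurations.

```python
from collections import Counter
from dataclasses import dataclass

SLOTS = 24  # chiplet sites in the diagram; using it as a hard cap is an assumption
KINDS = {"cores", "graphics", "ai", "media", "io"}  # kinds named in the article

@dataclass
class Package:
    name: str
    chiplets: Counter

def build_package(name: str, **counts: int) -> Package:
    """Assemble a hypothetical product from per-kind chiplet counts."""
    unknown = set(counts) - KINDS
    if unknown:
        raise ValueError(f"unknown chiplet kinds: {sorted(unknown)}")
    if any(n < 0 for n in counts.values()):
        raise ValueError("chiplet counts must be non-negative")
    total = sum(counts.values())
    if total > SLOTS:
        raise ValueError(f"{total} chiplets requested, only {SLOTS} slots")
    return Package(name, Counter(counts))

# Invented example mixes that follow the article's descriptions.
gamer = build_package("gamer", cores=6, graphics=14, media=2, io=2)
workstation = build_package("workstation", cores=12, ai=6, graphics=2, io=4)
mobile = build_package("mobile", cores=4, graphics=4, media=2, io=10)

if __name__ == "__main__":
    for pkg in (gamer, workstation, mobile):
        print(pkg.name, dict(pkg.chiplets))
```

The point of the sketch is that each product is only a different count of the same building blocks, so a new market segment needs a new mix rather than a new die.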

As always, there is some trade-off between chiplet size and the complexity of actually putting them together in a multi-die arrangement. Any communication between chiplets costs more power than it would in a monolithic design, and usually comes with higher latency. Thermals have to be managed as well, and so sometimes those chiplets are limited by what thermal properties are available. Multi-die arrangements also cause headaches for mobile devices, where z-height is critical. However, the benefits afforded from using the right process at the right time for the right product are big, as it helps provide both performance and power at the best possible cost. It also gives the opportunity to bring in 3rd party IP quickly if something amazing hits the scene.

The only downside here is that Intel hasn’t spoken much about the glue that binds it all together. Chiplet strategies rely on complex high-speed interconnect protocols, custom or otherwise. Current uses of Intel’s die-to-die connectivity are either simply memory protocols or FPGA fabric extensions – the big ones for server CPUs like UPI aren’t necessarily up to the task. CXL could be the future here, however current CXL is built upon PCIe, which means a complex CXL/PCIe controller for every chiplet which will likely get power hungry fast.

Intel has stated that it is inventing new packaging technology and new levels of connectivity to act between the silicon. There is no disclosure on the protocols at this time, however Intel acknowledges that to get to this level of scale it will have to go beyond what the company has today, and that will require creating standards and innovation in this area. The goal is to create and support standards, and the first incarnation will have some standardization built in. Intel describes this as a method of extreme disaggregation, and notes that not everything that is connected has to be high bandwidth (such as USB) or a coherent interconnect - Intel sees the goal involving a handful of protocols throughout the spectrum.

There’s also the developer market, which might be used to a more homogeneous implementation of resources in any given product. Without careful planning, and relevant coding, there is the potential for certain chiplet configurations to fall over if the developer was expecting a certain ratio of compute to graphics, for example. This isn’t something that OneAPI could easily fix.
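
As a purely illustrative sketch of the "relevant coding" involved, software could size its work from whatever resources it discovers at run time instead of hard-coding a compute-to-graphics ratio. The `units` dictionary below stands in for a hypothetical discovery step; no real Intel or oneAPI interface is implied, and the split ignores that different unit types have different throughput.

```python
def split_work(total_items: int, units: dict[str, int]) -> dict[str, int]:
    """Divide work items across resource kinds in proportion to unit counts.

    `units` maps a resource kind (e.g. "compute", "graphics") to how many
    of that chiplet the package reports. Kinds with zero units get no work.
    """
    total_units = sum(units.values())
    if total_units <= 0:
        raise ValueError("no usable units reported")
    # Floor of each proportional share first.
    shares = {kind: total_items * n // total_units for kind, n in units.items()}
    # Hand the few leftover items to the kinds with the most units.
    leftover = total_items - sum(shares.values())
    for kind in sorted(units, key=units.get, reverse=True)[:leftover]:
        shares[kind] += 1
    return shares

if __name__ == "__main__":
    # Two hypothetical packages with very different mixes.
    print(split_work(1000, {"compute": 12, "graphics": 2}))
    print(split_work(1000, {"compute": 4, "graphics": 14}))
```

Code written this way degrades gracefully when the mix changes, whereas code that assumes a fixed ratio would leave units idle or overload them.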

These are all issues that Intel will have to address, although they have a few years until this comes to fruition. We were told that the internal name is Client 2.0, although it will likely have more marketing dressing added as Intel starts talking about it in more detail.


67 Comments


  • PixyMisa - Friday, August 21, 2020 - link

    Even POWER10 has multiple chips - the main CPU (with one or two CPU dies), an L4 cache chip (960MB), and a memory controller chip.
  • Kevin G - Sunday, August 23, 2020 - link

    POWER has been doing the MCM since POWER2. I think only POWER3 was the processor in the line up that was offered exclusively in a single chip package.

    Most of those have used traditional wire bonding and IBM's high end stuff still uses ceramic packaging. AMD is leveraging wire bonding in Epyc.

    Intel is looking to go the next step with interposers and EMIB as appropriate. That cuts down on power, shaves a few hairs off of latency, supports higher clocks and wider interfaces. Win-win from a technical sense if thermodynamics weren't so difficult between high power dies and costs were more reasonable.

    Both AMD and nVidia have indicated interest in using interposers and/or EMIB as appropriate. Using numerous smaller dies to build something like a GPU simply makes sense if you can cool the resulting slab of silicon on a package.

    Really just a matter of time who gets a product like this to market first.
  • vFunct - Saturday, August 22, 2020 - link

    Chiplets are different because they're on the same package. Otherwise you can say Intel's 8086 + 8087 did the same thing...
  • TristanSDX - Friday, August 21, 2020 - link

    chiplets are best only to save costs by reducing investment, where cores with high pace of evolution must work with cores of low pace of evolution. High perf specialised cores (GPU, RayTracing, AI, CPU) have high pace of evolution, so should be integrated in monolithic way.
  • Flunk - Friday, August 21, 2020 - link

    The reason they're doing this is to reduce die size, which increases yield, even if it means they need a bunch of dies.
  • FunBunny2 - Saturday, August 22, 2020 - link

    "which increases yield"

    well, increases the number of dies printed per wafer, but, IIRC, as node size has shrunk the % of good dies per wafer has decreased, so it's always a see-saw between gross yield and shippable yield. by how much at each step I don't know. it is legend that printing at a step or two or three larger than current step does wonders for yielding good dies.
  • sor - Saturday, August 22, 2020 - link

    That’s the beauty of chiplets. If you have a massive die with CPU, GPU, IO, etc all together, a flaw in any one of these can possibly waste the entire die. Fab four distinct CPU chiplets in the same area instead and you end up with three good CPUs and one bad.

    Additionally you get the ability to mix and match node sizes for each component to their optimal yields.
  • Alexvrb - Sunday, August 23, 2020 - link

    They have had strategies for dealing with flawed chips for decades. When you get flaws you often still get a lower-tier chip. It depends on how much redundancy there is and where the flaws are, but for a recent monolithic example fusing off CPU and GPU cores in a Renoir produces different models... and those fused off cores don't have to be in the same place from chip to chip! So the flaws can be in various places. GPUs are another great example. That's why they subdivide them into CUs, and it's also why they don't sell many fully-enabled large die chips to consumers. They can cut them down as needed both to meet market needs and also to sell a larger percentage of flawed chips.

    Also, just using chiplets doesn't really "solve" this problem, although it does enable additional flexibility in using flawed chips. Like a good I/O chip ends up in HEDT or server, a slightly-flawed one ends up in a Ryzen dual chiplet, and a more-flawed one still might be OK for a single chiplet Ryzen or an X570 chipset. Ditto for the chiplets themselves, you have 8 core chiplets with 6 or 4 cores enabled. It could be one fully enabled CCX or two half-working CCX. Lots of options for flawed chips.
  • Arbie - Friday, August 21, 2020 - link

    I got as far as "2024+". Not saying that future vision isn't important, but Intel's recent ability to project even six months has been disappointing. Their public pronouncements of plans so much more distant are simply uninteresting.

    It's a shame because until about 2017 I believed what they said.
  • Kangal - Saturday, August 22, 2020 - link

    Intel peaked in 2014-2016.
    That's when they were the world's leader in 14nm fabrication, way ahead of Samsung, TSMC, GlobalFoundries, and SMIC. And that's when their Sky Lake architecture was miles ahead of AMD's Piledriver architecture, and the very outdated ZX C3+ by Zhaoxin. And it was even competitive against ARM, which at the time was only pushing the troubled Cortex A57 on planar 28nm wafers.

    Just think about their first consumer 8-core processor, the i7-5960X from 2014, and compare that to their latest 8-core processor, the i7-10700KF that is for 2021. There's been notable improvement in terms of power draw/thermals comparing the old 280W to the new 210W, yet, both desktop chips still are classed in the high-watt power category. So what really differentiates them is the performance. Within the timeframe of 7 years, they've only managed a rough ~1.30x times increase. You can deduce that figure by looking at the CineBench r20 figures, of roughly 5100 points for the new chip, compared to roughly 3900 points on the old processor.

    In 2017, AMD practically caught up to Intel. Sure their Zen1 architecture was inferior to Sky Lake architecture at the time, and the 16nm lithography they used was also inferior to Intel's +14nm wafers. But with 12nm and Zen+ they closed the gap. And later with 7nm and Zen2, they overshot Intel by a clear margin. Now AMD is poised to bring together +7nm lithography, Zen3, and RDNA-2 iGPU in a few months, while Intel is still stuck manufacturing their old offerings and promising that better things are on the way. Oh, how the turntables have...

    If you think that's interesting, it's also been said that Zhaoxin plans to take AMD's IP and recent advances, in order to manufacture x86-PC with the aid of SMIC foundry. It is aimed for the Mainland Chinese market where they are shielded from international lawsuits and asset freezing. They would eventually like to export this to other countries, and gain revenue (outside of first-world western markets) presumably. So the 2014-2016 world leader Intel may be matched (or even surpassed) by some no-name competitor. And I'll name drop RISC-V here too, with the promising looking SiFive U8 processor.

    And that's ignoring the elephant in the room, the current market leader, ARM. Heck, the proprietary solution made by Apple last year in the form of the Thunder CPU microarchitecture itself is jaw-dropping. But we're expecting to soon see even further advanced lithography from Samsung-TSMC and the ARMv9 architecture.
