At Intel's Investor Day today, CEO Bob Swan and Murthy Renduchintala discussed the company's manufacturing capabilities. Intel has historically executed well on its process technology, but the delay of its 10nm process has raised multiple question marks over several years. The two executives went into some detail about what Intel has been doing in the interim and what it has learned from the issues.

Back in 2013, Intel envisioned its 10nm process succeeding 14nm by providing 2.7x density, using new technologies such as Self-Aligned Quad Patterning (SAQP), Contact Over Active Gate (COAG), and cobalt interconnects, along with new packaging technologies such as EMIB and Foveros. Intel admits that this was an ambitious plan: the goals were not clearly defined with the teams, and the process was ultimately overly complex and not managed in an ideal way.

This ended up pushing 10nm out to 2019 (technically, Intel shipped Cannon Lake in small quantities on 10nm in 2017, but that is nothing more than a curio in the timeline of semiconductors), and Intel filled the gap with 14+ and 14++.

Intel's 14+ and 14++ processes extracted more than 20% more performance (from Broadwell to Whiskey Lake) from 14nm since its inception. As a result, Intel now intends not only to prepare for future intra-node optimizations, but to adjust its roadmap to account for them. Murthy made it clear that Intel wants to introduce a Moore's Law-like gain at the beginning of a new process, and another similar gain by the end of the process.

Intel has stated that its 10nm product family (beyond Cannon Lake) will start to be available from the middle of this year (2019), with Ice Lake on client platforms (notebooks).

Intel will be launching multiple 10nm products through 2019 and 2020, including server-based 10nm in the first half of 2020:

In the above slide, Intel states that it will have 7nm in production and will launch a product in 2021. That sounds very aggressive for a company that has had issues with 10nm. It even shows in Intel's roadmap, with 10nm (and 10+ and 10++) having a much shorter life cycle than the 14nm family of processes.

With this in mind, Intel's 7nm is going to combine what Intel has learned from the 14nm and 10nm families of products. Intel wants that 2x scaling (Moore's Law), but with intra-node optimizations planned as part of the roadmap. Intel is also reducing its number of design rules, which should help with execution. 7nm will also be where Intel intersects with EUV, and it will introduce next-gen Foveros and EMIB packaging.

Intel provided this slide, which shows a monolithic PC-Centric die with a multi-die Data-Centric chip built on both Foveros and EMIB. This corroborates our discussion with Intel's chiplet and packaging team, who also stated that we would see Foveros and EMIB on a combined product - specifically the GPU.

Intel announced that its lead 7nm product (lead = top, or lead = first?) would be its new GPGPU, built on the Xe graphics architecture. Intel has stated that its Xe product stack will feature two different microarchitectures from mobile client up to GPGPU, with one of those architectures called Arctic Sound. Technically, Intel will launch its first discrete GPU in 2020 according to its press release, but the 7nm GPGPU will be launched in 2021.

More information is coming out of Intel's event; more to follow.

Source: Intel

237 Comments

  • Wilco1 - Thursday, May 9, 2019 - link

    SRAM is more dense than logic. There are multiple SRAM cell choices with different area, power and performance characteristics. L1 cache is typically less dense than L2 or L3 cache because it needs to be fast.
  • ats - Friday, May 10, 2019 - link

    There isn't anything on a chip that is as transistor dense as SRAM. SRAM cells are specifically designed and optimized for size. In many cases, SRAM cells utilize process rules that cannot be used for anything else due to the vast amount of resources spent on optimizing, verifying, and testing the SRAM cells. Nothing else gets the same amount of resources which allows them to characterize the process performance for the SRAM cells and their arrays to a level that simply isn't achievable with logic.

    Even the highest performance SRAM cells tend to be significantly more dense than logic.
  • Calin - Friday, May 10, 2019 - link

    Density is measured in terms of transistors per square millimeter, and high thermal power, higher level complexity, hot spots tend to force a decrease in transistor density (due to cooling issues).
    SRAM by comparison has few hotspots, has a very regular structure and can be easily optimized, ...
    SRAM is dense in transistors (even if expensive in terms of actual memory size compared to physical size).
  • Bulat Ziganshin - Thursday, May 9, 2019 - link

    Seems that it's pure difference in transistor counting methodology - at https://en.wikipedia.org/wiki/Transistor_count we see:
    - 28-core Xeon Platinum 8180 is 8B transistors
    - 32-core AMD Epyc is 19B transistors

    Moreover, 32-core AMD Epyc occupies 768 mm^2, while 22-core Xeon occupies 456 mm^2, both with 14 nm technology. So, it seems that Intel counts 2x less transistors for the same thing.
  • Wilco1 - Thursday, May 9, 2019 - link

    No - the densities are comparable and close: densities are 17.5 MT/mm^2 for Xeon and 24.7 MT/mm^2 for Epyc. There is no difference in counting that can explain Intel's lower density.

    As to why Epyc has more transistors - these are very different microarchitectures on different processes. Both chips have huge amounts of SRAM. The numbers of transistors per bit depends on the process and cell design. It's quite possible for the GF process to require 8 or even 10 transistors per bit while the Intel process can use 6 per bit.
  • Zizy - Thursday, May 9, 2019 - link

    Well, you can take a look at a floorplan and consider just cache, ignoring the rest - you know the area from the image, you know cache size.

    I don't recall these numbers as I haven't seen any in a long time, but it should be easy to get MB/mm^2 of the same L3 cache on both chips.
    This would be a relevant comparison of actual process density, assuming both require the same number of transistors per MB and have comparable design goals in terms of frequency, size and whatnot.
  • Wilco1 - Thursday, May 9, 2019 - link

    There is no need to do that - the SRAM densities for various cells are widely published already. Logic density is where things differ the most due to routing, contacts, diffusion breaks, metal layers, design rules, tracks etc.

    So to compare density fairly and without the marketing BS we must compare actual chip density. It's best to compare like with like (eg. server with server) so that the amount of cache, frequency etc are similar and you get a fair comparison.
  • peevee - Thursday, May 9, 2019 - link

    EPYC is not a single chip, it is a multi-chip package.
    https://images.anandtech.com/doci/11551/epyc_tech_...
  • shompa - Thursday, May 9, 2019 - link

    One reason why Intel can clock higher than, for example, ARM is dark areas on the die. Intel can't pack as many transistors and still clock 1-2 GHz higher.
  • ZolaIII - Friday, May 10, 2019 - link

    The reason why x86 cores can clock higher comes down to two things: the design of the core (longer pipeline) and a high-performance routing library that basically ties more fins together in a single contact to ensure better conductivity and drain. The sustainable leakage limit for FinFET structures on silicon is 2 GHz no matter how you route it; that's why efficient server processors, even from Intel, still have a base frequency around that limit...
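
The density arithmetic argued over in the comments above can be reproduced with a short sketch. It uses only the transistor counts and die areas quoted by Bulat Ziganshin (8B transistors on 456 mm^2 for the Xeon, 19B on 768 mm^2 for Epyc), which are taken as given and not independently verified here. The function names are illustrative, and the 32 MiB cache size in the second part is a purely hypothetical input, chosen only to show how the 6, 8 or 10 transistors-per-bit cell choice Wilco1 mentions changes the transistor count.

```python
BITS_PER_MIB = 8 * 1024 * 1024  # bits in one mebibyte


def density_mt_per_mm2(transistors: float, area_mm2: float) -> float:
    """Return chip-level density in millions of transistors per mm^2."""
    if area_mm2 <= 0:
        raise ValueError("area_mm2 must be positive")
    return transistors / area_mm2 / 1e6


def sram_transistors(capacity_mib: float, transistors_per_bit: int) -> float:
    """Count the transistors in the SRAM bit cells of a cache.

    Only the storage cells are counted; tags, decoders and sense
    amplifiers are ignored, so this is a lower bound.
    """
    return capacity_mib * BITS_PER_MIB * transistors_per_bit


# Figures as quoted in the comment thread (transistor count, die area in mm^2).
chips = {
    "Xeon": (8e9, 456.0),
    "Epyc": (19e9, 768.0),
}

densities = {
    name: density_mt_per_mm2(count, area) for name, (count, area) in chips.items()
}
for name, value in densities.items():
    print(f"{name}: {value:.1f} MT/mm^2")

# Hypothetical 32 MiB cache: effect of the cell design on transistor count.
HYPOTHETICAL_CACHE_MIB = 32
for cell in (6, 8, 10):
    total = sram_transistors(HYPOTHETICAL_CACHE_MIB, cell)
    print(f"{cell}T cell: {total / 1e9:.2f}B transistors")
```

With the quoted inputs, this reproduces the 17.5 and 24.7 MT/mm^2 figures Wilco1 cites. The second loop shows why transistor counts alone are a poor proxy for process density when the two chips may use different SRAM cells.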