04:54PM EDT - Hot Chips has gone virtual this year! Lots of talks on lots of products, including Tiger Lake, Xe, POWER10, Xbox Series X, TPUv3, and a special Raja Koduri Keynote. Stay tuned at AnandTech for our live blogs as we commentate on each talk. Intel recently had its own Architecture Day 2020, with Raja Koduri and other Intel specialists disclosing details about process and products. It will be interesting to see if Raja discusses anything akin to roadmaps in this keynote.

04:58PM EDT - Raja M. Koduri, Senior Vice President, Chief Architect, and General Manager of Architecture, Graphics, and Software, Intel

05:01PM EDT - The title of the talk is 'No Transistor Left Behind'. Raja has had it on a t-shirt at a number of events

05:04PM EDT - 'Raja has spent his career enhancing accelerated compute in the technology industry, across graphics, vector compute, consoles, and semi-custom designs'

05:05PM EDT - First, paying tribute to Frances Allen, who recently passed away

05:08PM EDT - The balance between software abstraction and high-performance hardware execution is the boundary that Frances worked on, and one we still work on today

05:09PM EDT - A little over 20 years ago, Intel senior architects (hardware and software) got together to discuss heterogeneity in Intel's hardware and software roadmaps

05:09PM EDT - They all knew each other, but many of them were meeting each other for the first time

05:10PM EDT - That discussion is where the phrase 'No Transistor Left Behind' comes from

05:10PM EDT - David Blythe is the Xe senior architect

05:11PM EDT - The role of hardware/software in our lives

05:11PM EDT - COVID has shown how vital decades of tech improvements have become

05:11PM EDT - Technology has led to disruptions

05:12PM EDT - Predicting the future is tough, but we expect to see 100 billion devices - intelligent computing

05:12PM EDT - Accessing data and compute from anywhere - exascale for everyone

05:12PM EDT - like electricity

05:12PM EDT - 10x growth opportunity for the industry

05:14PM EDT - A balance of performance vs general purpose

05:14PM EDT - Leveraging data to build intelligence - data that isn't analyzed isn't useful

05:14PM EDT - We need more capacity and more bandwidth at every level

05:14PM EDT - We need bandwidth to achieve exponential growth

05:15PM EDT - Gaps between what we have for memory today for AI vs what we need

05:15PM EDT - We need superhuman-style computing

05:15PM EDT - Now Moore's Law

05:16PM EDT - People have predicted the end of Moore's Law for decades

05:16PM EDT - Moore's Law is how we've built the last two eras of computing

05:16PM EDT - It has been harder and harder to deliver the required metrics

05:16PM EDT - But it's definitely not over yet

05:17PM EDT - There is plenty of room at the top

05:17PM EDT - Software helps us to get there as much as hardware does

05:17PM EDT - Python vs AVX512

05:17PM EDT - Over 100x perf on the same CPU with software updates

05:17PM EDT - New AI workloads allow vector optimization opportunities that weren't there before
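
To put a rough number on that claim, here is a minimal sketch (not Intel's actual demo) of the kind of gap Raja is describing: the same CPU and the same reduction, once as a naive interpreted Python loop and once through NumPy, which dispatches to compiled, vectorized (AVX-class) kernels. On a typical machine the gap is comfortably in the 100x range.

```python
# Minimal illustration of a software-only speedup on one CPU:
# interpreted Python loop vs a library call into vectorized native code.
import time
import numpy as np

N = 10_000_000
a = np.random.rand(N)
b = np.random.rand(N)

# Naive Python loop: every multiply-add goes through the interpreter.
t0 = time.perf_counter()
total = 0.0
for i in range(N):
    total += a[i] * b[i]
t_loop = time.perf_counter() - t0

# Same dot product through NumPy, which calls optimized, vectorized kernels.
t0 = time.perf_counter()
total_vec = float(np.dot(a, b))
t_vec = time.perf_counter() - t0

print(f"loop: {t_loop:.2f}s  numpy: {t_vec:.4f}s  speedup: {t_loop / t_vec:.0f}x")
```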

05:18PM EDT - Transistor scaling though isn't helping as much as it used to

05:18PM EDT - Whatever we call Moore's Law in the modern age, we believe transistor density can easily scale another 50x

05:18PM EDT - 3x in FinFET itself

05:19PM EDT - 2x in nanowire

05:19PM EDT - Stacked nanowires for another 3x

05:19PM EDT - This is where regular pitch scaling might stop

05:19PM EDT - Then wafer-to-wafer stacking for 2x

05:19PM EDT - Then die on wafer stacking for 2x
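
As a quick back-of-the-envelope check, multiplying the factors listed above (on the assumption that they simply compound, which is our reading rather than Intel's exact math) lands above the ~50x figure Raja quotes:

```python
# Back-of-the-envelope check of the density roadmap factors listed above.
# Assumption: the factors compound independently.
factors = {
    "FinFET refinement":       3,
    "Nanowire":                2,
    "Stacked nanowires":       3,
    "Wafer-to-wafer stacking": 2,
    "Die-on-wafer stacking":   2,
}

total = 1
for name, gain in factors.items():
    total *= gain
    print(f"{name:<24} x{gain}  (cumulative ~{total}x)")

print(f"Compounded: ~{total}x, comfortably above the ~50x quoted")
```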

05:19PM EDT - All of this is happening in labs across the world

05:19PM EDT - The vision will play out over the next decade or more

05:20PM EDT - Heat dissipation is a challenge too

05:20PM EDT - Room at voltage scaling, capacity scaling, new packaging, frequency scaling, new architectures

05:21PM EDT - Also packaging - the future of Foveros is hybrid bonding

05:21PM EDT - Simpler interconnects with lower capacitance and lower power

05:21PM EDT - Stacked SRAM test chip recently taped out

05:22PM EDT - Significant investment allows Intel to drastically adjust its view on next gen packaging for end-user product

05:22PM EDT - Now memory hierarchy

05:23PM EDT - (the dreaded pyramid of Optane)

05:23PM EDT - And the inverse next gen pyramid

05:23PM EDT - Need 10x improvement across the board

05:24PM EDT - Brainstorming next-gen requirements with Tim Sweeney about a next-gen MMO

05:24PM EDT - Support 1000s of users or more at once with hardware and software

05:24PM EDT - But also make general purpose and accessible to everyone

05:25PM EDT - First, this is how hardware companies think:

05:25PM EDT - This is the concept we were thinking

05:25PM EDT - 25 cores per CPU - with density, go up 100x - 4x boards, then racks for 1 million cores
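
One plausible reading of that arithmetic (the breakdown below is our assumption from the numbers mentioned, not a confirmed slide): a 25-core CPU times a ~100x density gain gives 2,500 cores per socket, "4x boards" read as four sockets per board gives 10,000 cores per board, and about 100 boards - a handful of racks - reach a million cores.

```python
# Hypothetical breakdown of the "million cores" thought experiment.
# All factors below are assumptions for illustration, not Intel's figures.
import math

cores_per_cpu_today = 25
density_gain        = 100        # the ~100x density improvement mentioned
cpus_per_board      = 4          # assuming "4x boards" means 4 sockets/board
target_cores        = 1_000_000

cores_per_cpu_future = cores_per_cpu_today * density_gain       # 2,500
cores_per_board      = cores_per_cpu_future * cpus_per_board    # 10,000
boards_needed        = math.ceil(target_cores / cores_per_board)  # 100

print(f"{cores_per_board:,} cores/board -> {boards_needed} boards for {target_cores:,} cores")
```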

05:26PM EDT - It's all about the interconnect!

05:26PM EDT - Now software

05:26PM EDT - The grumpy person reminds Raja of Jim Keller

05:27PM EDT - This contract between hardware/software is what matters

05:27PM EDT - All about ISA + OS developers

05:27PM EDT - It's all about performance and generality

05:28PM EDT - Rich software stack on x86 today

05:28PM EDT - The more abstraction, the more developers

05:28PM EDT - Abstractions are very leaky

05:29PM EDT - It's a Sisyphean effort

05:29PM EDT - What are the hardware/software contracts of the future?

05:29PM EDT - x86, Arm, RISC-V, AI, GPU, Memory, Network

05:30PM EDT - Intel is adding heterogeneity in the CPU socket

05:30PM EDT - Be aware of what's beyond Cooper Lake

05:31PM EDT - 3-5 years to see adoption of new hetero ISA extensions

05:31PM EDT - That's a broad software ecosystem statement

05:31PM EDT - The key to this is to give developers performance at every level

05:32PM EDT - Ninja developers at the low level can offer non-linear improvements higher up the stack

05:32PM EDT - Any abstraction needs to be scalable - open and accessible to all. Have to retain productivity at all levels while also maintaining performance

05:32PM EDT - Misconception that Python isn't used for performance

05:33PM EDT - Ninja programmers are rare, but very important for performance

05:33PM EDT - Important to support ninjas

05:33PM EDT - Scaling across every product

05:33PM EDT - Level sub-zero

05:34PM EDT - OneAPI

05:34PM EDT - Still early days

05:34PM EDT - OneAPI beta available on Intel Dev Cloud

05:36PM EDT - Scale from sensors to edge to cloud

05:36PM EDT - Where we will be in 2021

05:36PM EDT - Milliwatts to megawatts

05:37PM EDT - Xe-HP GPU!

05:37PM EDT - 1000x in compute by 2025

05:38PM EDT - Exascale for everyone

05:38PM EDT - Now time for Q&A

05:39PM EDT - Actually a few more comments first

05:45PM EDT - More complex hardware in the future

05:45PM EDT - Now Q&A

05:48PM EDT - Q: Integration between CPU and GPU? A: We've been doing that a long time for the PC space; what hasn't been done yet is in the DC and at scale. The key is figuring out the programming model that scales - at the moment we see them as scalar/vector/matrix, and it's all about combining them and building the programming model. Physical integration is also key, at high performance.

05:48PM EDT - Q: Does Intel plan to open source the Xe dGPU code, as with Gen11, or will it be a closed stack? A: We are pretty active in Linux open source. Xe drivers in Linux will be open source.

05:51PM EDT - Q: Does ISA matter in a future of accelerators? A: Great question. It's the central thesis of the talk. With a DSA, do you need an ISA or not? My thesis is that the ISA is important for the general purpose, for the mass install base - the architectural impact is based on that hardware/software contract. Lots of us have worked on DSAs, but when you talk generality, today, ISA still matters. If you move the contract up the stack, is that in the form of an ISA, and how does it look? It's a trillion dollar question. I'm not proposing that I have an answer, but my talk is that we are working on it, and we will share what we find through OneAPI, and in some ways it's a call to action for the whole community. It has to cover the whole industry, not just one vendor or architecture.

05:53PM EDT - Q: Security HW vs SW, direction in industry vs academia? A: Great question. I could have spent more time on security and Intel's vision there if I had more time! It's super important. With the surface area we are generating over all these layers of hierarchy, the security attack surface is growing more than exponentially. It's scary! It's a big call to action for the community too. Security is getting harder as we move forward, not easier. The architecture opportunities and simplifications, in both hardware and software, are daunting.

05:56PM EDT - Q: ML revolution - libraries or GP stack? A: Great question. We already have special paths in TF and PyTorch - the inner loops have been phenomenally accelerated in the last few years. As we do the analysis on the workloads, we are seeing the bottlenecks shift around as we optimize. The algorithm rate of change is quite high - with the community and our customers, I have a lot of conversations about the generality of future approaches. It's whack-a-mole. Right now the need for generality in the software stack is potent and there are lots of active discussions. Is there an API that develops a better scalable contract? It's hard to tell.

06:00PM EDT - Q: Other approaches to general-purpose compute like OpenCL haven't succeeded. What makes OneAPI work? A: At one point there were abstractions of the GPU hardware, even with all the limitations a decade ago. I don't believe OpenCL really took a step back to look at the overall compute problem. If you go back more than two decades, the work on high performance computing systems and languages has a lot of golden nuggets and answers that sit on those infrastructures. That's one of the things we look at. Personally I have also been a big fan of the abstraction in Apple's Grand Central Dispatch. I know Swift and the concurrency models in Swift have made some amazing progress, and then there's Apache Spark too. If you look at those models and those software frameworks, there is something there for us as a hardware community to pay attention to. I won't say OpenCL is a great example of covering all forms of parallelism (dense, sparse, async, task) - and memory heterogeneity is a big deal - how do we cover that? That's a harder problem in my opinion.

06:01PM EDT - Q: About memory power efficiency - how do you see the required 10x BW/power scaling? A: The opportunity I see is that we have to get compute closer to memory (or memory closer to compute). As I alluded to, we're doing some interesting things with new products, like the Rambo cache which we've announced. If you look at both capacity and latency at every level, you do see that 10x opportunity. 10x doesn't seem too hard, but memory is hard, because memory isn't just a hardware/tech problem, it's also a big business problem!

06:04PM EDT - That's the end of the Q&A. Now onto the third session. First up, a RISC-V talk


  • Byte - Monday, August 17, 2020 - link

    Even worse, the goal is simply to get to the lowest common denominator. Should we not have everyone excel instead? Lowering the bar so that everyone can clear it is just sad all around.
  • diediealldie - Monday, August 17, 2020 - link

    no transistors left behind, because we don't have anything left due to manufacturing problems...
  • grrrgrrr - Monday, August 17, 2020 - link

    Jensen has been printing cuda cores in all shapes and sizes and the developers are using them.
  • Alistair - Monday, August 17, 2020 - link

    haha, quite right
  • ksec - Tuesday, August 18, 2020 - link

    I don't like the idea of Intel trying to spin Moore's Law as transistor improvement only and ignoring the time scale.
  • KimGitz - Thursday, August 20, 2020 - link

    I know better than to rule Intel out of the race. They have deep pockets and a massive investment in research and development. The problem with Intel is their manufacturing, which was once an advantage, since AMD, Arm, NVIDIA, and IBM don't manufacture their own designs. I'm glad Intel has encountered some difficulty; they honestly needed to swallow their humble pie. The return of competitiveness in the industry will benefit everyone, interestingly more so Intel, who are awakening from their slumber and arrogance. Intel has the money to sort out their manufacturing woes and can outsource in the meantime. I'm excited for Tiger Lake H on laptops and later Alder Lake for both laptops and desktops. Lenovo has already announced several products with AMD Zen 2 and Intel Tiger Lake U.
  • Konkari - Thursday, August 20, 2020 - link

    I find the quote he attributes to Jim Keller ambiguous. Is he suggesting that Jim is at the bottom and Raja is at the top? Jim left Intel, yet he is probably the only person who could have made a difference there. Well, I guess there is plenty of room at the bottom for Intel.
