AMD Prepares 32-Core Naples CPUs for 1P and 2P Servers: Coming in Q2

by Ian Cutress on March 7, 2017 10:15 AM EST


For users keeping track of AMD’s rollout of its new Zen microarchitecture, stage one was the launch of Ryzen, its new desktop-oriented product line, last week. Stage three is the APU launch, focusing mainly on mobile parts. In the middle is stage two, Naples, arguably the meatier element of AMD’s Zen story.

A lot of fuss has been made about Ryzen and Zen, marking AMD’s re-entry into high-performance x86. If you go by column inches, the consumer-focused Ryzen platform is the one most talked about and, many would argue, the most important. In our interview, Dr. Lisa Su, CEO of AMD, described the launch of Ryzen as a big hurdle in that journey. However, in the next sentence, Dr. Su listed Naples as another big hurdle, and if you spend some time with any of the regular technology industry analysts, they will tell you that Naples is where AMD’s biggest slice of the pie is. Enterprise is where the money is.

So while the consumer product line gets the column inches, the enterprise product line brings profits and high margins. Launching an enterprise product that takes even a few points of market share from the very large blue incumbent can add billions of dollars to the bottom line, as well as spur some innovation now that there are two big players on the field. One could argue there are three players, if you consider that ARM holds a few niche areas; however, one of the big barriers to ARM adoption, aside from the lack of high single-core performance, is the transition from the x86 to the ARM instruction set, which requires rewriting code. If AMD can return as a big player in x86 enterprise, it puts a small stop to some of ARM’s ambitions and aims to take a big enough chunk out of Intel.

With today’s announcement, AMD is setting the scene for its upcoming Naples platform. Naples will not be the official name of the product line; as we discussed with Dr. Su, Opteron is one option being debated internally at AMD. Nonetheless, Naples builds on Ryzen, using the same core design but implementing it in a big way.

The top end Naples processor will have a total of 32 cores, with simultaneous multi-threading (SMT), to give a total of 64 threads. This will be paired with eight channels of DDR4 memory, up to two DIMMs per channel for a total of 16 DIMMs, and altogether a single CPU will support 128 PCIe 3.0 lanes. Naples also qualifies as a system-on-a-chip (SoC), with a measure of internal IO for storage, USB and other things, and thus may be offered without a chipset.

Naples will be offered as either a single processor platform (1P) or a dual processor platform (2P). In dual processor mode, for a system with 64 cores and 128 threads, each processor will use 64 of its PCIe lanes as a communication bus between the processors, as part of AMD’s Infinity Fabric. The Infinity Fabric uses a custom protocol over these lanes, but bandwidth is designed to be on the order of PCIe. As each processor uses 64 PCIe lanes to talk to the other, each CPU has 64 lanes left for the rest of the system, for a total of 128 PCIe 3.0 lanes again.
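The lane accounting for 1P and 2P systems can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: the per-lane rate uses standard PCIe 3.0 signalling (8 GT/s with 128b/130b encoding), and because AMD has only said the socket-to-socket bandwidth is "on the order of PCIe", the link figure is an estimate rather than a published number. The constant and function names are our own.

```python
# Sketch: PCIe lane budget for 1P vs 2P Naples, using the announced lane counts.
LANES_PER_CPU = 128       # PCIe 3.0 lanes per Naples package (announced)
SOCKET_LINK_LANES = 64    # lanes each CPU gives to the socket-to-socket link in 2P

# PCIe 3.0: 8 GT/s per lane, 128b/130b line encoding, 8 bits per byte.
PCIE3_GBPS_PER_LANE = 8.0 * 128 / 130 / 8   # ~0.985 GB/s per lane, per direction


def system_io_lanes(sockets: int) -> int:
    """Lanes left for devices (GPUs, NVMe, NICs) in a 1P or 2P system."""
    if sockets == 1:
        return LANES_PER_CPU
    if sockets == 2:
        # Each CPU spends 64 lanes on the inter-socket link and keeps 64 for I/O.
        return 2 * (LANES_PER_CPU - SOCKET_LINK_LANES)
    raise ValueError("Naples is announced for 1P and 2P only")


def link_bandwidth_gbps(lanes: int) -> float:
    """One-direction bandwidth IF the link ran at PCIe 3.0 rates (assumption)."""
    return lanes * PCIE3_GBPS_PER_LANE


if __name__ == "__main__":
    print(system_io_lanes(1), system_io_lanes(2))            # 128 128
    print(round(link_bandwidth_gbps(SOCKET_LINK_LANES), 1))  # 63.0
```

Under that assumption the link works out to roughly 63 GB/s in each direction, well below the 170 GB/s of local memory bandwidth AMD quotes per socket, so where software places its memory would likely matter in 2P configurations.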

On the memory side, with eight channels and two DIMMs per channel, AMD is stating that it officially supports up to 2TB of DRAM per socket, or 4TB in a dual-socket server. The total memory bandwidth available to a single CPU comes in at 170 GB/s.
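The capacity and bandwidth figures follow from the channel count. The sketch below assumes all 16 slots hold identical DIMMs, treats 1TB as 1024GB, and counts 8 data bytes per transfer on a 64-bit DDR4 channel. The article does not say which memory speed AMD used for its 170 GB/s figure; the sketch only shows that eight channels at roughly DDR4-2666 land on that number. Names are illustrative.

```python
# Sketch: per-socket memory capacity and theoretical peak bandwidth for Naples.
CHANNELS = 8            # DDR4 channels per socket (announced)
DIMMS_PER_CHANNEL = 2   # up to two DIMMs per channel (announced)
BYTES_PER_TRANSFER = 8  # 64-bit DDR4 data path per channel (ECC bits excluded)


def dimm_size_needed_gb(socket_capacity_tb: float) -> float:
    """DIMM size needed to reach a per-socket capacity with every slot filled."""
    return socket_capacity_tb * 1024 / (CHANNELS * DIMMS_PER_CHANNEL)


def peak_bandwidth_gbs(mega_transfers_per_s: float) -> float:
    """Theoretical peak bandwidth in GB/s (10^9 bytes) across all channels."""
    return CHANNELS * mega_transfers_per_s * 1e6 * BYTES_PER_TRANSFER / 1e9


if __name__ == "__main__":
    print(dimm_size_needed_gb(2))  # 128.0 -> 128GB DIMMs for 2TB per socket
    for label, mts in (("DDR4-1866", 1866), ("DDR4-2400", 2400), ("DDR4-2666", 2666.67)):
        print(f"{label}: {peak_bandwidth_gbs(mts):.1f} GB/s")
```

This gives 119.4, 153.6 and 170.7 GB/s respectively, so reaching 2TB per socket implies 128GB DIMMs in every slot.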

While not specifically mentioned in today's announcement, we do know that Naples is not a single monolithic die of 500mm² or larger. Naples uses four of AMD’s Zeppelin dies (the Ryzen dies) in a single package. With each Zeppelin die coming in at 195.2mm², a monolithic equivalent would mean a total of 780mm² of silicon and around 19.2 billion transistors, which is far bigger than anything GlobalFoundries has ever produced, let alone attempted at 14nm. During our interview with Dr. Su, we postulated that multi-die packages would be the way forward on future process nodes, given the difficulty of manufacturing such large dies, and Dr. Su's response indicated that this was a prominent direction to go in.

Each die provides two memory channels, which brings us up to eight channels in total. However, each die only has 16 PCIe 3.0 lanes (24 if you want to count PCH/NVMe), so four dies fall short of 128 lanes, meaning that some form of mux/demux, PCIe switch, or accelerated interface is being used. This could be extra silicon on package, given that AMD has so far used a single die design for Zen.
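The four-die argument is simple multiplication. The sketch below uses only the per-die figures quoted above (195.2mm², two memory channels, 16 or 24 PCIe lanes) to show why the memory channels add up neatly while the PCIe lanes do not. The constant names are our own.

```python
# Sketch: aggregate resources of four Zeppelin dies, from the per-die figures above.
DIES = 4
DIE_AREA_MM2 = 195.2
MEM_CHANNELS_PER_DIE = 2
PCIE_LANES_PER_DIE = 16       # general-purpose lanes exposed on Ryzen
PCIE_LANES_PER_DIE_ALL = 24   # also counting the PCH/NVMe lanes
TARGET_LANES = 128            # lanes announced for one Naples package

total_area = DIES * DIE_AREA_MM2
total_channels = DIES * MEM_CHANNELS_PER_DIE
shortfall = TARGET_LANES - DIES * PCIE_LANES_PER_DIE_ALL

print(f"{total_area:.1f} mm^2 of silicon, {total_channels} memory channels")
print(f"lanes: {DIES * PCIE_LANES_PER_DIE}-{DIES * PCIE_LANES_PER_DIE_ALL} of {TARGET_LANES}, "
      f"short by at least {shortfall}")
```

Even counting all 24 lanes per die, four dies reach 96 lanes, at least 32 short of the 128 announced; that gap is what the speculation about a switch or extra on-package silicon is trying to explain.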

Note that we’ve seen multi-die packages before in previous products from both AMD and Intel. Although both companies have experimented with multi-die or 2.5D technology (AMD with Fury, Intel with EMIB), we are led to believe that these CPUs are similar to previous multi-chip designs, but with Infinity Fabric running between the dies. At what bandwidth, we do not know at this point. It is also pertinent to note that there is a lot of talk going around about the strength of AMD's Infinity Fabric, as well as how threads are scheduled within a single die, which has two core complexes (CCXes) of four cores each. This is something we are investigating on the consumer side, and it will likely be very relevant on the enterprise side as well.
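The scheduling question comes down to topology: 32 cores arranged as four dies, each with two four-core CCXes, and two SMT threads per core. As a hedged sketch, the code below maps a logical CPU number onto that hierarchy and reports the closest level two threads share. The numbering scheme (SMT siblings adjacent, then cores, then CCXes, then dies) is hypothetical and chosen for illustration; real operating systems may enumerate CPUs differently.

```python
from typing import NamedTuple

# 1P Naples topology as described: 4 dies x 2 CCX x 4 cores x 2 SMT threads = 64.
DIES, CCX_PER_DIE, CORES_PER_CCX, THREADS_PER_CORE = 4, 2, 4, 2


class Location(NamedTuple):
    die: int
    ccx: int
    core: int
    thread: int


def locate(cpu: int) -> Location:
    """Map a logical CPU id to (die, ccx, core, thread) under a HYPOTHETICAL numbering."""
    total = DIES * CCX_PER_DIE * CORES_PER_CCX * THREADS_PER_CORE
    if not 0 <= cpu < total:
        raise ValueError(f"cpu must be in [0, {total})")
    cpu, thread = divmod(cpu, THREADS_PER_CORE)
    cpu, core = divmod(cpu, CORES_PER_CCX)
    die, ccx = divmod(cpu, CCX_PER_DIE)
    return Location(die, ccx, core, thread)


def shared_level(a: int, b: int) -> str:
    """Closest level of the hierarchy that two logical CPUs have in common."""
    la, lb = locate(a), locate(b)
    if la.die != lb.die:
        return "cross-die"
    if la.ccx != lb.ccx:
        return "cross-CCX"
    if la.core != lb.core:
        return "same CCX"
    return "same core"


if __name__ == "__main__":
    print(locate(63))
    for other in (1, 2, 8, 16):
        print(0, other, shared_level(0, other))
```

Threads that share a CCX communicate through its shared L3, threads on different CCXes of one die go over the on-die fabric, and threads on different dies cross the die-to-die links; how much each step costs on Naples is exactly what remains to be measured.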

In the land of benchmark numbers we can’t verify (yet), AMD showed demonstrations at the recent Ryzen Tech Day. The main demonstration was a sparse matrix calculation on a 3D dataset for seismic analysis. In this test, solving a 15-diagonal matrix of 1 billion samples took 35 seconds on an Intel machine vs 18 seconds on an AMD machine (both machines using 44 cores and DDR4-1866). When allowed to use its full 64 cores and DDR4-2400 memory, the AMD machine shaved another four seconds off, finishing in 14 seconds. Again, we can’t verify these results, and it’s a single data point, but a diagonal matrix solver would be a suitable representation of an enterprise workload. We were told that both chips were running at stock clock frequencies; however, AMD did say that the Naples clocks were not yet finalized.
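Expressed as ratios, AMD's (still unverified) demo figures look like this. The only inputs are the times quoted above, with the full 64-core run taken as 18 minus 4 seconds; the constant names are our own.

```python
# Seismic demo times in seconds, as reported by AMD (not independently verified).
INTEL_44C_DDR4_1866 = 35.0
AMD_44C_DDR4_1866 = 18.0
AMD_64C_DDR4_2400 = AMD_44C_DDR4_1866 - 4.0   # "shaved another four seconds off"


def speedup(baseline_s: float, candidate_s: float) -> float:
    """How many times faster the candidate finished than the baseline."""
    return baseline_s / candidate_s


if __name__ == "__main__":
    print(f"44 cores each:    {speedup(INTEL_44C_DDR4_1866, AMD_44C_DDR4_1866):.2f}x")  # 1.94x
    print(f"64-core 2P Naples: {speedup(INTEL_44C_DDR4_1866, AMD_64C_DDR4_2400):.2f}x")  # 2.50x
```

The second ratio mixes two changes at once, 20 more cores and faster memory, so on its own it cannot say which of the two bought the extra four seconds.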

What we don’t know are power numbers, frequencies, processor lists, pricing, partners, segmentation, and all the meaty stuff. We expect AMD to mount a strong attack on the 1P/2P server markets, which is where 99% of the enterprise is focused, particularly where high-performance virtualization or storage is needed. Whether Naples migrates into the workstation space is unknown, but I hope it does. We’re working with AMD to secure samples for Johan and me in advance of the Q2 launch.

Gallery: AMD Naples Slide Deck

91 Comments


Haawser - Wednesday, March 8, 2017 - link
Commercial/Enterprise servers (rather than HPC supercomputers) make up ~80% of all server sales. And due to Intel's huge margins, ~58% of their operating profit.

Naples with its more cores, less power, lower price and SME/SEV is going to provide them with their first real competition in years.
Walkeer - Wednesday, March 8, 2017 - link
I am a bit worried about the Infinity Fabric socket interconnect. If it will have the same bandwidth as PCI-E 3.0, which is about 1GBps for 1 lane, 64 lanes have only 64GBps, which is much slower than the memory bandwidth for 1 socket, which is 170GBps. Therefore, since this is a NUMA architecture, memory access from one socket to the other socket's RAM will be significantly crippled, not counting the intra-processor communication, which makes it even slower. I hope the frequency of the Infinity Fabric going over the PCI-E lanes will be significantly higher, or else non-NUMA-aware SW will be slow.
deltaFx2 - Thursday, March 9, 2017 - link
I don't think they said it uses PCIe, just that it multiplexes on the PCIe pins. I doubt they're using PCIe protocol for socket-to-socket interconnect.
phoenix_rizzen - Sunday, March 12, 2017 - link
I can't find the link right now, but there's a really nice article that covers Infinity Fabric more deeply, and how it's a superset of HyperTransport 3.0. It's scalable into the hundreds of MBps of throughput.

For now, here's the Wikipedia link:
https://en.m.wikipedia.org/wiki/HyperTransport#Inf...
phoenix_rizzen - Sunday, March 12, 2017 - link
Ah, here it is:
http://wccftech.com/amds-infinity-fabric-detailed/
zodiacsoulmate - Wednesday, March 8, 2017 - link
I'm looking at my next desktop computer processor
sharath.naik - Wednesday, March 8, 2017 - link
I don't think you read between the lines. Naples is 4 Ryzen chips on a package. If my guess is correct, it will scale better than a dual socket configuration. BUT, and this is a big BUT, if each Ryzen chip has 2 memory channels, does it mean the other 6 memory channels are considered remote memory?? There is a 40% drop in performance in a dual socket configuration if one socket is accessing the memory of the other. This is with Intel having 4 channels (and the only reason that should happen is if the memory used exceeds the RAM in that socket). Now we are talking about just 2 channels. So my best guess is it will perform like a champ for multithreaded workloads that can fit into the RAM on the 2 channels (like VMs), but would be really bad for apps that need a large amount of RAM. Hmm.. This will also mean it will only be targeted at server farms with huge numbers of VMs. I admit that is the largest market for such chips. And this should also scale well: if the Ryzen chip adds 2 more cores, Naples will have 40 cores. But for big data analytics this will not be the chip to buy, as a single monolithic chip is still needed to avoid the huge remote memory access cost.
Haawser - Wednesday, March 8, 2017 - link
Yeah, I think you're right. I think the commercial/enterprise market is exactly where Naples is targeted.

I imagine that for HPC/Big Data they'll do what they've already talked about, ie- HPC APUs with HBC/HBM. Which won't be monolithic, but MCM on an interposer. There were rumors of a second 8 core chip, so I think it's likely (if true) that it's the one for HPC APUs.

It's going to be interesting. No doubt.
deltaFx2 - Thursday, March 9, 2017 - link
I read in another review that AMD calls it 'die NUMA'. It's likely, given that it's an MCM, that the die-to-die channels have much lower latency than going off socket. It's likely that Naples has 3 NUMA levels: on-die, off-die, and off-socket. Without actual benchmarking it's hard to speculate what the perf will be, but it certainly seems like the machine is a 'scale-out' machine, for cloud hosting of different VMs or just different applications, as opposed to a big Oracle database...
0ldman79 - Monday, March 13, 2017 - link
I totally read his name as "Forrest Nimrod" the first time...
