Just under four years ago, Arm announced their Neoverse family of infrastructure CPU designs. Deciding to double down on the server and edge computing markets by designing Arm CPU cores specifically for those markets – and not just recycling the consumer-focused Cortex-A designs – Arm set about tackling the infrastructure market in a far more aggressive manner. Those efforts, in turn, have increasingly paid off handsomely for Arm and its partners, who, thanks to products like Amazon’s Graviton and Ampere Altra CPUs, have at long last been able to take a meaningful piece of the server CPU market.

But as Arm CPUs finally achieve the market penetration that eluded them in the previous decade, Arm needs to make sure it isn’t resting on its laurels. Of the company’s three lines of Neoverse core designs – the efficient E, flexible N, and high-performance V – the company is already on its second generation of N cores, aptly dubbed the N2. Now, the company is preparing to update the rest of the Neoverse lineup with the next generation of V and E cores as well, announcing today the Neoverse V2 and Neoverse E2 cores. Both of these designs are slated to bring the Armv9 architecture, along with significant performance improvements, to HPC and other server customers.

Arm Neoverse V2: Armv9 Graces High-Performance Computing

Leading the charge for Arm’s new CPU core IP is the company’s second-generation V-series design, the Neoverse V2. The complete V2 platform, codenamed Demeter, marks Arm’s first iteration on their high-performance V-series cores, as well as the transition of this core lineup from the Armv8.4 ISA to Armv9. And while this is only Arm’s second go at a dedicated high-performance core for servers, make no mistake: Arm aims to be ambitious. The company is claiming that Neoverse V2 CPUs will offer the highest single-threaded integer performance available in the market, eclipsing next-generation designs from both AMD and Intel.

While this week’s announcement from Arm is not a full-on deep-dive of the new architecture – and, more annoyingly, the company is not talking about specific PPA metrics – Arm is offering a high-level look at some of the changes and features that will be coming with the V2 platform. To be sure, the V2 IP is already finished and shipping to customers today (most notably NVIDIA), but Arm is playing coy to some degree with what they’re saying about V2 before the first chips based on the IP ship in 2023.

First and foremost, the bump to Armv9 brings with it the full suite of features that come with the latest Arm architecture. That includes the security improvements that are a cornerstone feature of the architecture (and especially handy for cloud shared environments) along with Arm’s newer SVE2 vector extensions.

On the latter, Arm is making an interesting change here by reconfiguring the width of their vector engines; whereas V1 implemented SVE(1) using a 2 pipeline 256-bit SIMD, V2 moves to 4 pipes of 128-bit SIMDs. The net result is that the cumulative SIMD width of the V2 is not any wider than V1, but the execution flow has changed to process a larger number of smaller vectors in parallel. This change makes the SIMD pipeline width identical to Arm’s Cortex parts (which are all 128-bit, the minimum size for SVE2), but it does mean that Arm is no longer taking full advantage of the scalable part of SVE by using larger SIMDs. I expect we’ll find out why Arm is taking this route once they do a full V2 deep dive, as I’m curious whether this is purely an efficiency play or something more akin to homogenizing designs across the Arm ecosystem.
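To make the width arithmetic concrete, here is a minimal Python sketch comparing the two configurations described above. The class and method names are hypothetical, and the model only counts peak lanes per cycle; it assumes every pipe can issue one full-width vector operation each cycle and ignores latency, scheduling, and memory effects, none of which Arm has detailed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimdConfig:
    """Hypothetical model of a core's vector engines (illustration only)."""
    name: str
    pipes: int        # number of SIMD pipelines
    width_bits: int   # width of each pipeline in bits

    @property
    def total_bits(self) -> int:
        # Cumulative SIMD width across all pipes.
        return self.pipes * self.width_bits

    def lanes_per_cycle(self, element_bits: int) -> int:
        # Peak elements processed per cycle, assuming every pipe is busy.
        return self.total_bits // element_bits

    def vector_ops_needed(self, n_elements: int, element_bits: int) -> int:
        # Vector operations required to cover n_elements (ceiling division).
        per_vector = self.width_bits // element_bits
        return -(-n_elements // per_vector)


# Pipe counts and widths as stated in the article.
v1 = SimdConfig("Neoverse V1", pipes=2, width_bits=256)
v2 = SimdConfig("Neoverse V2", pipes=4, width_bits=128)

for cfg in (v1, v2):
    print(cfg.name, cfg.total_bits, cfg.lanes_per_cycle(32),
          cfg.vector_ops_needed(64, 32))
```

Under these assumptions both cores come out at 512 bits of cumulative width and the same peak number of 32-bit lanes per cycle. The difference is that V2 needs twice as many (narrower) vector operations to cover the same data, spread across twice as many pipes.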

Past that, it’s likely worth noting that while Arm’s presentation slides put bfloat16 and int8 matmul down as features, these are not new features. Still, Arm is promising that V2’s SIMD processing will provide microarchitecture efficiency improvements over the V1.

More broadly, V2 will also be introducing larger L2 cache sizes. The V2 design supports up to 2MB of private L2 cache per core, double the maximum size of V1. V2 will also be introducing further improvements to Arm’s integer processing performance, though the company isn’t going into further detail at this point. From an architectural standpoint, the V1 borrowed a fair bit from the Cortex-X1 CPU design, and it wouldn’t be too surprising if that was once again the case for the V2, borrowing from the X2. In which case consumer chips like the Snapdragon 8 Gen1 and Dimensity 9000 should provide a loose reference on what to expect.

For the Demeter platform, Arm will be reusing their CMN-700 mesh fabric, which was first introduced for the V1 generation. CMN-700 is still a modern mesh design with support for up to 144 nodes in a 12x12 configuration, and is suitable for interfacing with DDR5 memory as well as PCIe 5/CXL 2 for I/O. As a result, strictly speaking the V2 isn’t bringing anything new at the fabric level – even the 512MB of SLC could be done with a V1 + CMN-700 setup – but this does mean that the CMN-700 mesh and its features are now a baseline moving forward with V2.
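As a rough illustration of the mesh figures quoted above, the following Python sketch computes the node count of a rectangular mesh and its worst-case hop distance. The function names are hypothetical, and the hop calculation assumes a plain 2D mesh with no wraparound links and shortest-path routing along rows and columns; the article does not describe how CMN-700 actually routes traffic.

```python
def mesh_nodes(cols: int, rows: int) -> int:
    """Total number of nodes in a cols x rows mesh."""
    return cols * rows


def max_hops(cols: int, rows: int) -> int:
    """Worst-case hop count between two nodes (corner to opposite corner).

    Assumes a simple 2D mesh without wraparound, where a packet moves one
    node per hop horizontally or vertically.
    """
    return (cols - 1) + (rows - 1)


# The maximum configuration cited in the article.
nodes = mesh_nodes(12, 12)
worst_case = max_hops(12, 12)
print(nodes, worst_case)
```

For the 12x12 maximum this gives the 144 nodes cited above, and under the stated assumption a corner-to-corner transfer would cross 22 hops.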

The Neoverse V2 core, in turn, is going to be the cornerstone of the upcoming generation of high-performance Arm server CPUs. The de facto flagship here will be NVIDIA’s Grace CPU, which will be one of the first (if not the first) V2 designs to ship in 2023. NVIDIA had previously announced that Grace would be based on a Neoverse design, so this week’s announcement from Arm finally confirms the long-held suspicion that Grace would be based on the next-generation Neoverse V core.

NVIDIA, for its part, has their fall GTC event scheduled to take place in just a few days. So it’s likely we’ll hear a bit more about Grace and its Neoverse V2 underpinnings as NVIDIA seeks to promote the chip ahead of its release next year.

Neoverse E2: Cortex-A510 For Use With N2

Alongside the Neoverse V2 announcement, Arm is also using this week’s briefing to announce the Neoverse E2 platform. Unlike the V2 reveal, this is a much smaller scale announcement, and Arm is only offering a handful of technical details. Ultimately, E2’s day in the sun will be coming a bit later on.

That said, the E2 platform is being delivered to partners with an eye towards interoperability with the existing N2 platform. For this, Arm has taken the Cortex-A510, its little/high-efficiency Cortex CPU core, and paired it with the CMN-700 mesh. This is intended to give server operators/vendors further flexibility by providing an alternative CPU core to the N2, while still offering the modern I/O and memory features of Arm’s mesh. Underscoring this, the E2 system backplane is even compatible with the N2 backplane.

Neoverse Next: Poseidon, N-Next, and E-Next

Finally, Arm’s announcement this week provides a glimpse at the company’s future roadmap for all three Neoverse platforms, where, unsurprisingly, Arm is working on updated versions of each of the platforms.

Notably, all three platforms call for adding PCIe 6 support as well as CXL 3.0 support. This would come from the next iteration of Arm’s CMN mesh network, which, as is already the case today, is shared between all three platforms.

Meanwhile, it’s interesting to see the Poseidon name once again pop up in Arm’s roadmaps. Going back to Arm’s very first Neoverse roadmap, Poseidon was the name attached to Arm’s 5nm/2021 platform, a spot since taken by N2 and V1/V2 in various forms. With V2 not landing in hardware until 2023, Poseidon/V3 is still years off, but there’s likely some significance to Arm keeping the codename (such as a new microarchitecture).

But first out of the gate will be the N-Next platform – the presumable Neoverse N3. With the Neoverse N platform a generation ahead of the rest (N2 was first announced in 2020), it’ll be the next platform due for a refresh. N3 is due to be available to partners in 2023, with Arm broadly touting generational performance and efficiency improvements.

39 Comments

  • name99 - Thursday, September 15, 2022 - link

    Someone doesn't understand the difference between single-threaded and throughput computing, or what cloud providers want from their chips...
  • michael2k - Thursday, September 15, 2022 - link

    They can’t do one without also doing the other you realize?
  • lemurbutton - Thursday, September 15, 2022 - link

    That's funny because Apple is significantly ahead of AMD. Miles ahead. Probably 3-4 generations of improvements ahead.
  • Kangal - Friday, September 16, 2022 - link

    That's an interesting way to put it. The best junction point to compare these chipsets is in the 10W range. So things like large tablets, thin notebooks, small laptops, TV Boxes and Mini PCs (all passively cooled). Here's what that looks like:

    SiFive FU740
    RockChip RK3588
    MediaTek K-1380
    Qualcomm 8CXg3
    Apple M1
    AMD r7-6800u
    Intel i7-1265u
  • mode_13h - Saturday, September 17, 2022 - link

    That list spans a massive range of price points and application targets. While a comparison would be interesting, there's a limited amount it could tell you, due to some using rather older IP and process nodes than others. Not coincidentally, those also tend to be the cheaper ones.
  • mode_13h - Saturday, September 17, 2022 - link

    I started confirming a few things for myself, and thought I'd share.

    * SiFive U740 - 4x 2-way in-order core @ 28 nm
    * RockChip RK3588 - 4x Cortex-A76 @ 8 nm LP
    * MediaTek K-1380 - 4x A78 + 4x A55 @ 6 nm
    * Qualcomm 8CX Gen3 - 4x X1 + 4x A78 @ 5 nm
    * Apple M1 - 4x Firestorm + 4x Ice Storm @ 5 nm
    * AMD R7 6800U - 8x Zen3+ @ 6 nm
    * Intel i7-1265U - 2x Golden Cove + 8x Gracemont @ Intel 7

    SiFive and Rockchip don't even belong in that list. Also, the tray price of the Intel CPU is probably 2-3x that of the Kompanio 1380, so it's rather out-of-place as well.

    That narrows it down to the usual suspects: Qualcomm, Intel, AMD, and Apple. However, since the X1 is basically just a beefed up A78, I'd argue even the 8CX Gen 3 is out of place.
  • Kangal - Sunday, September 18, 2022 - link

    That's funny, because the list actually started with Intel, AMD, and Apple. Then I remembered the Qualcomm and thought to look at alternatives.

    The most out of place option on that list is the RockChip, because every other chipset is the best of their category in some way. The SiFive is the best of RISC-V that is built and not theoretical. The MediaTek is the best Cortex-A / ARMv8 you can buy. The Qualcomm is the best Cortex-X / ARMv9 you can buy. Apple is its own thing. Intel has the fastest single-core performance. AMD has the best overall x86 performance.

    So really it kind of boils down to Apple vs AMD, or rather can AMD catch up in the next 2-years? But that's a meaningless question since they don't run the same software or same code. But can give us a hint of what's possible out there. Maybe a proper optimised Windows10 Pro running on an ARMv9 Qualcomm chipset with Nuvia cores? Then compare web browsing, regular computing, rendering, and gaming between Native Programs to an AMD Ryzen ultrabook. I suspect the Qualcomm will eventually overtake the performance point AND do so at lower power. Just as long as you avoid the legacy stuff, or unoptimised/rushed ports from x86-Windows to UWP-Windows.
  • mode_13h - Monday, September 19, 2022 - link

    > The MediaTek is the best Cortex-A / ARMv8 you can buy.
    > The Qualcomm is the best Cortex-X / ARMv9 you can buy.

    That's an artificial distinction between A-series and X-series.

    Also, 8CX Gen3 is still ARMv8, as hinted by the fact that it includes A78 cores. It's the X2 cores which are ARMv9.

    > can AMD catch up in the next 2-years?

    In what sense? Perf/W is the only area where Apple is significantly ahead. It's an important area, but Zen 4 seems to trounce M2 in single-thread performance.

    > I suspect the Qualcomm will eventually overtake the performance point
    > AND do so at lower power.

    I doubt Qualcomm will ever beat Apple on performance, and that's mainly because Apple's vertical integration lets them use larger dies with more cache. Qualcomm has to worry about the BOM price of their chips and how it compares to their rivals, while Apple only has to worry about the final product price.
  • Kangal - Wednesday, September 21, 2022 - link

    Qualcomm doesn't have to worry about the BOM. They have an intrinsic advantage to x86 chipsets that are usually larger. However, AMD has an advantage with the chiplet design, but they too have the same market/BOM to consider so it's a wash. Intel meanwhile is the market leader and they set the tray price really high, and they have been bleeding money from their fabrication process. Not to mention both x86 companies spend a lot more on R&D, whilst a generic ARM Licence is cheap. All in all, QC can afford to blow the budget and still undercut the competitors. And that's how it should be, those laptops should be priced cheaper due to lacking a critical/useful feature which is backwards compatibility.

    No, I didn't mean Apple.

    I meant now on Windows it should be pretty competitive already between the QC 8CXg3 (4x Big + 4x Medium), versus Intel i7-1265u (2x HUGE + 8x Medium), versus AMD r7-6800u (8x Big). When talking about a passively cooled, thin laptop, at the 10W power level.

    I managed to find the GeekBench 5.4 results, which are interesting:
    QC: 1100, 5000
    AMD: 1500, 9000
    Intel: 1700, 6000

    So that's in the current "Windows 10" era, with semi-optimised code, and a subpar design from Qualcomm who have been a joke. They had an exclusivity contract with Microsoft, and have been dragging their feet. Since it has recently elapsed, they're only beginning to start competing now.

    In the near future with "Windows 12" we should see Applications become more evenly optimised between the architectures. That's when ARM will probably flex its advantages. And we might finally get to see those Apple A13 Cores, I mean Nuvia Cores, running on the platform. They've been announced like for 4-years or something, and been perpetually delayed with a new redesign to fit into ARMv9 ISA. I feel like with the delayed Gen-2 ARMv9 cores from the European Team (Cortex-A730), we should see big improvements. With its derivative (Cortex-X4), there might not be much or any advantage to the Custom Nuvia cores anymore.
  • smalM - Thursday, September 15, 2022 - link

    "Past that, it’s likely worth noting that while Arm’s presentation slides put bfloat16 and int8 matmul down as features, these are not new features."
    They may not be new, but they are also not part of ARMv9.0, so they got mentioned separately.
