Qualcomm Previews Snapdragon X Elite SoC: Oryon CPU Starts in Laptops
by Ryan Smith on October 24, 2023 3:00 PM EST
While Qualcomm has become wildly successful in the Arm SoC market for Android smartphones, parlaying that into success in other markets has so far eluded them. The company has produced several generations of chips for Windows-on-Arm laptops, and while each has improved incrementally on its predecessor, none has been enough to dislodge a highly dominant Intel. The lack of success of Windows-on-Arm is far from solely Qualcomm’s fault (the OS and its software ecosystem bear a good deal of the blame), but silicon has certainly played a part. To make serious inroads into the market, it’s not enough to produce incrementally better chips: Qualcomm needs to make a major leap in performance.
Now, after nearly three years of hard work, Qualcomm is getting ready to do just that. This morning, the company is previewing their upcoming Snapdragon X Elite SoC, their next-generation Arm SoC designed for Windows devices. Based on a brand-new Arm CPU core design from their Nuvia subsidiary dubbed “Oryon”, the Snapdragon X Elite is to be the tip of the iceberg for a new generation of Qualcomm SoC designs. Not only is Oryon the heart and soul of Qualcomm’s most important Windows-on-Arm SoC to date, but it will eventually be in smartphones and a whole lot more.
But we’re getting ahead of ourselves. For now let’s focus on the Snapdragon X Elite SoC and the Oryon cores underpinning it.
While this morning’s announcement from Qualcomm is far from a deep dive on the hardware, it’s our first look at what will be Qualcomm’s flagship SoC, and the new CPU cores within it. With a projected launch date of mid-2024, the first laptops based on the SoC are still several months away from hitting retail shelves (and about a year behind Qualcomm’s original schedule). Nonetheless, Qualcomm has finished their silicon development work, and with the chip’s specifications locked down, the company is now on to polishing things for a launch next year.
The Oryon CPU cores within the Snapdragon X Elite are the culmination of Qualcomm’s Nuvia acquisition from early 2021, and an even longer period of work for the Nuvia team. The ambition of the team, and the importance of the custom Arm architecture CPU cores, cannot be overstated. So the Snapdragon X Elite is going to be an interesting chip on multiple levels, as it sets the pace for the next generation of Qualcomm chip designs.
Snapdragon Compute (Windows-on-Arm) Silicon
| AnandTech | Snapdragon X Elite | Snapdragon 8cx Gen 3 | Snapdragon 8cx Gen 2 | Snapdragon 8cx Gen 1 |
| Prime Cores | 12x Oryon, 3.80 GHz (2C Turbo: 4.3 GHz) | 4x C-X1, 3.00 GHz | 4x C-A76, 3.15 GHz | 4x C-A76, 2.84 GHz |
| Efficiency Cores | N/A | 4x C-A78, 2.40 GHz | 4x C-A55, 1.80 GHz | 4x C-A55, 1.80 GHz |
| GPU | Adreno SD X Elite, 4.6 TFLOPS | Adreno 8cx Gen 3 | Adreno 690 | Adreno 680 |
| NPU | Hexagon, 45 TOPS (INT8) | Hexagon 8cx Gen 3, 15 TOPS | Hexagon 690, 9 TOPS | Hexagon 690, 9 TOPS |
| Memory | 8 x 16-bit LPDDR5x-8533, 136 GB/sec | 8 x 16-bit LPDDR4x-4266, 68.3 GB/sec | 8 x 16-bit LPDDR4x-4266, 68.3 GB/sec | 8 x 16-bit LPDDR4x-4266, 68.3 GB/sec |
| Wi-Fi | Wi-Fi 7 + BT 5.4 (Discrete) | Wi-Fi 6E + BT 5.1 | Wi-Fi 6 + BT 5.1 | Wi-Fi 5 + BT 5.0 |
| Modem | Snapdragon X65 (Discrete) | Snapdragon X55/X62/X65 (Discrete) | Snapdragon X55/X24 (Discrete) | Snapdragon X24 (Discrete) |
| Process | 4nm | Samsung 5LPE | TSMC N7 | TSMC N7 |
Starting with a high-level look at the chip, the Snapdragon X Elite is a high-performance SoC designed to power Windows-on-Arm laptops. Qualcomm isn’t listing any official TDPs, but the company has told us that the Elite is designed to scale across a “broad range” of thermal designs. Active cooling will be needed to get the most out of the Elite, but according to Qualcomm, passive/fanless designs are possible as well, and we should expect to see some retail devices designed as such.
Qualcomm is fabbing the chip on an unspecified 4nm process. Given their previous performance issues with Samsung’s 4nm line, it’s a very safe bet that they’re building this chip at TSMC – possibly using the N4P line. The silicon itself is a traditional monolithic die, so there is no use of chiplets or other advanced packaging here (though the wireless radios are discrete).
CPU: Oryon By The Dozen
The star of the show (if you’ll forgive the pun) is Oryon, Qualcomm’s new custom-designed Arm CPU core. Designed by the Nuvia team that Qualcomm acquired in 2021, Oryon is the first high-performance, fully-custom Arm CPU core created by Qualcomm in several years. And following multiple generations of lackluster Snapdragon Compute SoCs built out of Arm Cortex-A/X designs and functionally bigger versions of Qualcomm’s mobile SoCs, Oryon marks a major change in direction for Qualcomm.
As this is a preview, there are no significant architectural details to share on Oryon at this time. We don’t know the core’s width, its various buffer sizes, its execution ports, and so on. But what we do know is that Qualcomm didn’t aim low with this SoC: the Nuvia team was working on a server-grade CPU core prior to their acquisition, and that kind of aggressive design has carried over into Oryon as well. That, after all, was one of the major goals of Qualcomm’s acquisition, as the company has long wanted a high-performance CPU core to push them ahead of the other laptop (and eventually mobile) chip makers.
The Snapdragon X Elite SoC ships with 12 Oryon CPU cores, and that’s it. Unlike Qualcomm’s 8cx family of designs, there are no distinct “efficiency” and “performance” cores based on different microarchitectures; this is a homogeneous CPU design, more akin to traditional PC processors. This means that Oryon needs to pull double duty, excelling in heavy workloads without chewing up a bunch of power in light workloads.
The Oryon CPU cores are broken up into three clusters of 4 cores each. We’re still waiting on further technical details, of course, but it’s a safe assumption that each cluster is on its own power rail, so that unneeded clusters can be powered down when only a handful of cores are called for.
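To illustrate why per-cluster power rails would matter, here is a minimal sketch of a hypothetical gating policy. It assumes, as we do above but without confirmation from Qualcomm, that each 4-core cluster can be powered down independently, and it further assumes that one cluster always stays on for the OS. The function name `clusters_to_power` is ours for illustration, not a Qualcomm or Windows API.

```python
import math

CORES_PER_CLUSTER = 4   # per Qualcomm: three clusters of four Oryon cores
NUM_CLUSTERS = 3

def clusters_to_power(active_threads: int) -> int:
    """Hypothetical gating policy: power only as many clusters as the load needs.

    Assumes (unconfirmed) that each cluster has its own power rail and that
    at least one cluster always stays on to service the OS.
    """
    if active_threads < 0:
        raise ValueError("active_threads must be non-negative")
    needed = math.ceil(active_threads / CORES_PER_CLUSTER)
    return min(max(needed, 1), NUM_CLUSTERS)

for threads in (1, 4, 5, 12, 16):
    print(f"{threads:>2} threads -> {clusters_to_power(threads)} cluster(s) powered")
```

Under this toy policy a lightly threaded workload keeps two of the three clusters dark; a real scheduler would also have to weigh thread migration costs and per-cluster clocks.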
On this basis alone, Snapdragon X Elite looks like a far more potent performer than the 8cx chips it replaces. The 8cx Gen 3 offered just 4 performance cores (Cortex-X1) and another 4 efficiency cores (Cortex-A78), so Snapdragon X Elite will hit the streets with 50% more CPU cores, never mind the higher performance of those cores. For a laptop chip, Qualcomm is throwing a lot of CPU cores at the matter.
With regards to clockspeeds, in an all-core turbo workload all 12 Oryon CPU cores can run at up to 3.8GHz, power and thermal headroom permitting. Meanwhile, in lighter workloads the chip supports turboing up to 4.3GHz on 2 cores. Qualcomm’s slide on this matter shows a core from each cluster, but it’s unclear whether this is some kind of prime/favored core in action (where only certain cores are designed/validated for those speeds) or simply a stylistic choice.
Either way, Qualcomm is aiming to turbo to relatively high clockspeeds for a laptop chip, a notable departure from their much more modestly clocked 8cx chips. While high clockspeeds alone do not make for a fast chip, one of the performance bottlenecks for the 8cx chips was their pokey clockspeeds. So if Oryon offers as high an IPC rate as we suspect it will, this would go a long way towards boosting Qualcomm’s CPU performance to compete with the industry’s strongest players.
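The reasoning here follows the usual first-order model in which single-threaded performance scales with IPC multiplied by clockspeed. The sketch below applies it to the known clocks (Oryon’s 4.3GHz 2-core turbo vs. the 8cx Gen 3’s 3.0GHz Cortex-X1); the 1.3x IPC ratio is purely a placeholder of ours, since Oryon’s actual IPC has not been disclosed.

```python
def relative_performance(ipc_ratio: float, new_clock_ghz: float, old_clock_ghz: float) -> float:
    """First-order model: performance scales with IPC x clockspeed."""
    if new_clock_ghz <= 0 or old_clock_ghz <= 0:
        raise ValueError("clockspeeds must be positive")
    return ipc_ratio * new_clock_ghz / old_clock_ghz

# Known clocks: Oryon 2-core turbo (4.3 GHz) vs. 8cx Gen 3's Cortex-X1 (3.0 GHz)
clock_only = relative_performance(1.0, 4.3, 3.0)
# Placeholder IPC ratio; Oryon's real IPC has not been disclosed
with_ipc = relative_performance(1.3, 4.3, 3.0)

print(f"clockspeed alone: {clock_only:.2f}x")  # 1.43x
print(f"with a hypothetical 1.3x IPC gain: {with_ipc:.2f}x")
```

Real-world scaling is usually worse than this model suggests, since memory latency does not shrink as clockspeeds rise.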
Memory: 128-bit LPDDR5x
Feeding the beastly Oryon CPU cores (as well as the rest of the chip) is a 128-bit LPDDR5x memory bus. This is less remarkable than the CPU side of the chip, but it’s important to note all the same. With the previous 8cx chips only supporting LPDDR4x, this brings Qualcomm back to parity with the latest PC chips in terms of memory technology support. And with supported data rates as high as LPDDR5x-8533, this will give Qualcomm one of the fastest memory controllers on the market.
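For reference, theoretical peak bandwidth is simply the bus width in bytes multiplied by the transfer rate. A minimal sketch, treating the eight 16-bit channels as a single 128-bit bus and ignoring real-world efficiency losses:

```python
def peak_bandwidth_gbs(bus_width_bits: int, data_rate_mts: int) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * data_rate_mts * 1e6 / 1e9

print(f"{peak_bandwidth_gbs(128, 8533):.1f} GB/s")  # X Elite, LPDDR5x-8533
print(f"{peak_bandwidth_gbs(128, 4266):.1f} GB/s")  # 8cx Gen 1-3, LPDDR4x-4266
```

This works out to roughly 136.5GB/sec for LPDDR5x-8533, twice the roughly 68.3GB/sec that LPDDR4x-4266 provides on the same bus width.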
Qualcomm is also quoting a total of 42MB of cache in the system sitting between the various processor blocks and system memory. Given the explicit mention of “total cache”, this is almost certainly L2 + L3. Previous Qualcomm designs have offered a 6MB shared L3 (last level) cache. If that’s the case again here, then that would mean there’s 3MB of L2 cache available for each CPU core – or some permutation thereof.
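The cache arithmetic above, spelled out. Both the 6MB L3 and the way the remainder is split are our assumptions rather than Qualcomm disclosures; if L2 were shared per cluster instead of private per core, the same 36MB would work out to 12MB per 4-core cluster.

```python
TOTAL_CACHE_MB = 42   # Qualcomm's "total cache" figure
ASSUMED_L3_MB = 6     # assumption: same 6MB shared L3 as earlier Qualcomm designs
CORES = 12
CLUSTERS = 3

l2_total_mb = TOTAL_CACHE_MB - ASSUMED_L3_MB   # 36MB left over for L2
l2_per_core_mb = l2_total_mb / CORES           # if L2 were private per core
l2_per_cluster_mb = l2_total_mb / CLUSTERS     # if L2 were shared per 4-core cluster

print(f"{l2_per_core_mb:.1f}MB per core or {l2_per_cluster_mb:.1f}MB per cluster")
```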
GPU: Latest Generation Adreno
On the graphics side of matters, Snapdragon X Elite incorporates Qualcomm’s latest generation Adreno GPU. As is typical for Qualcomm in these matters, the company is saying virtually nothing about the architecture employed here, though it goes without saying that this is the latest and greatest iteration of Qualcomm’s in-house GPU design.
From a feature perspective, this is a DirectX 12-class GPU with ray tracing support, mirroring the capabilities Qualcomm introduced with last year’s Snapdragon 8 Gen 2 mobile SoC. Within the Windows ecosystem, it will almost certainly qualify as a DirectX 12 Ultimate (feature level 12_2) design.
Qualcomm is quoting a single throughput figure for the design: 4.6 TFLOPS at an unspecified bit depth/format (we’d guess FP32). Qualcomm has not previously disclosed similar figures for the 8cx chips, so it’s hard to say how this compares, or even how it will stack up against other integrated GPUs, since there’s a lot more to real-world GPU performance than pure FLOPS.
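For context on how such figures are usually derived: GPU vendors typically quote peak FP32 throughput as ALU lanes × 2 (a fused multiply-add counts as two operations) × clockspeed. Qualcomm has disclosed neither the Adreno GPU’s lane count nor its clockspeed, so the clocks in the sketch below are hypothetical, and the output only shows what lane counts would be consistent with 4.6 TFLOPS at those clocks.

```python
def peak_fp32_tflops(lanes: int, clock_ghz: float) -> float:
    """Peak FP32 TFLOPS, counting a fused multiply-add as two operations."""
    return lanes * 2 * clock_ghz / 1000

def lanes_for(tflops: float, clock_ghz: float) -> float:
    """Inverse of peak_fp32_tflops: lanes needed to reach a TFLOPS target."""
    return tflops * 1000 / (2 * clock_ghz)

# Hypothetical GPU clocks; Qualcomm has not disclosed the real one
for clock_ghz in (1.2, 1.5):
    print(f"{clock_ghz} GHz -> ~{lanes_for(4.6, clock_ghz):.0f} FP32 lanes")
```

A peak figure like this assumes every lane issues an FMA every cycle, which real workloads rarely sustain.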
The display controller portion of the GPU offers support for up to 4 DisplayPort displays. Besides an internal display for the laptop, it can drive a further 3 external displays (all DP 1.4), with one output being 5K capable, while the rest are 4K.
Finally, the SoC is getting Qualcomm’s latest video processing block (VPU) as well. This latest design supports not only AV1 decoding but, in a first for a Qualcomm SoC, AV1 encoding as well.
NPU: Hitting Hard with Hexagon
Next to the use of Oryon CPU cores, Qualcomm’s other big bet with the Snapdragon X Elite SoC is on the AI/neural processing unit side of things with their latest generation Hexagon NPU. Qualcomm is expecting that AI use will continue to rapidly grow over the next few years, and that the next big push is going to be AI models running locally on users’ systems. So they have invested a significant amount of resources in bulking up their Hexagon NPU for this generation of chips (X Elite and 8 Gen 3).
The end result is a heavily revised NPU, which should greatly exceed the 8cx Gen 3’s NPU performance. Qualcomm is quoting 45 TOPS of performance here for modest-precision INT8, whereas the 8cx Gen 3 was previously quoted at 15 TOPS for an unspecified data format.
Unlike their CPU and GPU, Qualcomm is sharing some architectural details here about the NPU, and what they’ve done to boost its performance. The tensor accelerator block, used in the densest matrix math, is outright 2.5x faster than before. Backing that (and the rest of the NPU) is a 2x larger shared memory/cache (though Qualcomm is not disclosing the actual size). Qualcomm is targeting large language models (LLMs) in particular with this change, as these are notoriously memory bound; according to the company, the chip will have enough resources to run a 13 billion parameter Llama 2 model locally.
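A rough sketch of why memory matters so much for LLMs: during token generation, essentially every weight has to be read from memory once per token, so weight size and memory bandwidth set an upper bound on speed. Qualcomm did not say what precision its 13B Llama 2 figure assumes, so the sketch tries several. It also assumes the roughly 136.5GB/sec theoretical peak of a 128-bit LPDDR5x-8533 bus and ignores KV-cache traffic, activations, and on-chip caching.

```python
LPDDR5X_PEAK_GBS = 128 / 8 * 8533 / 1000   # ~136.5 GB/s theoretical peak

def weights_gb(params_billion: float, bits_per_param: int) -> float:
    """Size of the model weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_param / 8

def max_tokens_per_sec(params_billion: float, bits_per_param: int, bandwidth_gbs: float) -> float:
    """Upper bound for memory-bound decoding: each weight is read once per token."""
    return bandwidth_gbs / weights_gb(params_billion, bits_per_param)

for bits in (16, 8, 4):
    size = weights_gb(13, bits)
    rate = max_tokens_per_sec(13, bits, LPDDR5X_PEAK_GBS)
    print(f"{bits}-bit weights: {size:.1f} GB, at most ~{rate:.1f} tokens/s")
```

At 16-bit precision the weights alone would occupy 26GB, which suggests (but does not confirm) that a quantized model is the more practical target for a laptop.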
Qualcomm has also made some power delivery changes to help drive more performance/efficiency out of the NPU. The power-hungry tensor block is now on its own power rail, with the rest of the NPU sitting on a separate shared rail. The company has also made some further undisclosed improvements to how they handle micro-tiling of inferencing workloads, which directly impacts how well they can split up workloads to keep the various sub-blocks of the NPU as busy as possible while minimizing intermediate memory operations.
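Qualcomm has not described their micro-tiling improvements, but the general idea behind tiling is well established. The sketch below is plain loop tiling of a matrix multiply, not Qualcomm’s scheme: the work is split into small blocks so that each step only touches a tile-sized slice of each matrix, which on an NPU is what lets data stay in local memory instead of making round trips to DRAM. The function name and `tile` parameter are ours for illustration.

```python
def tiled_matmul(a, b, tile=2):
    """Compute C = A @ B block by block.

    Each (i0, j0, k0) step reads only a tile x tile block of A and B and
    updates a tile x tile block of C, keeping the working set small.
    """
    if tile < 1:
        raise ValueError("tile must be at least 1")
    n, k_dim, m = len(a), len(b), len(b[0])
    if any(len(row) != k_dim for row in a):
        raise ValueError("inner dimensions do not match")
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for k0 in range(0, k_dim, tile):
                # Multiply one block of A by one block of B, accumulating into C.
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = c[i][j]
                        for k in range(k0, min(k0 + tile, k_dim)):
                            acc += a[i][k] * b[k][j]
                        c[i][j] = acc
    return c

a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8], [9, 10], [11, 12]]
print(tiled_matmul(a, b))  # [[58.0, 64.0], [139.0, 154.0]]
```

Changing `tile` alters the order of the work and the size of the working set, but not the result.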
I/O: USB4, PCIe 4, & Discrete Wi-Fi 7
Rounding out the Snapdragon X Elite, let’s talk I/O.
For internal I/O, the SoC offers PCIe 4.0 connectivity for NVMe storage. Elsewhere, the company is using PCIe 3 to supply connectivity to their modem and Wi-Fi solutions. No mention has been made of whether there are any free PCIe lanes for further peripherals.
For external I/O, the SoC supports USB4. According to Qualcomm, it can drive up to 3 such Type-C ports, alongside a pair of USB 3.2 Gen 2 ports and a single USB 2.0 port for internal use.
As noted earlier, both Wi-Fi and the modem are discrete for this product. The chip is intended to be paired with Qualcomm’s FastConnect 7800 silicon in the form of an M.2 card. The 7800 is their latest-generation Wi-Fi 7 solution, with support for 4 spatial streams as well as Bluetooth 5.4. The modem pairing is the Snapdragon X65, a high-performance 5G modem which was also available for the 8cx Gen 3.
The fact that neither wireless system is integrated into the SoC is unusual for Qualcomm, but perhaps not too surprising since they want to bring the Elite to market ASAP. Integrating these modules would take further time, and for a laptop SoC, Qualcomm doesn’t need to be as space efficient. In any case, the official line from Qualcomm is that the discrete modem is for OEM flexibility, giving OEMs the option to include a modem or not, though Qualcomm will of course strongly encourage OEMs to include one as a major feature differentiator of the platform.
Performance Claims
As we don’t have enough architectural details to make any meaningful performance projections, the best thing we have for now are Qualcomm’s vague comparisons to their competitors. This is also the closest thing Qualcomm has provided to energy efficiency data for the chip (though, as always, target clockspeeds for a SKU play a massive part there).
With 12 performance cores, Qualcomm is pushing hard on multi-threaded performance. In fact, multi-threaded performance is the only CPU performance comparison Qualcomm makes; there are no single-threaded comparisons to speak of. Make of that what you will.
Against what is implied to be an Intel 12-core mobile CPU design, Qualcomm is reporting that Snapdragon X Elite delivers 2x the multi-threaded performance in Geekbench 6. Or at iso-performance, they hit the same mark at one-third the power consumption.
Even against Intel’s best 14-core (H-class) chips, Qualcomm still reports a 60% performance lead, and again claims one-third the power consumption at iso-performance. Undoubtedly, a lot of this is down to the process node used, as TSMC N4 should be delivering a significant advantage over the Intel 7 process used for Intel’s current chips. But Intel is a moving target, which is why timing is so critical here: by the time it launches next year, Snapdragon X Elite should be competing with the Intel 4-based Meteor Lake lineup.
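Note that the two kinds of claims describe different points on a power/performance curve. A small sketch of the arithmetic, using only Qualcomm’s own figures: one-third the power at the same score works out to 3x the performance per watt at that operating point, whereas the 2x and 60% performance claims come without power figures and so say nothing about efficiency on their own.

```python
from fractions import Fraction

def perf_per_watt_gain(perf_ratio: Fraction, power_ratio: Fraction) -> Fraction:
    """Relative performance per watt, given performance and power ratios vs. a competitor."""
    if power_ratio <= 0:
        raise ValueError("power_ratio must be positive")
    return perf_ratio / power_ratio

# Qualcomm's iso-performance claim: same score (1x) at one-third the power
iso_perf = perf_per_watt_gain(Fraction(1), Fraction(1, 3))
print(iso_perf)  # 3
```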
More interesting, perhaps, is that Qualcomm is reporting a 50% multi-threaded performance advantage over an unspecified “Arm-based competitor.” This is meant to imply Apple, but depending on just how vague Qualcomm wishes to be, MediaTek does offer some Windows-on-Arm chips as well.
Qualcomm also expects to lead in GPU performance in 3DMark Wild Life Extreme. Again, given a process node advantage and Qualcomm’s tendency to build bigger iGPUs overall, this is not surprising.
As always, these claims should be taken with a large grain of salt, especially for a platform that is still several months away from launching.
Snapdragon X Elite: Coming Mid-2024
Wrapping things up, Qualcomm is at this point putting the final touches on the Snapdragon X Elite. The company has deemed it one of their “most pivotal platform announcements in the company's recent history”, and for good reason. The Oryon CPU core being introduced here will eventually be at the heart of a good deal more products, so how competitive Oryon is will make or break Qualcomm’s next few generations of designs.
Devices based on the Snapdragon X Elite should be available in mid-2024. On that schedule, the Snapdragon X Elite should be competing against Intel’s Meteor Lake (Core Ultra) chips, AMD’s Phoenix chips (Ryzen 7040 series), and whatever the latest iteration of Apple’s M-series chips is at the time.
84 Comments
Ryan Smith - Tuesday, October 24, 2023
"For some reason, AnandTech is using the non-final slides from the presentation"
For what it's worth, Qualcomm silently updated the pre-brief deck multiple times. So the version I had, which was supposedly final and is what I used to file this story Sunday night, was in fact not. I've since updated the images in the article, but I'll have to tweak the text later when I have time.
NextGen_Gamer - Wednesday, October 25, 2023
No problem Ryan! I was just really confused, because I read the AnandTech story first (it is always my first stop!) then Ars Technica second. So then I was like, why did Ryan have to do all of this speculation on what CPU this might be and what is implied here, when it is all laid out in the slides Ars Technica has? I would love for you to go back to this story now and fully update it with some comparisons of what you know of Intel's & Apple's current lineup. Like I said above, at first glance, it doesn't appear as though Oryon is going to reach M2 levels of IPC/efficiency. Though it is of course a league ahead of all other current ARM designs out there.
thestryker - Tuesday, October 24, 2023
The extra memory bandwidth from the Pro likely helps a fair bit (just guessing since adding 4P cores seems to increase their score by ~45%) which is part of the reason I'm disappointed Qualcomm stuck with a 128 bit bus maximum. Though it does make sense given the market they're aiming at as it'll be cheaper and they don't really have to compete with Apple.
lionking80 - Wednesday, October 25, 2023
"... Oryon seems to be losing: it is roughly the same performance, for slightly more power..."
Your calculations are way off. The M2 Pro is only 21% faster than M2 in multicore (not 50% as you claim) despite having 50% more cores (12 vs 8). If Snapdragon X Elite has 50% better multicore performance than M2, then it is still SIGNIFICANTLY AHEAD of M2 Pro.
Geekbench 6 (multi-core)
M2 Pro 12222 (+21%)
Apple M2 10094
NextGen_Gamer - Wednesday, October 25, 2023
@lionking80 - Majority of the M2 Pro scores I see are in the ~14000 range, with 12000 being only a few in the lowest. Let's go in the middle though, and say 13000. If Oryon is 50% better than base M2, that puts it at 15000 Geekbench 6 Multi-Core. Or, about ~15% better than M2 Pro/13000 average. That is 15% better though for consuming more power, and having 12 performance cores vs 8P+4E.
Again, my point wasn't that Oryon isn't faster than M2 Pro. It absolutely is. It is that M2, as a CPU architecture, still seems to be king in the IPC/efficiency. Also, 15% is a close enough gap that may very well evaporate next week when M3 releases.
It is still a VERY good outing though, and puts Qualcomm way ahead of all other ARM designs. And will probably get Apple to actually start making big jumps in performance with their own chips again.
ChrisGX - Saturday, October 28, 2023
@NextGen_Gamer We are reading the data very differently. There isn't anything in the benchmark numbers for the M2 (of whatever variety) that suggest to me that Apple's performance cores steal the win on efficiency even while being behind on performance.
Admittedly, though, there isn't a lot of useful benchmark data to call upon at this point that illuminates the comparative Perf/W and power consumption picture beyond the CPU peak performance scenario that Geekbench focuses on. And, Geekbench numbers, by themselves, are hardly satisfactory. Additionally, there is a need to properly confirm Qualcomm's benchmark numbers. Still, the picture will become much clearer in the coming months. I would definitely like to see Anandtech act in computer users' interests and do a thorough performance report on the X Elite.
Yes, the release of the M3 will be interesting. That will figure into an increasingly interesting picture for ARM computing going forward.
techconc - Thursday, October 26, 2023
Look at the official listings on the Geekbench site.
M2 in the Mac mini - 9742
M2 Pro in Mac mini - 14251 - 46% higher
If the X1 is 50% faster than M2, that would put it at 14613.... which is effectively the same.
Also, keep in mind that Qualcomm's numbers are typically higher than actual shipping products using their chips. They always show the best case scenario with some sort of reference device that has no thermal limitations, etc.
Speedfriend - Thursday, October 26, 2023
From the slides, it would appear that multi-thread on Geekbench is around 14500 (M2 x 1.5) which would put it in line with M2 Max with higher power consumption. However, the flip side is that it offers 15% better single-threaded at 30% less power and has the ability to ramp two cores to high clock speeds.
For a first attempt, this is far better than I expected and will cause serious issues for Intel and AMD unless they pull something out of the bag
techconc - Thursday, October 26, 2023
Yes, Qualcomm's presentation was intentionally misleading. For example, they compared single core performance to the M2 Max. Why not compare multi-core performance as well? For that matter, the base M2 has the same single core performance but I guess that comparison didn't sound as impressive. Worse, I've seen general news coverage (forget which channel) which parroted that claim and said this new chip is faster than an M2 Max. Well played Qualcomm marketing... it worked on non-technical types.
ChrisGX - Thursday, October 26, 2023
The comparison made in the keynote was clearly with the M2 MAX - Apple's highest performing chip. Qualcomm claimed the Snapdragon X Elite outperforms that chip (presumably on the Geekbench 6 single-thread benchmark). Furthermore, at ISO performance Qualcomm claimed the Snapdragon chip uses about 30% less power than the Apple chip (at the latter's peak performance).
There is nothing unfair about that comparison. If further testing confirms Qualcomm's claims then the Oryon core will take the crown as the fastest ARM core and perhaps the fastest CPU core bar none. But, it will be efficient to boot.
The news is even worse for Intel than it is for Apple. Still, Apple somehow managed to alienate the designers of the Oryon core and I'm pretty sure the responsible parties over at Apple would be regretting that now.