The Ampere Altra Max Review: Pushing it to 128 Cores per Socket
by Andrei Frumusanu on October 7, 2021 8:00 AM EST
Posted in: Servers, Arm, Neoverse N1, Ampere, Altra Max
SPEC - Multi-Threaded Performance - Subscores
We’re starting off with the multi-threaded/process SPEC CPU rate results. As usual, because these are not officially submitted scores to SPEC, we’re labelling the results as “estimates” as per the SPEC rules and license.
We compile the binaries with GCC 10.2 on their respective platforms, with simple -Ofast optimisation flags and relevant architecture and machine tuning flags (-march/-mtune=neoverse-n1 ; -march/-mtune=skylake-avx512 ; -march/-mtune=znver2).
We’re focusing our comparisons on the new M128-30, the previous Q80-33, AMD’s flagship EPYC 7763 and Intel’s new Xeon 8380. The Altra chips are running at 250W TDPs with 128 and 80 cores respectively, the EPYC at 280W with 64 cores, and the Xeon at 270W with 40 cores. The SMT systems have it enabled, and we’re running peak threads in these subscores.
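To put the configurations side by side, here is a minimal sketch that derives thread counts and TDP per core from the figures quoted above. The 2-way SMT factor for the EPYC and Xeon parts (and 1 for the Altra chips, which have no SMT) is not stated in the text and is filled in here as an assumption, as are the dictionary layout and the helper name.

```python
# Figures quoted in the text: (TDP in watts, physical cores, threads per core).
# The threads-per-core column is an assumption: 2-way SMT on the x86 parts,
# no SMT on the Altra parts.
systems = {
    "Altra Max M128-30": (250, 128, 1),
    "Altra Q80-33": (250, 80, 1),
    "EPYC 7763": (280, 64, 2),
    "Xeon 8380": (270, 40, 2),
}


def summarise(tdp_w, cores, threads_per_core):
    """Return peak thread count and TDP per physical core for one chip."""
    return {
        "threads": cores * threads_per_core,
        "watts_per_core": tdp_w / cores,
    }


summary = {name: summarise(*spec) for name, spec in systems.items()}

for name, s in summary.items():
    print(f"{name}: {s['threads']} threads, {s['watts_per_core']:.2f} W per core")
```

TDP per core is only a rough indicator, since TDP is a rated limit rather than measured power, but it shows how much less power budget each Altra Max core has to work with.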
In SPECint2017, we’re seeing two different result-sets for the new Altra Max system – either very large gains, or some more notable performance regressions.
Workloads such as 525.x264_r, 531.deepsjeng_r, 541.leela_r, and 548.exchange2_r, have one large commonality about them, and that is that they’re not very memory bandwidth hungry, and are able to keep most of their working sets within the caches. For the Altra Max, this means that it’s seeing performance increases from 38% to 45% - massive upgrades compared to the already impressive Q80-33.
The 45% increase in 548.exchange2_r is essentially almost perfect linear scaling with the core count and frequencies; although the M128-30 has 60% more cores, it’s also running at 10% lower frequencies, so 45% more theoretical throughput.
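The scaling arithmetic can be checked with a short sketch. It assumes the clock speeds implied by the model names (3.0 GHz for the M128-30 and 3.3 GHz for the Q80-33) and a simple model in which throughput scales with cores times frequency; the function name is made up for illustration.

```python
def theoretical_speedup(cores_new, cores_old, ghz_new, ghz_old):
    """Ideal throughput ratio if performance scaled with cores x frequency."""
    return (cores_new / cores_old) * (ghz_new / ghz_old)


# Core counts are from the text; clock speeds are assumed from the model names.
speedup = theoretical_speedup(cores_new=128, cores_old=80, ghz_new=3.0, ghz_old=3.3)

print(f"Core ratio:      {128 / 80:.2f}x")
print(f"Frequency ratio: {3.0 / 3.3:.3f}x")
print(f"Ideal uplift:    {(speedup - 1) * 100:.1f}%")
```

Under these assumptions the ideal uplift comes out at roughly 45%, which is why the 548.exchange2_r result counts as near-linear scaling.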
523.xalancbmk_r also isn’t very DRAM traffic heavy in traditional systems, however it has a larger working set than the other aforementioned workloads, and the smaller SLC size and increased core count don’t do it any favours as it becomes resource contended. The same can be said of 502.gcc_r, which is also slower than on the Q80-33.
505.mcf_r is the worst-case scenario. Although it is memory latency sensitive, it also has somewhat higher bandwidth demands that can saturate a system at higher instance counts. Adding cores here has a negative impact on performance, because the bandwidth curve of the system means the memory subsystem becomes more and more inefficient under load. The same workload with only 32 or 64 instances scores 83.71 or 101.82 respectively, much higher than what we’re seeing with 128 cores.
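The two reduced-instance scores already show the diminishing returns. The sketch below uses only the 32- and 64-instance 505.mcf_r figures quoted above; the variable names are illustrative, and the 128-instance score is left out because it is not given in the text.

```python
# 505.mcf_r rate scores quoted in the text, keyed by instance count.
mcf_scores = {32: 83.71, 64: 101.82}

# Throughput contributed by each instance at a given instance count.
per_instance = {n: score / n for n, score in mcf_scores.items()}

# Total gain from doubling the instance count from 32 to 64.
gain_from_doubling = mcf_scores[64] / mcf_scores[32]

for n, value in per_instance.items():
    print(f"{n} instances: {value:.2f} per instance")
print(f"Doubling 32 -> 64 instances yields {gain_from_doubling:.2f}x the score")
```

Doubling the instance count yields well under twice the score, so each instance is already being throttled by the memory subsystem long before all 128 cores are in use.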
In the FP suite, we’re seeing the same differentiation between the M128-30 and the other systems. In anything that is more stressful on the memory subsystem, the new Mystique chip doesn’t do well at all, and most times regresses compared to the Q80-33.
In anything that’s simply execution bound, throwing more execution power at the problem through more cores of course sees massive improvements. In many of these cases, the M128-30 can now claim a rather commanding lead over the competing Milan chip, and leaves even Intel’s new Ice Lake-SP in the dust due to its sheer core count and efficiency advantage.
60 Comments
mode_13h - Thursday, October 7, 2021 - link
> x86 still commands 99% of the server market.
Depends on what you consider the "server market", but AWS is very rapidly switching over. Others will follow.
Lots of cloud compute just depends on density and power-efficiency. And here's where ARM has a real advantage.
Wilco1 - Thursday, October 7, 2021 - link
According to https://www.itjungle.com/2021/09/13/the-cacophony-... Arm server revenue has been 4-5% over the last few quarters.
schujj07 - Friday, October 8, 2021 - link
Anything under 10% market share in the server world is basically considered a niche player. Right now AMD is over 10% so they are finally seen as an actual player in the market.
Spunjji - Friday, October 8, 2021 - link
Pointing at current market share that resulted from a lack of viable ARM competition isn't a great argument for your prediction that ARM will not gain market share, especially when you're being presented with evidence of viable ARM competition.
mode_13h - Thursday, October 7, 2021 - link
> Before AMD can disrupt Intel in the server,
*before* ? This is already happening! You can clearly see it in AMD's server marketshare, as well as the price structure of Ice Lake.
> And now Intel is coming back with Saphire Rapids. Doesn't look good for AMD.
AMD has Genoa, V-Cache, and who knows what else in the pipeline. Oh, and they can also build an ARM core just as good as anyone (with the possible exceptions of Apple and Nuvia/Qualcomm).
yetanotherhuman - Friday, October 8, 2021 - link
Not even in slight agreement. Different architecture.
eastcoast_pete - Thursday, October 7, 2021 - link
Thanks Andrei, great analysis! IMO, the biggest problem Ampere and other firms that develop server CPUs based on ARM designs is that their natural customers - large, cloud-type providers - pretty much all have their own, in-house designed ARM-based CPUs, and won't buy thousands of third party CPUs unless they do something their own can't do, or nowhere near as well. AWS, Google, MS, and Apple still buy x86 CPUs from Intel or AMD because there is a customer demand for those instances, but also try to shift as much as they can to their own, home-grown ARM server systems. In this regard, has anyone heard any updates about the ARM designs supposedly in development at MS? Maybe Ampere can get themselves bought out by them?
name99 - Friday, October 8, 2021 - link
“own house-designed ARM-based CPU’s”?
We obviously have Graviton. Apple seem a reasonable bet at some point. Maybe a large Chinese player.
Do we have any evidence (as opposed to hypotheses and rumors) of Google, Facebook, Microsoft, or most of China? Or other smaller but still large players like Yandex or Cloudflare?
Sivar - Thursday, October 7, 2021 - link
This is a proper old-school deep CPU review.
vegemeister - Thursday, October 7, 2021 - link
Text says Intel Xeon 8380 is running at 205 W power limit, but the table says 270 W. Which is it? I assume 270 W like ARK says?