Last week, Apple unveiled its new-generation MacBook Pro laptop series, a new range of flagship devices that bring significant updates for the company’s professional and power-user oriented user base. The new devices particularly differentiate themselves in that they’re now powered by two additional entries in Apple’s own silicon line-up, the M1 Pro and the M1 Max. We covered the initial reveal in last week’s overview article of the two new chips, and today we’re getting the first glimpses of the performance we can expect from the new silicon.

The M1 Pro: 10-core CPU, 16-core GPU, 33.7bn Transistors

Starting off with the M1 Pro, the smaller sibling of the two, the design appears to be a new implementation of the first-generation M1 chip, but this time designed from the ground up to scale to larger sizes and higher performance. The M1 Pro in our view is the more interesting of the two designs, as it offers nearly everything that power users will deem generationally important in terms of upgrades.

At the heart of the SoC we find a new 10-core CPU setup in an 8+2 configuration, with 8 performance Firestorm cores and 2 efficiency Icestorm cores. We had indicated in our initial coverage that Apple’s new M1 Pro and Max chips appear to be using a similar, if not the same, generation of CPU IP as the M1, rather than the newer generation cores used in the A15. We can seemingly confirm this, as we’re seeing no apparent changes in the cores compared to what we discovered on the M1.

The CPU cores clock up to a 3228MHz peak, but vary in frequency depending on how many cores are active within a cluster, clocking down to 3132MHz with 2 cores active, and 3036MHz with 3 or 4 cores active. I say “per cluster” because the 8 performance cores in the M1 Pro and M1 Max indeed consist of two 4-core clusters, each with its own 12MB L2 cache and each able to clock its CPUs independently of the other. It’s thus possible to have four active cores in one cluster at 3036MHz and one active core in the other cluster running at 3228MHz.
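As a rough illustration of the per-cluster clocking described above, the quoted peak frequencies can be written as a small lookup table. This is only a sketch built from the figures in the text; the names `P_CLUSTER_PEAK_MHZ` and `cluster_peak_mhz` are made up for the example and do not correspond to any Apple interface.

```python
# Peak P-core clock in MHz, keyed by active cores in one 4-core cluster.
# Values are the ones quoted in the text above.
P_CLUSTER_PEAK_MHZ = {1: 3228, 2: 3132, 3: 3036, 4: 3036}


def cluster_peak_mhz(active_cores: int) -> int:
    """Return the peak clock for a cluster with `active_cores` busy cores."""
    if active_cores not in P_CLUSTER_PEAK_MHZ:
        raise ValueError("a P-cluster has between 1 and 4 active cores")
    return P_CLUSTER_PEAK_MHZ[active_cores]


# Each cluster clocks independently, so each one is looked up on its own:
# four busy cores in one cluster, a single busy core in the other.
clocks = [cluster_peak_mhz(n) for n in (4, 1)]
print(clocks)  # [3036, 3228]
```

The point of the two-entry list is that the lightly loaded cluster keeps its single-core peak even while the other cluster is fully occupied.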

The two E-cores in the system clock at up to 2064MHz. As opposed to the M1, there are only two of them this time around; however, Apple still gives them their full 4MB of L2 cache, the same as on the M1 and A-derivative chips.

One large feature of both chips is their much-increased memory bandwidth and interfaces – the M1 Pro features 256-bit LPDDR5 memory at 6400MT/s speeds, corresponding to 204GB/s bandwidth. This is significantly higher than the M1 at 68GB/s, and also generally higher than competitor laptop platforms which still rely on 128-bit interfaces.
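The bandwidth figure follows directly from the bus width and the transfer rate, and the arithmetic can be sketched as below. The helper name `peak_bandwidth_gb_s` is invented for this example, and the 128-bit case simply assumes the same 6400MT/s rate for comparison; it is not a measurement of any particular competitor platform.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, transfer_rate_mt_s: int) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 1000 MB)."""
    bytes_per_transfer = bus_width_bits / 8       # bits -> bytes moved per transfer
    mb_per_second = bytes_per_transfer * transfer_rate_mt_s
    return mb_per_second / 1000


# M1 Pro as described in the text: 256-bit LPDDR5 at 6400MT/s.
m1_pro = peak_bandwidth_gb_s(256, 6400)

# Hypothetical 128-bit interface at the same transfer rate, for comparison only.
narrow = peak_bandwidth_gb_s(128, 6400)

print(f"256-bit: {m1_pro:.1f} GB/s, 128-bit: {narrow:.1f} GB/s")
```

These are theoretical peaks; the bandwidth a given block can actually draw may be lower.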

We’ve been able to identify the “SLC”, or system level cache as we call it, as coming in at 24MB for the M1 Pro and 48MB on the M1 Max. That is a bit smaller than what we initially speculated, but it makes sense given the SRAM die area, and it represents a 50% increase over the per-block SLC on the M1.

 

The M1 Max: A 32-Core GPU Monstrosity at 57bn Transistors

Above the M1 Pro we have Apple’s second new M1 chip, the M1 Max. The M1 Max is essentially identical to the M1 Pro in terms of architecture and in many of its functional blocks – but what sets the Max apart is that Apple has equipped it with much larger GPU and media encode/decode complexes. Overall, Apple has doubled the number of GPU cores and media blocks, giving the M1 Max virtually twice the GPU and media performance.

The GPU and memory interfaces are by far the most differentiated aspects of the chip: instead of a 16-core GPU, Apple doubles things up to a 32-core unit. On the M1 Max which we tested for today, the GPU runs at up to 1296MHz, quite fast for what we consider mobile IP, but still significantly slower than what we’ve seen from the conventional PC and console space, where GPUs can now run at up to around 2.5GHz.

Apple also doubles up on the memory interfaces, using a whopping 512-bit wide LPDDR5 memory subsystem – unheard of in an SoC and even rare amongst historical discrete GPU designs. This gives the chip a massive 408GB/s of bandwidth – how this bandwidth is accessible to the various IP blocks on the chip is one of the things we’ll be investigating today.
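One way to read the doubling is that the bandwidth available per GPU core stays the same between the two chips. The snippet below is a naive back-of-the-envelope division using only the figures quoted in this article; it assumes the whole bus were shared evenly by the GPU cores, which is a simplification and not a claim about how the fabric actually distributes bandwidth.

```python
# (peak bandwidth in GB/s, GPU core count), as quoted in the article.
CHIPS = {
    "M1 Pro": (204, 16),
    "M1 Max": (408, 32),
}


def bandwidth_per_gpu_core(bandwidth_gb_s: float, gpu_cores: int) -> float:
    """Naive even split of peak DRAM bandwidth across GPU cores."""
    return bandwidth_gb_s / gpu_cores


per_core = {name: bandwidth_per_gpu_core(bw, cores)
            for name, (bw, cores) in CHIPS.items()}

for name, value in per_core.items():
    print(f"{name}: {value:.2f} GB/s per GPU core")
```

Both chips come out at the same per-core figure, which is consistent with the memory interface being scaled in step with the GPU.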

The memory controller caches are at 48MB in this chip, allowing for theoretically amplified memory bandwidth for various SoC blocks as well as reducing off-chip DRAM traffic, thus also reducing power and energy usage of the chip.

Apple’s die shot of the M1 Max was a bit weird initially in that we weren’t sure if it actually represents physical reality. Especially on the bottom part of the chip, we had noted what appears to be a doubled-up NPU, something Apple doesn’t officially disclose. A doubled-up media engine makes sense as that’s part of the features of the chip; however, until we can get a third-party die shot to confirm that this is indeed what the chip looks like, we’ll refrain from speculating further in this regard.

Huge Memory Bandwidth, but not for every Block
493 Comments

  • Ryan Smith - Monday, October 25, 2021 - link

    Thanks. Fixed!
  • 5j3rul3 - Monday, October 25, 2021 - link

    It's amazing.

    Is there any analysis for promotion, M1 Max GPU ray tracing...?
  • dada_dave - Monday, October 25, 2021 - link

    Ray tracing is in Metal, but as of yet there is no hardware-accelerated GPU ray tracing
  • Kangal - Monday, October 25, 2021 - link

    Really impressive chip.
    I noted my satisfaction/dissatisfaction a whole year ago with the original Apple M1. I suggested that Apple should release a family of chipsets for their devices. It was mainly for being more competitive and having better product segmentation. This didn’t happen, and it looks like its only somewhat happening. Not to mention, they could've done this transition even earlier like a year or two ago. Also they could update their “chipset-family” with the subsequent architectural improvements per generation. For instance;

    Apple M1, ~1W, only small cores, 1cu GPU... for 2in watch, wearables
    Apple M10, ~3W, 2 large cores, 4cu GPU... for 5in phones, iPods
    Apple M20, ~5W, 3 large cores, 4cu GPU... for 7in phablets or Mini iPad
    Apple M30, ~7W, 4 large cores, 8cu GPU… for 9in tablet, ultra thin, fanless
    Apple M40, ~10W, 8 large cores, 8cu GPU… for 11in laptop, ultra thin, fanless
    Apple M50, ~15W, 8 large cores, 16cu GPU… for 14in laptop, thin, active cooled
    Apple M60, ~25W, 8 large cores, 32cu GPU… for 17in laptop, thick, active cooled
    Apple M70, ~45W, 16 large cores, 32cu GPU… for 23in iMac, thick, AC power
    Apple M80, ~85W, 16 large cores, 64cu GPU... for 31in iMac+, thicker, AC power
    Apple M90, ~115W, 32 large cores, 64cu GPU…. for Mac Pro, desktop, strong cooling

    …and after 1.5 years, they can move on to the next refined architecture/node, and repeat the cycle every 18 months. The naming could be pretty simple as well, for example: in 2020 it was M50, then in 2021 their new model is the M51, then it is M52, then M53, then M54, etc. This was the lineup that I had hoped for; kinda bummed they didn't rush out the gate with such a strong lineup, and they possibly may not in the future.
  • rmullns08 - Monday, October 25, 2021 - link

    With how many SKUs Apple already has with just the M1 Pro/Max configurations, it would likely be a supply chain nightmare to try to manage 10 CPUs as well.
  • gobaers - Monday, October 25, 2021 - link

    Page 4 should be "put succinctly" not "succulently." Even if we do appreciate water efficiency in our chip manufacturing process ;)
  • paulraphael - Monday, October 25, 2021 - link

    "Put succulently, the new M1 SoCs prove that Apple ...."
    A rare case of autocorrect improving an idea.
  • Hifihedgehog - Monday, October 25, 2021 - link

    Mmmunchy Krunchy Dee-licious.
  • GC2:CS - Monday, October 25, 2021 - link

    So hardware is on one hand much upgraded like in terms of memory architecture but on the other hand it is still a year old Firestorm icestorm GPU and NPU.
    I wonder if LPDDR5 is simply not suited for iPhones but seems strange the A15 gives some upgrade to everything while sticking with DDR4.

    24 and 48 MB system caches were shown by apple. For brief moment they labeled their M1Pro area with 24 little parts as system cache. Max doubles the same part. I just was not sure one little part of SM equals 1 MB.

    So making two independent clusters with their own L2 helps compared to a single 8C/24 MB cluster?
    After all, Firestorm is about 5 W per core, so it is probably easy to fit many of them in, let's say, an upcoming 300 W desktop Apple silicon. The question is, is there space for an even larger core than Firestorm? If a 5 W core is fast, why not make a 20 W core (even if less efficient) and put two of them into a desktop, along with a few dozen Firestorms? Like make Firestorm the little core in the desktop.
    Honestly I could not take the idea that next year we will have a desktop PC with less powerful main cores than in a phone (how could I flex on my friends then?)

    While M1’s are fast, we have the A15 with supposed large gains in CPU and GPU efficiency, a better NPU and 32 MB system cache already shipping. Seems like a good omen for those M2 generations?
    So M2/Pro/Max will get up to 10/20/40 GPU cores 32/48/64 MB system cache 18/36/36 MB of L2 ?!?

    Apple Silicon lineup is getting confusing - not liking it much. A4-A15 is the best naming scheme for any piece of silicon I have ever seen. (Would be better if they started at A1).
  • StinkyPinky - Monday, October 25, 2021 - link

    Thanks for being the only place that actually did real world benchmarks. Some of these reviews around the web are god awful.

    Any chance you can do Civ 6? That always seems a good test of both CPU and GPU.
