Memory Subsystem & Latency

Usually, the first concern of an SoC design is that its data fabric performs well and gives its IP blocks access to the system's caches and DRAM with good latency, as latency, especially on the CPU side, directly affects end-result performance under many workloads.

The Google Tensor is both similar to and different from the Exynos chips in this regard. Google does, however, fundamentally change how the internal fabric of the chip is set up in terms of its various buses and interconnects, so we do expect some differences.

First off, we have to mention that many of the latency patterns here are still quite broken due to the new Arm temporal prefetchers that were introduced with the Cortex-X1 and A78 series CPUs; please just pay attention to the orange “Full Random RT” curve, which bypasses these.
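For readers unfamiliar with this kind of test, here is a minimal sketch of how a fully random pointer-chasing pattern can be built. It is an illustration under assumptions, not the tool used for the graphs here: the names `build_random_chain` and `chase` are hypothetical, and Python is far too slow to measure real cache latencies. The idea it shows is that every access depends on the result of the previous one, and that the chain covers the whole buffer once before it repeats.

```python
import random


def build_random_chain(n, seed=0):
    """Return a list `nxt` where nxt[i] is the slot to visit after slot i.

    Uses Sattolo's algorithm, which produces a permutation made of a single
    cycle of length n, so a walk touches every slot exactly once per lap.
    """
    rng = random.Random(seed)
    nxt = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # j is always < i; this is what forces one cycle
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt


def chase(nxt, steps, start=0):
    """Follow the chain for `steps` dependent accesses; return the final slot."""
    idx = start
    for _ in range(steps):
        idx = nxt[idx]  # the next address is only known once this load is done
    return idx
```

A real latency test would do the same walk in native code over a buffer of a chosen size, time a large number of steps, and divide the elapsed time by the step count.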

There are a couple of things to see here. Let’s start on the CPU side, where we see the X1 cores of the Tensor chip configured with 1MB of L2, in contrast with the smaller 512KB of the Exynos 2100 but in line with what we see on the Snapdragon 888.

The second thing to note is that the Tensor’s DRAM latency doesn’t look good, showing a considerable regression compared to the Exynos 2100, which in turn was already quite a bit worse off than the Snapdragon 888. While the measurements are correct in what they’re measuring, the problem is a bit more complex because of the way Google operates the memory controllers on the Tensor. For the CPUs, Google ties the MC and DRAM speeds to the CPUs’ performance counters, namely the actual workload IPC as well as the memory stall % of the cores. That differs from the way Samsung runs things, which is based more on the transactional utilisation rate of the memory controllers. I’m not sure whether the high memory latency figures of the CPUs are caused by this, or simply by a higher-latency fabric within the SoC, as I wasn’t able to confirm the runtime operational frequencies of the memory during the tests on this unrooted device. However, it’s a topic we’ll see brought up a few more times in the next few pages, especially in the CPU performance evaluation.

The Cortex-A76 view of things looks more normal in terms of latencies, as the results aren’t impacted by the temporal prefetchers. Still, the latencies here are significantly higher than on competitor SoCs, on all patterns.

What I found weird was that the L3 latencies of the Tensor SoC also look quite high, above those of the Exynos 2100 and Snapdragon 888 by a noticeable margin. One odd thing about the Tensor SoC is that Google didn’t give the DSU and the L3 cache of the CPU cluster a dedicated clock plane, instead tying it to the frequency of the Cortex-A55 cores. As a result, even if the X1 or A76 cores are under full load, the A55 cores as well as the L3 still run at lower frequencies. The same scenario on the Exynos or Snapdragon chips would raise the frequency of the L3. This behaviour can be confirmed by running a dummy load on the Cortex-A55 cores in order to drive the L3 clock higher, which improves the figures on both the X1 and A76 cores.
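To make the clock-plane effect concrete, here is a rough sketch of how a cache's clock frequency translates into latency in nanoseconds. It is a simplification under stated assumptions: the function name `latency_ns` is hypothetical, the cycle count and frequencies are made-up round numbers rather than measurements from any of these chips, and a real access also spends cycles in the core and on clock-domain crossings.

```python
def latency_ns(cycles, freq_ghz):
    """Time in nanoseconds for `cycles` clock cycles at `freq_ghz` GHz.

    One cycle at 1 GHz lasts 1 ns, so the time is simply cycles / frequency.
    """
    if freq_ghz <= 0:
        raise ValueError("frequency must be positive")
    return cycles / freq_ghz


# Hypothetical values for illustration only, not figures from this review.
L3_CYCLES = 40     # assumed cost of an L3 access, counted in L3 clock cycles
SLOW_CLOCK = 1.0   # GHz, L3 left at a low clock while only big cores are busy
FAST_CLOCK = 1.8   # GHz, L3 clock driven up by a dummy load on the little cores

slow = latency_ns(L3_CYCLES, SLOW_CLOCK)
fast = latency_ns(L3_CYCLES, FAST_CLOCK)
print(f"L3 at {SLOW_CLOCK} GHz: {slow:.1f} ns")
print(f"L3 at {FAST_CLOCK} GHz: {fast:.1f} ns")
```

The same number of L3 cycles costs less wall-clock time once the L3 clock rises, which is consistent with the figures improving when the A55 cores are loaded.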

The system level cache is visible in the latency hump starting at around 11-13MB (1MB L2 + 4MB L3 + 8MB SLC). I’m not showing it in the graphs here, but memory bandwidth on normal accesses is also lower on the Google chip than on the Exynos. I do think I see more fabric bandwidth when doing things such as modifying individual cache lines, however, which is one of the reasons I think the SLC architecture is different from what’s on the Exynos 2100.
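The capacity sum in the parenthesis can be written out as a small sketch. The cache sizes are the ones quoted above; the helper name `cumulative_capacity` is hypothetical, and treating the levels as simply additive is an assumption, since it only holds if they mostly don't duplicate each other's contents.

```python
from itertools import accumulate

# Cache sizes in MB as quoted in the text, from closest to the core outwards.
LEVELS = [("L2", 1), ("L3", 4), ("SLC", 8)]


def cumulative_capacity(levels):
    """Return (name, running total in MB) for each level.

    Assumes the levels are additive (little duplication between them), so the
    running total is the test size at which accesses start to spill past it.
    """
    totals = accumulate(size for _, size in levels)
    return [(name, total) for (name, _), total in zip(levels, totals)]


for name, total in cumulative_capacity(LEVELS):
    print(f"past {name}: about {total}MB")
```

Under that assumption, a test footprint stays within on-chip caches up to about 13MB, the upper end of the range where the hump shows up.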

The A55 cores on the Google Tensor have 128KB of L2 cache. What’s interesting here is that because the L3 is on the same clock plane as the Cortex-A55 cores and runs at the same higher frequencies, the Tensor’s A55s have the lowest L3 latencies of all the SoCs, as they do without an asynchronous clock bridge between the blocks. Like on the Exynos, there’s some sort of increase at 2MB, something we don’t see on the Snapdragon 888, and which I think is related to how the L3 is implemented on the chips.

Overall, the Tensor SoC is quite different here in how it’s operated, and there are some key behaviours that we’ll have to keep in mind for the performance evaluation part.




  • melgross - Wednesday, November 3, 2021 - link

    Apple couldn’t integrate Qualcomm’s modems in their own chips because Qualcomm doesn’t allow that. They only allow the integration of their modems into their own SoC. It’s one reason why Apple wasn’t happy with them, other than the overcharging Qualcomm has been doing to Apple, and everyone else, by forcing the licensing of IP they didn’t use.
  • ChrisGX - Thursday, November 4, 2021 - link

    Yes, but all that conjecture hasn't been confirmed by any reputable source. And, the statements by Phil Carmack and Monika Gupta indicate Google has been optimising for power (most of all) and performance (to a lesser degree) rather than area. We end up back at the same place, using the A76 cores just doesn't make a lot of sense.

    Also, the A78 is perhaps 30% larger than the A76 (on a common silicon process) whereas, I think the X1 is about twice the size of the A76. I'm not sure what the implications of all that is for wafer economics but I'm pretty sure the reason that Tensor will probably end up suffering some die bloat (compared to upper echelon ARM SoCs from past years) despite the dense 5nm silicon process is the design decision to use two of those large X1 cores (a decision that Andrei seems perplexed by).
  • Raqia - Tuesday, November 2, 2021 - link

    The Google TPU only trades blows with the Qualcomm Hexagon 780 with the exception Mobile BERT. It's not an especially impressive first showing given that this is Google's centerpiece, and it's also unclear what the energy efficiency of this processor is relative to the competition. It's good there's competition though; at the phone level, software is somewhat differentiated and pricing is competitive.
  • webdoctors - Tuesday, November 2, 2021 - link

    Even if the performance isn't impressive, the big deal is guaranteed SW updates. Look at the Nvidia Shield, it came out in 2015 and its still getting the latest Android updates/OS! No other product has been updated for so long, 6 YEARS!

    Now that Google owns the SoC they have full access to the SoC driver source code so should be able to support the SoC forever, or at least ~10 years....not reliant on Qualcomm's 3 yr support term etc.
  • BlueScreenJunky - Tuesday, November 2, 2021 - link

    Yeah, except they only guarantee 3 years of software update and 5 years of security updates, which is really a shame if you ask me.

    If they could have guaranteed 5 years of OS updates from the start it would have been a very strong selling point. Especially since the difference between each generation becomes smaller every year, I could see people keeping a Pixel 6 for well over 3 years... How cool would that be to keep a $599 for 5 years and still run the latest android version ?
  • webdoctors - Tuesday, November 2, 2021 - link

    I agree:

    They should've just guaranteed 5 years for SW updates. Based off the pixel 3 being guaranteed for 3 yrs and than this month dropping security updates for Pixel 3 from their list, they're serious about guaranteeing being the maximum support they'll provide which is unfortunate. Maybe they'll update it this year cause that seems like a big hole.
  • TheinsanegamerN - Tuesday, November 2, 2021 - link

    Why? What new features do you NEED in your phone? Android stopped evolving with 9, iOS with about version 11. The newest OSes dont do anything spectacular the old ones didnt do.

    You're getting 5 years of security updates and dont have apps tied to OS version like apple, giving the pixel a much longer service life then any other phone.
  • tipoo - Tuesday, November 2, 2021 - link

    They're saying 3 years of OS updates, a far shot from 10. 5 years of security updates, which is a start, but owning their supposed own SoC they should have shot for 5 of OS.
  • BillBear - Wednesday, November 3, 2021 - link

    After all the build up on "We're going to have our own chips now so we can support them without interference from Qualcomm", three years of updates is seriously underwhelming.

    Apple has six year old phones running the current OS and the eight year old iPhone 5s got another security update a month ago.

    Google needs to seriously step up their game.
  • melgross - Friday, November 5, 2021 - link

    All we know now about software updates is that it will get five years of SECURITY updates, nothing about OS updates was stated, as far as I see. If that’s true, they Google may still just offer three years. Even now, Qualcomm allows for four years of OS updates, but not even Google has taken advantage of it. So nothing may change there.
