CPU Tests: Microbenchmarks

Core-to-Core Latency

As the core counts of modern CPUs grow, the time to access one core from another is no longer a constant. Even before the advent of heterogeneous SoC designs, processors built on large rings or meshes could have different latencies to the nearest core compared to the furthest core. This holds true especially in multi-socket server environments.

But modern CPUs, even desktop and consumer CPUs, can have variable access latency to get to another core. For example, in the first generation Threadripper CPUs, we had four chips on the package, each with 8 threads, and each with a different core-to-core latency depending on whether the target core was on-die or off-die. This gets more complex with products like Lakefield, which has two different communication buses depending on which core is talking to which.

If you are a regular reader of AnandTech’s CPU reviews, you will recognize our Core-to-Core latency test. It’s a great way to show exactly how groups of cores are laid out on the silicon. This is a custom in-house test built by Andrei, and we know there are competing tests out there, but we feel ours most accurately reflects how quickly an access between two cores can happen.
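As a rough illustration of how pairwise latencies expose core groupings, the sketch below takes a matrix of core-to-core latencies and groups together cores whose latency to each other falls under a threshold. This is not the in-house test itself; the function name `group_cores`, the threshold, and the numbers in `example_latency_ns` are hypothetical values chosen for illustration, not measurements.

```python
def group_cores(latency_ns, threshold_ns):
    """Group cores whose pairwise latency is below threshold_ns.

    latency_ns is a square matrix: latency_ns[a][b] is the measured
    latency from core a to core b. Grouping is transitive (union-find).
    """
    n = len(latency_ns)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the group representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a in range(n):
        for b in range(a + 1, n):
            if latency_ns[a][b] < threshold_ns:
                ra, rb = find(a), find(b)
                if ra != rb:
                    # Keep the lowest core index as the representative.
                    parent[max(ra, rb)] = min(ra, rb)

    groups = {}
    for core in range(n):
        groups.setdefault(find(core), []).append(core)
    return sorted(groups.values())


# Hypothetical four-core part with two dies: fast on-die, slow off-die.
example_latency_ns = [
    [0, 40, 120, 120],
    [40, 0, 120, 120],
    [120, 120, 0, 40],
    [120, 120, 40, 0],
]

print(group_cores(example_latency_ns, threshold_ns=80))  # → [[0, 1], [2, 3]]
```

With real data the threshold would have to be chosen from the measured distribution, since on-die and off-die latencies differ from product to product.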

All three CPUs exhibit the same behaviour - one core seems to be given high priority, while the rest are not.

Frequency Ramping

Over the past few years, both AMD and Intel have introduced features that shorten the time it takes a CPU to move from idle into a high-powered state. This means users can get peak performance sooner, but the biggest knock-on effect is on battery life in mobile devices: a system that can turbo up and turbo down quickly can stay in its lowest and most efficient power state for as long as possible.

Intel’s technology is called SpeedShift, although SpeedShift was not enabled until Skylake.

One issue with this technology, though, is that the frequency adjustments can be so fast that software cannot detect them. If the frequency changes on the order of microseconds, but your software only probes it every few milliseconds (or seconds), then quick changes will be missed. Not only that, but as an observer probing the frequency, you could be affecting the actual turbo performance. When the CPU changes frequency, it essentially has to pause all compute while it aligns the frequency of the whole core.
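The sampling problem can be shown with a small simulation. The sketch below assumes a made-up frequency trace (idle at 800 MHz with a 200 µs burst to 4000 MHz once per millisecond) and counts how many frequency changes a prober sees at different sampling periods. The trace shape and the names `frequency_mhz` and `count_changes` are hypothetical, and the simulation does not model the observer effect.

```python
def frequency_mhz(t_us):
    """Hypothetical trace: 800 MHz idle, 4000 MHz burst from 300 to 500 us of every ms."""
    return 4000 if 300 <= t_us % 1000 < 500 else 800


def count_changes(sample_period_us, total_us=5000):
    """Count frequency changes visible to a prober sampling every sample_period_us."""
    readings = [frequency_mhz(t) for t in range(0, total_us, sample_period_us)]
    return sum(1 for a, b in zip(readings, readings[1:]) if a != b)


print(count_changes(1000))  # millisecond probing → 0 changes seen
print(count_changes(10))    # 10 us probing → 10 changes seen
```

In this toy trace, the millisecond prober always lands in the idle window and reports a flat 800 MHz, even though the simulated core boosted five times.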

We wrote an extensive review analysis piece on this, called ‘Reaching for Turbo: Aligning Perception with AMD’s Frequency Metrics’, due to an issue where users were not observing the peak turbo speeds for AMD’s processors.

We got around the issue by making the frequency probe itself the workload that triggers the turbo. The software is able to detect frequency adjustments on a microsecond scale, so we can see how quickly a system can get to those boost frequencies. Our Frequency Ramp tool has already been in use in a number of reviews.
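The general idea of a probe that doubles as the workload can be sketched as follows: run fixed-size chunks of work back to back and time each one, so that a shorter chunk time implies a higher clock. This is only an illustration under our own assumptions, not the Frequency Ramp tool; the names `spin` and `ramp_trace` are hypothetical, and Python's interpreter overhead means it will not reach the microsecond resolution that a native tool can.

```python
import time


def spin(iterations):
    """Fixed amount of integer work; its duration shrinks as the clock rises."""
    total = 0
    for i in range(iterations):
        total += i
    return total


def ramp_trace(duration_s=0.05, iterations=2000):
    """Time back-to-back work chunks.

    Returns a list of (elapsed_since_start_us, chunk_time_us) tuples.
    The work being timed is also what keeps the core busy.
    """
    samples = []
    start = time.perf_counter_ns()
    deadline = start + int(duration_s * 1e9)
    while True:
        t0 = time.perf_counter_ns()
        spin(iterations)
        t1 = time.perf_counter_ns()
        samples.append(((t1 - start) / 1000.0, (t1 - t0) / 1000.0))
        if t1 >= deadline:
            break
    return samples


if __name__ == "__main__":
    trace = ramp_trace()
    first, last = trace[0], trace[-1]
    print(f"{len(trace)} chunks; first took {first[1]:.1f} us, last took {last[1]:.1f} us")
```

On a system ramping up from idle, the chunk times would be expected to fall as the core boosts, although scheduler noise and interpreter warm-up can mask the effect in a sketch like this.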

From an idle frequency of 800 MHz, it takes ~16 ms for Intel to boost to the top frequency for both the i9 and the i5. The i7 was most of the way there, but took an additional 10 ms or so.


279 Comments


  • ozzuneoj86 - Thursday, April 1, 2021 - link

    "Rocket Lake also gets you PCIe 4.0, however users might feel that is a small add-in when AMD has PCIe 4.0, lower power, and better general performance for the same price."

    If a time traveling tech journalist would have told us back in the Bulldozer days that Anandtech would be writing this sentence in 2021 in a nonchalant way (because AMD having better CPUs is the new normal), we wouldn't have believed him.
  • Hrel - Friday, April 2, 2021 - link

    Just in case anyone able to actually affect change reads these comments, I'm not even interested in these because the computer I built in 2014 has a 14nm processor too... albeit with DDR 3 RAM but come on, DDR4 isn't even much of a real world difference outside ultra specific niche scenarios.

    Intel, this is ridiculous, you're going to have been on the SAME NODE for a DECADE HERE!!!!

    Crying out loud 10nm has been around for longer than Intels 14nm, this is nuts!
  • James5mith - Saturday, April 3, 2021 - link

    " More and more NAS and routers are coming with one or more 2.5 GbE ports as standard"

    No, they most definitely are not. lol
  • Linustechtips12#6900xt - Monday, April 5, 2021 - link

    gotta say, love the arguments on page 9 lol
  • peevee - Monday, April 5, 2021 - link

    "the latest microcode from Intel should help increase performance and cache latency"

    Do we really want the increase in cache latency? ;) :)
  • 8 Cores is Enough - Wednesday, August 4, 2021 - link

    I just bought the 11900k with a z590 Gigabyte Aorus Pro Ax mobo and Samsung 980 pro 500GB ssd. This replaced my 9900k in a z390 Gigabyte Aorus Master with a 970 pro 512GB ssd.

    They're both 14nm node processors with 8c/16t and both overclocked, 5GHz all cores for 9900k and 5.2GHz all cores with up to 5.5GHz on one core via turbo modes on the 11900k.

    However, the 11900k outperforms the 9900k in every measure. In video encoding, which I do fairly often, it's twice as fast. In fact, the 11900k can convert 3 videos at the same time, each one as fast as my rtx 2070 super can do 1 video at a time.

    On UserBenchmark.com, my 11900k is the current record holder for fastest 11900k tested. It beats all the 10900k's even in the 64 thread server workload metric. It loses to the 5900x and 5950x in this one metric but clobbers them both in the 1, 2, 4 and 8 core metrics.

    I wish I had a 5900x to test on Wondershare Uniconverter. I suspect my 11900k would match it given the 2X improvement over the 9900k, which was about 1/2 as fast as the 3950x in video conversion.

    I do a lot of video editing as well. Maybe on this workload an AMD 5900x or 5950x would beat the 11900k. It seems plausible so let's presume this and accept Ryzen 9 is most likely still best for video editing.

    But the claim that being stuck on 14nm node means Intel RKL CPUs perform the same as Haswell or that they are even close does not make sense to me based on my experiences so far going from coffee lake refresh to RKL.

    The Rocket Lake CPUs are like the muscle cars of 1970. They are inefficient beasts that haul buttocks. They exist as a matter of circumstance and we may never see the likes of them again.

    Faster more efficient CPUs will be built but the 11th gen Intel CPUs will be remembered for being the back ported abominations they are: thirsty and fast with the software of 2021 which for the time being still favors single thread processing.

    If you play Kerbal Space Program then get an 11900k because that game is all about single thread performance and right now the 11900k beats all other CPUs at that.
  • Germanium - Thursday, September 2, 2021 - link

    My experimentation with my Rocket Lake Core I 11700k on my Asus Z590-A motherboard has shown me that at least on some samples AVX512 can be more efficient & cooler running than AVX2 at the same clock speed.

    I am running my sample at 4.4GHz both AVX512 & AVX2. When running Hand Brake there is nearly a 10 watt savings when running AVX512 as opposed to AVX2.

    Before anyone says Hand Brake does not use AVX512: that is true out of the box, but there is a setting script I found online to activate AVX512 on Hand Brake and it does work. It must be manually entered, no copy & paste available.

    With stock voltage settings at 4.2GHz using AVX2 it was drawing over 200 watts. With my settings I am able to run AVX512 at 4.4 GHz with peak wattage in Hand Brake of 185 watts. That was absolute peak wattage. It mostly ran between 170 to 180 watts. AVX2 runs about 10 watts more for slightly less performance at same clock speed.
  • Germanium - Thursday, September 2, 2021 - link

    Forgot to mention that in order to make AVX512 so efficient one must set the AVX Guard Band voltage Offset at or near 0 to bring the power to acceptable levels. Both AVX512 & AVX2 must be lowered. If AVX2 is not lowered by at least the same amount, the AVX512 setting will have little or no effect.
  • chane - Thursday, January 13, 2022 - link

    I hope my post is considered on topic

    Scenario 1: Without discrete graphics 1080p grade card, using on-chip graphics: Given the same core count (but below 10 cores), base and turbo frequencies and loaded with the same Cinebench and/or Handbrake test loads, would a Rocket lake Xeon w series processor run hotter, cooler or about the same as a Rocket Lake i family series processor with the same TDP spec?

    Scenario 2: As above but with 1080p grade discrete graphics card.

    Note: The Xeon processor pc will be using 16GB of ECC memory, however much that may impact heat and fan noise.

    Please advise.
    Thanks.
