Fiji’s Layout

So what did AMD put in 8.9 billion transistors filling out 596mm²? The answer, as it turns out, is quite a bit of hardware, though at the same time perhaps not as much (or at least not in the ratios) as everyone was initially hoping for.

The overall logical layout of Fiji is rather close to Hawaii after accounting for the differences in the number of resource blocks and the change in memory. Or perhaps Tonga (R9 285) is the more apt comparison, since that’s AMD’s other GCN 1.2 GPU.

In either case the end result is quite a bit of shading power for Fiji. AMD has bumped up the CU count from 44 to 64, or to put this in terms of the number of ALUs/stream processors, it’s up from 2816 to a nice, round 4096 (2^12). As we discussed earlier, FP64 performance has been significantly curtailed in the name of space efficiency. For FP32, however, at Fury X’s stock clockspeed of 1050MHz you’re looking at enough ALUs to push 8.6 TFLOPs.
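As a quick sanity check on that figure, here is a minimal Python sketch of the arithmetic. It assumes 64 ALUs per CU (which is what 4096 ALUs across 64 CUs works out to) and 2 FP32 operations per ALU per clock, counting a fused multiply-add as two operations. The function and constant names are ours, purely for illustration.

```python
# Assumption: 64 ALUs per CU, derived from 4096 ALUs / 64 CUs (and 2816 / 44).
ALUS_PER_CU = 64
# Assumption: one fused multiply-add per ALU per clock, counted as 2 FP32 ops.
FLOPS_PER_ALU_PER_CLOCK = 2


def alu_count(compute_units):
    """Total ALUs/stream processors for a given CU count."""
    return compute_units * ALUS_PER_CU


def peak_fp32_tflops(compute_units, clock_mhz):
    """Theoretical peak FP32 throughput in TFLOPs."""
    flops_per_second = (
        alu_count(compute_units) * FLOPS_PER_ALU_PER_CLOCK * clock_mhz * 1e6
    )
    return flops_per_second / 1e12


if __name__ == "__main__":
    print("Hawaii ALUs:", alu_count(44))
    print("Fiji ALUs:", alu_count(64))
    print("Fiji peak FP32 TFLOPs:", round(peak_fp32_tflops(64, 1050), 1))
```

This is a theoretical ceiling only; it says nothing about how much of that throughput a real workload can actually use.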

These 64 CUs in turn are laid out in a manner consistent with past GCN designs, with AMD retaining their overall Shader Engine organization. Sub-dividing the GPU into four parts, each shader engine possesses 1 geometry unit, 1 rasterizer unit, 4 render backends (for a total of 16 ROPs), and finally, one-quarter of the CUs, or 16 CUs per shader engine. The CUs in turn continue to be organized in groups of 4, with each group sharing a 16KB L1 scalar cache and 32KB L1 instruction cache. Meanwhile since Fiji’s CU count is once again a multiple of 16, this also does away with Hawaii’s oddball group of 3 CUs at the tail-end of each shader engine.
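To make the per-shader-engine breakdown concrete, the following Python sketch models the layout described above. The 4 ROPs per render backend figure is derived from 16 ROPs across 4 render backends; the class and function names are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShaderEngine:
    """Resources in one shader engine, per the description above."""
    geometry_units: int = 1
    rasterizers: int = 1
    render_backends: int = 4
    rops_per_backend: int = 4  # derived: 16 ROPs / 4 render backends
    compute_units: int = 16

    @property
    def rops(self):
        return self.render_backends * self.rops_per_backend


def cu_groups(cus_per_engine, group_size=4):
    """Split one engine's CUs into groups that share L1 scalar/instruction caches."""
    full_groups, remainder = divmod(cus_per_engine, group_size)
    groups = [group_size] * full_groups
    if remainder:
        groups.append(remainder)  # a partial group at the tail-end
    return groups


SHADER_ENGINES = 4
fiji_engine = ShaderEngine()

fiji_total_cus = SHADER_ENGINES * fiji_engine.compute_units
fiji_total_rops = SHADER_ENGINES * fiji_engine.rops
fiji_groups = cu_groups(fiji_engine.compute_units)  # four full groups of 4
hawaii_groups = cu_groups(44 // SHADER_ENGINES)     # 11 CUs: two groups of 4, one of 3
```

Here `hawaii_groups` ends in a group of 3, which is the oddball tail group that Fiji's 16 CUs per engine avoids.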

Looking at the broader picture, what AMD has done relative to Hawaii is to increase the number of CUs per shader engine without changing the number of shader engines themselves or the number of other resources available for each shader engine. At the time of the Hawaii launch AMD told us that the GCN 1.1 architecture had a maximum scalability of 4 shader engines, and Fiji’s implementation is consistent with that. I expect AMD will eventually go beyond 4 shader engines (there are always changes that can be made to increase scalability), but given what we know of GCN 1.1’s limitations, it looks like AMD has not attempted to raise those limits with GCN 1.2. What this means is that Fiji is likely the largest possible implementation of GCN 1.2, with as many resources as the architecture can scale out to without more radical changes under the hood.

Along those lines, while shading performance is greatly increased over Hawaii, the rest of the front-end is very similar from a raw, theoretical point of view. The geometry processors, as we mentioned before, are organized 1 per shader engine, just as was the case with Hawaii. With a 1 poly/clock limit here, Fiji has the same theoretical triangle throughput as Hawaii did, with real-world clockspeeds driving things up just a bit over the R9 290X. However as we discussed in our look at the GCN 1.2 architecture, AMD has made some significant under-the-hood changes to the geometry processor design for GCN 1.2/Fiji in order to boost their geometry efficiency, making Fiji’s geometry front-end faster and more efficient than Hawaii’s. As a result the theoretical performance may be unchanged, but in the real world Fiji is going to offer better geometry performance than Hawaii does.

Meanwhile the command processor/ACE structure remains unchanged from Hawaii. We’re still looking at a single graphics command processor paired up with 8 Asynchronous Compute Engines here, and if AMD has made any changes to this beyond what is necessary to support the GCN 1.2 feature set (e.g. context switching, virtualization, and FP16), then they have not disclosed it. AMD is expecting asynchronous shading to be increasingly popular in the coming years, especially in the case of VR, so Fiji’s front-end is well-geared towards the future AMD is planning for.

Moving on, let’s switch gears and talk about the back-end of the processor. There are some significant changes here due to HBM, as to be expected, but there are also some other changes going on as well that are not related to HBM.

Starting with the ROPs, the ROP situation for Fiji remains more or less unchanged from Hawaii. Hawaii shipped with 64 ROPs grouped into 16 Render Backends (RBs), which at the time AMD told us was the most a 4 shader engine GCN GPU could support. I suspect that limit is still in play here, leading to Fiji continuing to pack 64 ROPs. Given that AMD just went from 32 to 64 a generation ago, another jump seemed unlikely anyhow (despite earlier rumors to the contrary), but in the end I suspect that AMD had to consider architectural limits just as much as they had to consider performance tradeoffs of more ROPs versus more shaders.

In any case, the real story here isn’t the number of ROPs, but their overall performance. Relative to Hawaii, Fiji’s ROP performance is getting turbocharged for two major reasons. The first is GCN 1.2’s delta color compression, which significantly reduces the amount of memory bandwidth the ROPs consume. Since the ROPs are always memory bandwidth bottlenecked – and this was even more true on Hawaii as the ROP/bandwidth ratio fell relative to Tahiti – anything that reduces memory bandwidth needs can boost performance. We’ve seen this first-hand on R9 285, which with its 256-bit memory bus had no problem keeping up with (and even squeaking past) the 384-bit bus of the R9 280.

The other factor turbocharging Fiji’s ROPs is of course the HBM. In case GCN 1.2’s bandwidth savings were not enough, Fiji also just flat-out has quite a bit more memory bandwidth to play with. The R9 290X and its 5Gbps, 512-bit memory bus offered 320GB/sec, a value that for a GDDR5-based system has only just been overshadowed by the R9 390X. But with Fiji, the HBM configuration as implemented on the R9 Fury X gives AMD 512GB/sec, an increase of 192GB/sec, or 60%.
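The bandwidth figures above follow from pin rate times bus width. The Python sketch below reproduces them; the GDDR5 numbers come straight from the text, while the HBM configuration (4 stacks, each 1024 bits wide at 1Gbps per pin) is our assumption about first-generation HBM and is not stated in this section.

```python
def bandwidth_gb_per_s(pin_rate_gbps, bus_width_bits):
    """Peak memory bandwidth in GB/sec: bits per second across the bus, divided by 8."""
    return pin_rate_gbps * bus_width_bits / 8


# R9 290X: 5Gbps GDDR5 on a 512-bit bus (from the text).
hawaii_bw = bandwidth_gb_per_s(5, 512)

# R9 Fury X: ASSUMED 4 HBM stacks x 1024 bits each at 1Gbps per pin.
fiji_bw = bandwidth_gb_per_s(1, 4 * 1024)

increase_gb = fiji_bw - hawaii_bw
increase_pct = 100 * increase_gb / hawaii_bw

if __name__ == "__main__":
    print(f"Hawaii: {hawaii_bw:.0f}GB/sec, Fiji: {fiji_bw:.0f}GB/sec")
    print(f"Increase: {increase_gb:.0f}GB/sec ({increase_pct:.0f}%)")
```

Note how HBM gets there: a far wider bus running at a much lower per-pin rate, rather than a narrower bus pushed to higher clocks.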

Now AMD did not just add 60% more memory bandwidth because they felt like it, but because they’re putting that memory bandwidth to good use. The ROPs would still gladly consume it all, and this doesn’t include all of the memory bandwidth consumed by the shaders, the geometry engines, and the other components of the GPU. GPU performance has long outpaced memory bandwidth improvements, and while HBM doesn’t erase any kind of conceptual deficit, it certainly eats into it. With such a significant increase in memory bandwidth and combined with GCN 1.2’s color compression technology, AMD’s effective memory bandwidth to their ROPs has more than doubled from Hawaii to Fiji, which will go a long way towards increasing ROP efficiency and real-world performance. And even if a task doesn’t compress well (e.g. compute) then there’s still 60% more memory bandwidth to work with. Half of a terabyte-per-second of memory bandwidth is simply an incredible amount to have for such a large pool of VRAM, since prior to this only GPU caches operated that quickly.

Speaking of caches, Fiji’s L2 cache has been upgraded as well. With Hawaii AMD shipped a 1MB cache, and now with Fiji that cache has been upgraded again to 2MB. Even with the increase in memory bandwidth, going to VRAM is still a relatively expensive operation, so trying to stay on-cache is beneficial up to a point, which is why AMD spent the additional transistors here to double the L2 cache. Both AMD and NVIDIA have gone with relatively large L2 caches in this latest round, and with their latest generation color compression technologies it makes a lot of sense; since the L2 cache can store color-compressed tiles, all of a sudden L2 caches are a good deal more useful and worth the space they consume.

Finally, we’ll get to HBM in more detail in a bit, but let’s take a quick look at the HBM controller layout. With Fiji there are 8 HBM memory controllers, and each HBM controller in turn drives one-half of an HBM stack, meaning 2 controllers are necessary to drive a full stack. And while AMD’s logical diagram doesn’t illustrate it, Fiji is almost certainly wired such that each HBM memory controller is tightly coupled with 8 ROPs and 256KB of L2 cache. AMD has not announced any future Fiji products with less than 4GB of VRAM, so we’re not expecting any parts with disabled ROPs, but if they did, that would give you an idea of how things would be disabled.
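To illustrate what that coupling would imply, here is a hypothetical Python sketch of how resources would scale with the number of active memory controllers. The 8 ROPs and 256KB of L2 per controller are the suspected wiring described above, not something AMD has confirmed, and the 512MB of VRAM per controller is our assumption from splitting 4GB evenly across 8 controllers. No cut-down part like this has been announced.

```python
TOTAL_CONTROLLERS = 8
ROPS_PER_CONTROLLER = 8        # suspected coupling, not confirmed by AMD
L2_KB_PER_CONTROLLER = 256     # suspected coupling, not confirmed by AMD
VRAM_MB_PER_CONTROLLER = 512   # assumption: 4GB split evenly over 8 controllers


def memory_partition(active_controllers):
    """Resources available with a given number of active HBM controllers."""
    if not 0 < active_controllers <= TOTAL_CONTROLLERS:
        raise ValueError("active_controllers must be between 1 and 8")
    return {
        "rops": active_controllers * ROPS_PER_CONTROLLER,
        "l2_kb": active_controllers * L2_KB_PER_CONTROLLER,
        "vram_mb": active_controllers * VRAM_MB_PER_CONTROLLER,
        "hbm_stacks": active_controllers / 2,  # 2 controllers per stack
    }


full_fiji = memory_partition(8)     # the shipping configuration
hypothetical = memory_partition(6)  # purely illustrative, one full stack disabled
```

The full configuration recovers the 64 ROPs, 2MB of L2, and 4GB of VRAM discussed in this section, which is at least consistent with the suspected wiring.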

458 Comments

  • K_Space - Thursday, July 2, 2015 - link

    Between now and 2016 (preferably before the holiday season) I see AMD dropping the Fury X price and churning up better drivers; so it's not all too bleak. But it's still annoying that both of these could have been fixed before launch.
  • dragonsqrrl - Thursday, July 2, 2015 - link

    Yep, the Fury X is essentially vaporware at this point. It basically doesn't exist. Some tech journalists with inside information have estimated that fewer than 1000 were available for NA at launch. Definitely some supply issues to say the least, which I suspect is mostly due to the HBM.

    I have no idea why AMD hyped up Fiji so much prior to launch. In a sense they just made it that much more difficult for themselves. What kind of reaction were they expecting with rhetoric like "HBM has allowed us to create the fastest GPU in the world", along with some of the most cherry picked pre-launch internal benchmarks ever conceived? It just seems like they've given up and are only trying to engage their most zealous fanboys at this point.

    All that being said, I don't think Fury X is a terrible card. In fact I think it's the only card in AMD's current lineup even worth considering. But unfortunately for AMD, the 980Ti is the superior card right now in practically every way.
  • chizow - Thursday, July 2, 2015 - link

    Yep, it is almost as if they set themselves up to fail, but now it makes more sense in terms of their timing and delivery. They basically used Fury X to prop up their Rebrandeon stack of 300 series, as they needed a flagship launch with Fury X in the hopes it would lift all sails in the eyes of the public. We know Rebrandeon 300 series was set in stone and ready to go as far back as Financial Analysts Day (Hi again all AMD fanboys who said I was wrong) with early image leaks and drivers confirming this as well.

    But Fury X wasn't ready. Not enough chips and cards ready, cooler still showing problems, limited worldwide launch (I think 10K max globally). I think AMD wanted to say and show something at Computex but quickly changed course once it was known Nvidia would be HARD launching the 980Ti at Computex.

    980Ti launch changed the narrative completely, and while AMD couldn't change course on what they planned to do with the R9 Rebrandeon 300 series and a new "Ultra premium" label Fury X using Fiji, they were forced to cut prices significantly.

    In reality, at these price points and with Fury X's relative performance, they really should've just named it R9 390X WCE and called it a day, but I think they were just blindsided by the 980Ti not just in performance being so close to Titan X, but also in price. No way they would've thought Nvidia would ask just $650 for 97% of Titan X's performance.

    So yeah, brilliant moves by Nvidia, they've done just about everything right and executed flawlessly with Maxwell Mk2 since they surprised everyone with the 970/980 launch last year. All the song and dance by AMD leading up to Fury X was just that, an attempt to impress investors, tech press, loyal fans, but wow that must have been hard for them to get up on stage and say and do the things they did knowing they didn't have the card in hand to back up those claims.
  • kn00tcn - Thursday, July 2, 2015 - link

    do you want a nobel prize after all that multiple post gloating? you're not the one leaking, we already knew fiji was the only new gpu, i never saw any 'fanboys' as you call them saying the 3 series will be new & awesome... like you're talking to an empty room & patting yourself on the back

    guess who is employed at amd? the guy that did marketing at nvidia for a few years, why do you think fury x is called fury x?

    FLAWLESS maxwell hahahahaha.... 970 memory aside, how about all the TDR crashes in recent drivers, they even had to put out a hotfix after WHQL (are we also going to ignore kepler driver regression?)

    yes amd has to impress everyone, that is the job of marketing & the reality of depending on TSMC with its cancelled 32nm & delayed/unusable 20nm... every company needs to hype so they dont implode, all these employees have families but you're probably not thinking of them

    how the heck is near performance at cold & quiet operation a flop!? there are still 2 more air cooled fiji releases, including a 175watt one

    '4gb isnt enough', did you even look at the review? this isnt geforce FX or 2900xt, talk about a reverse fanboy...
  • chizow - Thursday, July 2, 2015 - link

    Wow awesome where were all these nuggets of wisdom and warnings of caution tempering the expectations of AMD fans in the last few months? Oh right, no where to be found! Yep, plenty with high conviction and authority insisting R9 300 won't be a rebrand, that Fiji and HBM would lead AMD to the promise land and be faster than the overpriced Nvidia offerings of 980, Titan X etc etc.

    http://www.anandtech.com/show/9239/amd-financial-a...
    http://www.anandtech.com/show/9266/amd-hbm-deep-di...
    http://www.anandtech.com/show/9383/amd-radeon-live...
    http://www.anandtech.com/show/9241/amd-announces-o...
    http://www.anandtech.com/show/9236/amd-announces-r...

    No Nobel Prize needed, the ability to gloat and say I told you so to all the AMD fanboys/apologists/supporters is plenty! Funny how none of them bothered to show up and say they were wrong, today!

    And yes the 970, they stumbled with the memory bandwidth mistake, but did it matter? No, not at all. Why? Because the error was insignificant and did not diminish its value, price or performance AT ALL. No one cared about the 3.5GB snafu outside of AMD fanboys, because 970 delivered where it mattered, in games!

    Let's completely ignore the fact 970/980 have led Nvidia to 10 months of dominance at 77.5% market share, or the fact the 970 by itself has TRIPLED the sales of AMD's entire R9 200 series on Steam! So yes, Nvidia has executed flawlessly and as a result, they have pushed AMD to the brink in the dGPU market.

    And no, 4GB isn't enough, did YOU read the review? Ryan voiced concern throughout the entire 4GB discussion, saying while it took some effort, he was able to "break" the Fury X and force a 4GB limit. That's only getting to be a BIGGER problem once you CF these cards and start cranking up settings. So yeah, if you are plunking down $650 on a flagship card today, why would you bother with that concern hanging over your head when for the same price, you can buy yourself 50% more headroom? Talk about reverse fanboyism, 3.5GB isn't enough on a perf midrange card, but it's jolly good A-OK for a flagship card "Optimized for 4K" huh?

    And speaking of those employees and families. You don't think it isn't in their best interest, and that they aren't secretly hoping AMD folds and gets bought out or they get severance packages to find another job? LOL. Its a sinking ship, if they aren't getting laid off they're leaving for greener pastures. Everyone there is just trying to stay afloat hoping some of these rumors a company with deep pockets will come and save them from the sinking dead weight that has become of ATI/AMD.
  • D. Lister - Thursday, July 2, 2015 - link

    My concern is, the longer AMD's current situation lingers, the higher the chance that the new buyers would simply cannibalize AMD's tech and IPs and permanently put down the brand "AMD", due to the amount of negative public opinion attached to it.
  • chizow - Monday, July 6, 2015 - link

    @D. Lister sorry missed this. I think AMD as a brand/trademark will be dead regardless. It has carried value brand connotation for some time and there was even some concern about it when AMD chose to drop the name ATI from their graphics cards a few years back. Radeon however I think will live on to whoever buys them up, as it still carries good marketplace brand recognition.
  • Intel999 - Friday, July 3, 2015 - link

    @Chizow

    Dude, what's the deal? Did an AMD logoed truck run over your dog or something.

    Seems like every article regarding AMD has you spewing out hate against them. I think we all realize Nvidia is in the lead. Why exert so much energy to put down a company that you have no intention of ever buying from?

    AMD wasn't even competing in the high end prior to the Fury X release. So any sales they get are sales that would have gone to the 980 by default. So they have improved their position. A home run? No.

    Take pleasure in knowing you are a member of the winning team. Take a chill pill and maybe the comments sections can be more informative for the rest of us.

    I, for one, would prefer not having to skip over three long-winded tirades on each page that start with Chizow.
  • chizow - Friday, July 3, 2015 - link

    @Intel999, if you want to digest your news in a vacuum, stick your head in the sand and ignore the comments section as you've already self-prescribed!

    For others, a FORUM is a place to discuss ideas, exchange points of view, provide perspective and to keep both companies and fans/proponents ACCOUNTABLE and honest. If you have a problem, maybe the internet isn't a place for you!

    Do you go around in every Nvidia or Intel thread or news article and ask yourself the same anytime AMD is mentioned or brought up? What does this tell you about your own posting tendencies???

    Again, if you, for one, would prefer to skip over my posts, feel free to do so! lol.
  • silverblue - Friday, July 3, 2015 - link

    I think you need to blame sites such as WCCFTech rather than fanboys/enthusiasts in general for the "Fury X will trounce 980 Ti/Titan X" rumours.

    Also, if the 970 memory fiasco didn't matter, why was there a spate of returns? It's obvious that the users weren't big enough NVIDIA fanboys to work around the issue... going by your logic, that is.

    The 970 isn't a mid-range card to anybody who isn't already rocking a 980 or above. 960, sure.

    Fury X is an experiment, one that could've done with more memory of course, and I usually don't buy into the idea of experiments, but at least it wasn't a 5800/Parhelia/2900 - it's still the third best card out there with little breathing space between all three (depending on game, of course), not quite what AMD promised unless they plan to fix everything with a killer driver set (unlikely). The vanilla Fury with its GDDR5 may stand to outperform it, albeit at a slightly higher power level.
