AMD 7th Gen Bristol Ridge and AM4 Analysis: Up to A12-9800, B350/A320 Chipset, OEMs first, PIBs Later

Author: Dr. Ian Cutress

by Ian Cutress on September 23, 2016 9:00 AM EST


Over the last two weeks, AMD officially launched its 7th Generation Bristol Ridge processors as well as the new AM4 socket and related chipsets. The launch was somewhat muted, as it initially targets only the big system OEMs and system integrators, such as Lenovo, HP, Dell and others. For users wanting to build their own systems, ‘Product-in-Box’ units (called PIBs) will come at the end of the year. We held off on covering the announcement because the launch and briefings left a number of questions unanswered as to the potential matrix of configurations, the specifications of the hardware and how it all connects together. We have since received a number of answers, so let’s delve in.

The CPUs

The seven APUs and one CPU being launched for OEM systems span from a high-frequency A12 part down to the A6, all using the 7th Generation microarchitecture (we call it Excavator v2). They build on the Bristol Ridge notebook parts that were launched earlier in the year, but are focused on the desktop this time around. AMD essentially skipped the 6th Gen, Carrizo, for desktop as the design was significantly mobile focused: we ended up with one CPU, the Athlon X4 845 (which we reviewed), with DDR3 support but no integrated graphics. Using the updated 28nm process from TSMC, AMD was able to tweak the microarchitecture and allow full-on APUs for desktops using a similar design.

The full list of processors is as follows:

AMD 7th Generation Bristol Ridge Processors
	Modules/ Threads	CPU Base / Turbo (MHz)	GPU	GPU Base / Turbo (MHz)	TDP
A12-9800	2M / 4T	3800 / 4200	Radeon R7	800 / 1108	65W
A12-9800E	2M / 4T	3100 / 3800	Radeon R7	655 / 900	35W
A10-9700	2M / 4T	3500 / 3800	Radeon R7	720 / 1029	65W
A10-9700E	2M / 4T	3000 / 3500	Radeon R7	600 / 847	35W
A8-9600	2M / 4T	3100 / 3400	Radeon R7	655 / 900	65W
A6-9500	1M / 2T	3500 / 3800	Radeon R5	720 / 1029	65W
A6-9500E	1M / 2T	3000 / 3400	Radeon R5	576 / 800	35W
Athlon X4 950	2M / 4T	3500 / 3800	-	-	65W

AMD’s mainstream processors will now hit a maximum of 65W in their official thermal design power (TDP), with the launch offering a number of 65W and 35W parts. There is the potential to offer CPUs with a configurable TDP, but much like the 65W/45W modes supported on older parts, which were seldom used, chances are we will see OEMs stick with the default design power windows here. On the naming scheme: any 35W part now has an ‘E’ at the end of the processor name, allowing for easier identification.
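The naming rule described above can be checked against the launch table with a short sketch. The function names and the dictionary are illustrative only, and the rule is assumed to hold just for the eight parts listed in this launch, not for AMD parts in general.

```python
# TDP values copied from the launch table above (watts).
LAUNCH_TDP_W = {
    "A12-9800": 65, "A12-9800E": 35,
    "A10-9700": 65, "A10-9700E": 35,
    "A8-9600": 65,
    "A6-9500": 65, "A6-9500E": 35,
    "Athlon X4 950": 65,
}

def is_low_power(model: str) -> bool:
    """True when the model name ends in 'E', the suffix used for 35W parts."""
    return model.strip().upper().endswith("E")

def expected_tdp_w(model: str) -> int:
    """Assumed rule for this launch only: 'E' suffix means 35W, otherwise 65W."""
    return 35 if is_low_power(model) else 65

# Any part whose listed TDP disagrees with the suffix rule would show up here.
mismatches = [m for m, tdp in LAUNCH_TDP_W.items() if expected_tdp_w(m) != tdp]
print(mismatches)  # → []
```

An empty list means every part in the table follows the suffix rule.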

As part of this review, we were able to snag a few extra configuration specifications for each of the processors, including the number of streaming processors in each, base GPU frequencies, base Northbridge frequencies (more on the NB later), and confirmation that all the APUs launched will support DDR4-2400 at JEDEC sub-timings.

AMD 7th Generation 65W Bristol Ridge Processors
	Modules/ Threads	CPU Base / Turbo (MHz)	GPU SPs	GPU Base / Turbo (MHz)	Northbridge Base (MHz)
A12-9800	2M / 4T	3800 / 4200	512	800 / 1108	1400
A10-9700	2M / 4T	3500 / 3800	384	720 / 1029	1400
A8-9600	2M / 4T	3100 / 3400	384	655 / 900	1300
A6-9500	1M / 2T	3500 / 3800	384	720 / 1029	1400
Athlon X4 950	2M / 4T	3500 / 3800	-	-	1400

AMD 7th Generation 35W Bristol Ridge Processors
	Modules/ Threads	CPU Base / Turbo (MHz)	GPU SPs	GPU Base / Turbo (MHz)	Northbridge Base (MHz)
A12-9800E	2M / 4T	3100 / 3800	512	655 / 900	1300
A10-9700E	2M / 4T	3000 / 3500	384	600 / 847	1300
A6-9500E	1M / 2T	3000 / 3400	256	576 / 800	1300

The A12-9800 at the top of the stack is an interesting part on paper. If we do a direct comparison with the previous high-end AMD APUs, the A10-7890K, A10-7870K and A10-7860K, a lot of positives end up on the side of the A12.

High-End AMD APU Comparison
	A12-9800	A10-7890K	A10-7870K	A10-7860K	A10-9700
MSRP	-	$165	$137	$117	-
Platform	Bristol Ridge	Kaveri Refresh	Kaveri Refresh	Kaveri Refresh	Bristol Ridge
uArch	Excavator v2	Steamroller	Steamroller	Steamroller	Excavator v2
Threads	2M / 4T	2M / 4T	2M / 4T	2M / 4T	2M / 4T
CPU Base Freq	3800	4100	3900	3600	3500
CPU Turbo Freq	4200	4300	4100	4000	3800
IGP SPs	512	512	512	512	384
GPU Turbo Freq	1108	866	866	757	1029
TDP	65W	95W	95W	65W	65W
L1-I Cache	192 KB	192 KB	192 KB	192 KB	192 KB
L1-D Cache	128 KB	64 KB	64 KB	64 KB	128 KB
L2 Cache	2 MB	4 MB	4 MB	4 MB	2 MB
DDR Support	DDR4-2400	DDR3-2133	DDR3-2133	DDR3-2133	DDR4-2400
PCIe 3.0	x8	x16	x16	x16	x8
Chipsets	B350 A320 X/B/A300	A88X A78 A68H	A88X A78 A68H	A88X A78 A68H	B350 A320 X/B/A300

The frequency range of the A12-9800 is wider than that of the A10-7870K (3.8-4.2 GHz rather than 3.9-4.1 GHz), and it adds the newer Excavator v2 microarchitecture, an improved L1 cache, AVX 2.0 support and a much higher integrated graphics frequency (1108 MHz vs. 866 MHz), while also coming in at a 30W lower TDP. The 30W TDP drop is the most surprising: we’re essentially getting better than the previous A10-class performance at lower power, which is most likely why AMD started naming the best APU in the stack an ‘A12’. Basically, the A12-9800 APU will be an extremely interesting one to review given the smaller L2 cache but faster graphics and DDR4 memory.
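For readers who want the deltas spelled out, the comparison with the A10-7870K reduces to a few lines of arithmetic. This is only a sketch using the figures from the comparison table; the dictionary keys and helper name are made up for illustration, and none of it says anything about measured performance.

```python
# Figures copied from the High-End AMD APU Comparison table above.
a12_9800 = {"base_mhz": 3800, "turbo_mhz": 4200, "gpu_turbo_mhz": 1108, "tdp_w": 65}
a10_7870k = {"base_mhz": 3900, "turbo_mhz": 4100, "gpu_turbo_mhz": 866, "tdp_w": 95}

def turbo_range_mhz(part: dict) -> int:
    """Width of the base-to-turbo CPU frequency window in MHz."""
    return part["turbo_mhz"] - part["base_mhz"]

range_new = turbo_range_mhz(a12_9800)    # 400 MHz window
range_old = turbo_range_mhz(a10_7870k)   # 200 MHz window

# Relative increase in peak GPU clock, in percent.
gpu_uplift_pct = (a12_9800["gpu_turbo_mhz"] / a10_7870k["gpu_turbo_mhz"] - 1) * 100

# Reduction in rated TDP, in watts.
tdp_saving_w = a10_7870k["tdp_w"] - a12_9800["tdp_w"]

print(range_new, range_old)        # → 400 200
print(round(gpu_uplift_pct, 1))    # → 27.9
print(tdp_saving_w)                # → 30
```

So on paper the A12-9800 has twice the turbo window, roughly 28% more peak GPU clock and 30W less rated TDP.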

A Wild Overclocker Appears!

Given that systems with the new APUs have technically been on release for a couple of weeks, some vendors have had their internal enthusiasts play around with the platform. Bearing in mind that AMD has not announced any formal overclocking support on these new APUs, NAMEGT, a South Korean overclocker with ties to ASUS, has pushed the A12-9800 APU to 4.8 GHz by adjusting the multiplier. To do this, he used an unreleased ASUS Octopus AM4 motherboard and AMD’s 125W Wraith air cooler (which will presumably be bundled with PIBs later in the product cycle).

Credit: NAMEGT and HWBot

NAMEGT ran this setup on multithreaded Cinebench 11.5 and Cinebench 15, scoring 4.77 and 380 respectively at the 4.8 GHz overclock. If we compare this to our Bench database results, we see the following:

Cinebench 11.5 - Multi-Threaded

For Cinebench 15, this overclocked score puts the A12-9800 above the Haswell Core i3-4360 and the older AMD FX-4350, but below the newer Skylake i3-6100TE. The Athlon X4 845 at stock frequencies scored 314 while running at 3.5 GHz, which would suggest that a stock A12-9800 at 3.8 GHz would fall around the 340 mark.

Cinebench R15 - Multi-Threaded

(Since writing this, a preview by Korean website Bodnara, using the A12-9800 in a GIGABYTE motherboard, scored 334 for a stock Cinebench 15 multithreaded test and 96 for the single threaded test. We've added this result for perspective.)
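The stock estimate above amounts to scaling the Athlon X4 845 score linearly with clock speed. A minimal sketch of that arithmetic follows, assuming the score scales in direct proportion to frequency and ignoring turbo behaviour, memory and any other platform differences; `scale_score` is a hypothetical helper name, not anything from the benchmark itself.

```python
def scale_score(score: float, from_mhz: float, to_mhz: float) -> float:
    """Scale a benchmark score linearly with clock frequency (a rough assumption)."""
    return score * to_mhz / from_mhz

# Athlon X4 845: 314 points at 3.5 GHz, scaled to the A12-9800 base clock of 3.8 GHz.
estimate = scale_score(314, 3500, 3800)

# Stock A12-9800 multithreaded result reported in the Bodnara preview.
measured = 334

# How far the linear estimate overshoots the measured score, in percent.
error_pct = (estimate - measured) / measured * 100

print(round(estimate))       # → 341
print(round(error_pct, 1))   # → 2.1
```

Under those assumptions the linear estimate lands within about 2% of the Bodnara result.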

Cinebench R15 - Single Threaded

When we previously tested the Excavator architecture for desktop on the 65W Athlon X4 845, overclocking was a nightmare, with stability being a large issue. At the time, we suspected that due to the core design being focused towards 15W, moving beyond 65W was perhaps a bit of a stretch for the design at hand. This time around, as we reported before, Bristol Ridge is using an updated 28nm process over Carrizo, which may have a hand in this.

When we asked AMD about overclocking details on the new APUs, the return reply was along the lines of ‘No OEM systems at this time will be unlocked, and no official comment on the individual units. More details will be released closer to the platform launch for DIY users’.



122 Comments


ddriver - Saturday, September 24, 2016 - link
Hey, at least Trump is only preposterous and stupid. Hillary is all that PLUS crazy and evil. She is just as racist as Trump, if not more so, but she is not in the habit of being honest, she'd prefer to claim the votes of minorities.

Politics is a joke and the current situation is a very good example of it. People deserve all shit that coming their way if they still put faith in the political process after this.
ClockHound - Friday, September 23, 2016 - link
+101

Particularly enjoyed the term: "walled garden spyware milking station" model

Ok, not really enjoyed, cringed at the accuracy, however. ;-)
msroadkill612 - Wednesday, April 26, 2017 - link
An adage I liked "If it's free, YOU are the product."
hoohoo - Friday, September 23, 2016 - link
I see what you did there! Nicely done.
patrickjp93 - Saturday, September 24, 2016 - link
No they aren't. If Geekbench optimized for x86 the way it does for ARM, the difference in performance per clock is nearly 5x
ddriver - Saturday, September 24, 2016 - link
You have no idea what you are talking about. Geekbench is very much optimized, there are basically three types of optimization:

optimization done by the compiler - it eliminates redundant code, vectorizes loops and all that good stuff, that happens automatically

optimization by using intrinsics - do manually what the compiler does automatically, sometimes you could do better, but in general, compiler optimizations are very mature and very good at doing what they do

"optimization" of the type "if (CPUID != INTEL) doWorse()" - harmful optimization that doesn't really optimize anything in the true sense of the word, but deliberately chooses a less efficient code path to purposely harm the performance of a competitor - such optimizations are ALWAYS in the favor of the TOP DOG - be that intel or nvidia - companies who have excess of money to spend on such idiotic things. Smaller and less profitable companies like amd or arm - they don't do that kind of shit.

Finally, performance is not magic, you can't "optimize" and suddenly get 5X the performance. Process and TDP are a limiting factor, there is only so much performance you can get out of a chip produced at a given process for a given thermal budget. And that's if it is some perfectly efficient design. A 5W 20nm x86 chip could not possibly be any faster than a 5W 20nm ARM chip, intel has always had a slight edge in process, but if you manufacture an arm and an x86 chip on identical process (not just the claimed node size) with the same thermal budget the arm chip will be a tad faster, because the architecture is less bloated and more efficient.

It is a part of a dummy's belief system that arm chips are somehow fundamentally incapable of running professional software - on the contrary, hardware wise they are perfectly capable, only nobody bothers to write professional software for them.
patrickjp93 - Saturday, September 24, 2016 - link
I have a Bachelor's in computer science and specialized in high performance parallel, vectorized, and heterogeneous computing. I've disassembled Geekbench on x86 platforms, and it doesn't even use anything SSE or higher, and that's ancient Pentium III instructions.

It does not happen automatically if you don't use the right compiler flags and don't have your data aligned to allow the instructions to work.

You need intrinsics for a lot of things. Clang and GCC both have huge compiler bug forums filled with examples of where people beat the compilers significantly.

Yes you can get 5x the performance by optimizing. Geekbench only handles 1 datem at a time on Intel hardware vs. the 8 you can do with AVX and AVX2. Assuming you don't choke on bandwidth, you can get an 8x speedup.

ARM is not more efficient on merit, and x86 is not bloated by any stretch. Both use microcode now. ARM is no longer RISC by any strict definition.

Cavium has. Oracle has. Google has. Amazon has. In all cases ARM could not keep up with Avoton and Xeon D in performance/watt/$ and thus the industry stuck with Intel instead of Qualcomm or Cavium.
Toss3 - Sunday, September 25, 2016 - link
This is a great post, and I just wanted to post an article by PC World where they discussed these things in simpler terms: http://www.pcworld.com/article/3006268/tablets/tes...

As you can see the performance gains aren't really that great when it comes to real world usage, and as such we should probably start to use other benchmarks as well, and not just use Geekbench or browser javascript performance as indicators of actual performance of these SoCs especially when comparing one platform to another.
amagriva - Sunday, September 25, 2016 - link
Good post. To any interested a good paper on the subject : http://etn.se/images/expert/FD-SOI-eQuad-white-pap...
ddriver - Sunday, September 25, 2016 - link
I've been using GCC mostly, and in most of the cases after doing explicit vectorization I found no perf benefits, analyzing assembly afterwards revealed that the compiler has done a very good job at vectorizing wherever possible.

However, I am highly skeptical towards your claims, I'll believe it when I see it. I can't find the link now, but last year I read a detailed analysis showing that A9X core performance per watt is better than skylake over most of the A9X's clock range. And not in geekbench, but in SPEC.

As for geekbench, you make it sound as if they actually disabled vectorization explicitly. Which would be an odd thing. Not entirely clear what you mean by "1 datem at a time", but if you mean they are using scalar rather than vector instructions, that would be quite odd too. Luckily, I have better things to do than rummage about in geekbench machine code, so I will take your word that it is not properly optimized.

And sure, 256bit wide SIMD will have higher throughput than 128bit SIMD, but nowhere nearly 8 or even 5 times. And that doesn't make arm chips any less capable of running devices, which are more than useless toys. Those chips are more powerful than workstations were some 10 years ago, but their usability is nowhere near that. As the benchmarks from the link Toss3 posted indicate, the A9X is only some ~40% slower than i5-4300U in the "true/real world benchmarks", and that's a 15 watt chip vs the A9X is like what, 5-ish or something like that? And ARM is definitely more efficient once you account for intel's process advantage. This will become obvious if intel ever dare to manufacture arm cores at the same process as their own products. And it is not because of the ISA bloat but because of the design bloat.

Naturally, ARM chips are a low margin product, one cannot expect a 50$ chip to outperform a 300$ chip, but the gap appears to be closing, especially keeping in mind the brickwall process is going to hit the next decade. A 50$ chip running equal to a 300$ (and much wider design) chip from 2 year ago opens up a lot of possibilities, but I am not seeing any of them being realized by the industry.
