The NVIDIA Titan V Preview - Titanomachy: War of the Titans


by Ryan Smith & Nate Oh on December 20, 2017 11:30 AM EST


Meet Titan V

Having quickly covered the core architecture, let’s talk about GV100 and the Titan V itself.

Like Pascal before it, for Volta NVIDIA has decided to start big, kicking off the architecture with its flagship compute GPU design, and then letting that cascade down in the future. And NVIDIA didn’t just start big in a metaphorical sense, but in a literal sense as well. At 815mm², GV100 is massive, even by GPU standards.

This massive GPU comes even after NVIDIA has jumped nodes to TSMC’s highly optimized 16nm FinFET descendant, the aptly named 12nm FFN(vidia) process. At 21.1 billion transistors, NVIDIA has invested all of their gains and then some back into more GPU hardware, pushing the envelope on performance like never before. This is a big part of what makes GV100 such a powerful GPU, though one can only speculate what this is doing for chip yields. Titan V is very clearly a bin for chips that don’t meet the higher requirements of the Tesla V100, and NVIDIA in turn seems to have plenty of Titan V cards available.
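
As a rough check on where GV100's extra transistors came from, the Python sketch below divides transistor count by die area. The GV100 figures are from this article; the GP100 figures used for comparison (15.3 billion transistors on 610mm²) are an assumption added for illustration and are not stated here.

```python
# Transistor density: transistors divided by die area.
# GV100 numbers are from the article; the GP100 numbers
# (15.3B transistors, 610 mm^2) are an assumption for comparison.
def density_mtr_per_mm2(transistors: float, area_mm2: float) -> float:
    """Return density in millions of transistors per mm^2."""
    return transistors / area_mm2 / 1e6


gv100 = density_mtr_per_mm2(21.1e9, 815)
gp100 = density_mtr_per_mm2(15.3e9, 610)  # assumed figures

print(f"GV100: {gv100:.1f} MTr/mm^2")  # 25.9
print(f"GP100: {gp100:.1f} MTr/mm^2")  # 25.1
print(f"Area growth: {815 / 610 - 1:.0%}, density growth: {gv100 / gp100 - 1:.0%}")
```

If the assumed GP100 figures hold, density rose only a few percent while die area grew by about a third, which is consistent with most of the added hardware being paid for in silicon area rather than by the process.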

By the numbers, GV100 contains 84 SMs. Each SM, in turn, contains 64 FP32 CUDA cores, 64 INT32 CUDA cores, 32 FP64 CUDA cores, 8 tensor cores, and a significant quantity of cache at various levels. Due to the aforementioned size of the GPU and yield management needs, no product ships with all 84 SMs enabled. Rather, both Tesla V100 and Titan V ship with 80 SMs enabled, making for a total of 5120 FP32 CUDA cores and 640 tensor cores.
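
The SM arithmetic above can be sketched in a few lines of Python. The per-SM counts and the 84/80 SM figures are taken from the paragraph above; the dictionary layout is only an illustrative choice.

```python
# Whole-GPU unit counts from the per-SM figures given above.
PER_SM = {"fp32": 64, "int32": 64, "fp64": 32, "tensor": 8}
FULL_DIE_SMS = 84  # every SM on the GV100 die
ENABLED_SMS = 80   # what Tesla V100 and Titan V actually ship with


def totals(sm_count: int) -> dict:
    """Scale each per-SM unit count up to sm_count SMs."""
    return {unit: per_sm * sm_count for unit, per_sm in PER_SM.items()}


shipping = totals(ENABLED_SMS)
full_die = totals(FULL_DIE_SMS)
print(shipping)  # {'fp32': 5120, 'int32': 5120, 'fp64': 2560, 'tensor': 640}
print(full_die["fp32"] - shipping["fp32"], "FP32 cores disabled")  # 256 FP32 cores disabled
```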

Like its compute-centric predecessor, GP100, GV100 retains a unique ratio of 64 CUDA cores per SM rather than the usual 128 per SM. This means the ratio of control hardware, cache, and register files to CUDA cores is much higher than on consumer parts. For a compute-centric GPU this makes a lot of sense. However, given that NVIDIA limited this design to GP100 with Pascal, it’s worth noting that we may well see a different SM arrangement on future consumer Volta GPUs.
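
To make the "more support hardware per core" point concrete, the sketch below compares register file capacity per FP32 core. It assumes, as an illustration not stated in this article, that both a GV100 SM and a consumer Pascal (GP102) SM carry a 256KB register file; only the 64 versus 128 cores-per-SM split comes from the text.

```python
# Register file capacity per FP32 core within one SM.
# Assumption (not stated in the article): both a GV100 SM and a
# consumer Pascal (GP102) SM have a 256 KB register file, i.e.
# 65,536 32-bit registers.
REGISTER_FILE_KB = 256


def rf_kb_per_core(cores_per_sm: int, rf_kb: int = REGISTER_FILE_KB) -> float:
    """Kilobytes of register file available per FP32 core."""
    return rf_kb / cores_per_sm


print("GV100, 64 cores/SM: ", rf_kb_per_core(64), "KB/core")   # 4.0
print("GP102, 128 cores/SM:", rf_kb_per_core(128), "KB/core")  # 2.0
```

Under that assumption, halving the cores per SM doubles the register file each core can draw on, which is the kind of ratio shift the paragraph describes.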

Also like GP100 before it, GV100’s memory of choice is HBM2. All GV100 packages ship with 4 stacks of the memory; however, Titan V has only 3 of those 4 stacks enabled. As this is a salvage part, presumably we’re looking at GV100 packages where there was a failure either in an HBM2 stack or in the associated memory controller on the GV100 die itself. Either way, this means that Titan V ships with 12GB of VRAM clocked at 1.7Gbps/pin, for a total of 653GB/sec of memory bandwidth.
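
The 653GB/sec figure follows from bus width times per-pin data rate. This sketch assumes the standard 1024-bit interface per HBM2 stack (not spelled out in the article); the stack count and 1.7Gbps rate are from the text, and the 4-stack line is a purely hypothetical comparison at the same clock.

```python
# HBM2 bandwidth sketch. Assumption: each HBM2 stack exposes a
# 1024-bit interface; stack count and per-pin rate come from the text.
BITS_PER_STACK = 1024


def hbm2_bandwidth_gbs(stacks: int, gbps_per_pin: float) -> float:
    """Peak bandwidth in GB/s: total bus width (bits) * per-pin rate / 8."""
    return stacks * BITS_PER_STACK * gbps_per_pin / 8


titan_v = hbm2_bandwidth_gbs(stacks=3, gbps_per_pin=1.7)
all_four = hbm2_bandwidth_gbs(stacks=4, gbps_per_pin=1.7)  # hypothetical
print(f"Titan V, 3 stacks: {titan_v:.1f} GB/s")       # 652.8 GB/s
print(f"Same clock, 4 stacks: {all_four:.1f} GB/s")   # 870.4 GB/s
```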

Along with the workstation-suitable card design and inability to use the Tesla driver stack, the memory difference is one of the key differentiators between the Titan V and the PCIe version of the Tesla V100. Otherwise, NVIDIA has confirmed that the Titan V gets the GV100 GPU’s full, unrestricted FP64 compute, FP16 compute, and tensor core performance. To the best of our knowledge (and from what NVIDIA will comment on) it doesn’t appear that they’ve artificially disabled any of the GPU’s core features. So for most use cases, the Titan V is extremely close to the Tesla V100.

In terms of clockspeeds, the HBM2 runs at 1.7Gbps per pin, while the 1455MHz boost clock actually matches that of the 300W SXM2 variant of the Tesla V100, though that accelerator is passively cooled. Notably, the number of tensor cores has not been touched, though the official 110 DL TFLOPS rating is lower than that of the 1370MHz PCIe Tesla V100, as it would appear that NVIDIA is using a clockspeed lower than the boost clock in these calculations.
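
Working backward from the 110 TFLOPS rating shows roughly what clock it implies. This sketch assumes each tensor core performs 64 fused multiply-adds per clock (one 4x4x4 matrix multiply-accumulate), counted as 2 FLOPs each; that per-core rate is an assumption, while the 640 tensor cores and the clockspeeds are from the text.

```python
# Implied clock behind a tensor-core TFLOPS rating. Assumption: each
# tensor core performs 64 fused multiply-adds per clock, 2 FLOPs each.
TENSOR_CORES = 640
FLOPS_PER_CORE_PER_CLOCK = 64 * 2


def tensor_tflops(clock_mhz: float) -> float:
    """Peak tensor throughput in TFLOPS at a given clock."""
    return TENSOR_CORES * FLOPS_PER_CORE_PER_CLOCK * clock_mhz * 1e6 / 1e12


def implied_clock_mhz(tflops: float) -> float:
    """Clock in MHz that would produce a given TFLOPS rating."""
    return tflops * 1e12 / (TENSOR_CORES * FLOPS_PER_CORE_PER_CLOCK) / 1e6


print(f"At the 1455 MHz boost clock: {tensor_tflops(1455):.1f} TFLOPS")  # 119.2
print(f"At 1370 MHz: {tensor_tflops(1370):.1f} TFLOPS")                  # 112.2
print(f"110 TFLOPS implies ~{implied_clock_mhz(110):.0f} MHz")           # ~1343
```

Under these assumptions the rating corresponds to about 1343MHz, below both the Titan V's boost clock and the PCIe Tesla V100's 1370MHz.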

For the card itself, it features a vapor chamber cooler with copper heatsink and 16 power phases, all for the 250W TDP that has become standard with the single-GPU Titan models. Output-wise, the Titan V brings 3 DisplayPorts and 1 HDMI connector. And as for card-to-card communication, there is no SLI or NVLink support for the Titan. The PCB itself has NVLink connections on the top, but these have been intentionally blocked by the shroud to prevent their use and are disabled.

Looking at overall performance expectations then, the Titan V is clearly the fastest of the Titans. And yet outside of compute, the advantage for graphics is much smaller. Relative to the Titan Xp we’re looking at just a 14% on-paper advantage in FP32 shader throughput, and thanks to the slightly lower clockspeed an actual ROP throughput disadvantage. The real-world impact of these differences will play out differently among different programs and games, as we’ll see. But it’s an important piece of context all the same. GV100 has a lot of hardware that really only helps compute performance, and from a power standpoint that hardware is a liability. This is why NVIDIA creates differentiated consumer and compute-focused GPUs, and why GV100 isn’t quite as potent for gaming as it may seem.
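
The on-paper comparison can be sketched as cores times clock for shader throughput and ROPs times clock for pixel throughput. Only the Titan V's 5120 FP32 cores come from this article; the Titan Xp's 3840 cores, the 96 ROPs on each card, and the base clocks (1200MHz for Titan V, 1405MHz for Titan Xp) are assumptions supplied for illustration. With those assumed figures the FP32 gap lands near the 14% quoted above, and the ROP comparison comes out negative.

```python
# On-paper graphics throughput, Titan V vs. Titan Xp. Only the Titan V's
# 5120 FP32 cores come from the article; every other number is assumed.
CARDS = {
    "Titan V":  {"fp32_cores": 5120, "rops": 96, "base_mhz": 1200},
    "Titan Xp": {"fp32_cores": 3840, "rops": 96, "base_mhz": 1405},
}


def fp32_tflops(card: dict) -> float:
    """Each FP32 core retires one FMA (2 FLOPs) per clock."""
    return card["fp32_cores"] * 2 * card["base_mhz"] / 1e6


def pixel_rate_gpix(card: dict) -> float:
    """Simplification: each ROP writes one pixel per clock."""
    return card["rops"] * card["base_mhz"] / 1e3


v, xp = CARDS["Titan V"], CARDS["Titan Xp"]
print(f"FP32: {fp32_tflops(v) / fp32_tflops(xp) - 1:+.0%}")          # +14%
print(f"ROP:  {pixel_rate_gpix(v) / pixel_rate_gpix(xp) - 1:+.0%}")  # -15%
```

With equal ROP counts assumed, the pixel-rate result depends only on clockspeed, which is why the lower-clocked Titan V comes out behind.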


A Note on Graphics Features

Before diving into our benchmarks, we also wanted to take a quick look at the graphics features of the Titan V. As this is the first Volta card with display outputs, this is our first chance to see if Volta has any new graphics capabilities. NVIDIA for their part has not been discussing Volta’s graphics features in-depth, even with the launch of Titan V, since the focus is on compute.

The flip side to this, however, is that everything here should still be taken with a grain of salt. Not because it’s inaccurate for Titan V, but because it’s only accurate for GV100 on the current driver stack. This is not a graphics-focused product, which means there’s no guarantee NVIDIA has exposed every new or upgraded feature, or, for that matter, that future consumer chips will have identical graphics features.

NVIDIA GPU DirectX Graphics Feature Info
	Volta (Titan V)	Pascal (Titan Xp)
Direct3D Feature Level	12_1	12_1
Fast FP16 Shaders	No	No
Tiled Resources	Tier 3	Tier 3
Resource Binding	Tier 3	Tier 3
Conservative Rasterization	*Tier 3*	Tier 2
Resource Heap	Tier 1	Tier 1

All of that said, what we find is that, according to NVIDIA’s drivers, the graphics capabilities of the Titan V are almost identical to those of the Pascal-based Titan Xp. The latter was already a fairly advanced card for its time, supporting DirectX feature level 12_1, which is still the highest overall feature level within DirectX. So any differentiation is limited to individual features, and in this case the difference is that the Titan V supports conservative rasterization tier 3 rather than the Titan Xp’s more limited tier 2. Outside of software developers this doesn’t mean much at the moment, but it does make Volta the inflection point from which, perhaps half a decade or so from now, developers can treat conservative rasterization tier 3 as a baseline GPU feature.

Meanwhile, as GP100 never came to a card using the GeForce driver set (the closest it got was the Quadro GP100), this is also our first look at an NVIDIA graphics card with fast FP16 support. A lot has been made of FP16 support in recent years for pixel shaders, as the reduced precision allows for greater shader efficiency and total throughput. The PlayStation 4 Pro supports FP16 shaders, as do AMD’s Vega architecture cards.

But for the Titan V, while fast FP16 is supported in hardware, it turns out this feature hasn’t been exposed to any APIs outside of CUDA. In both Direct3D and OpenGL, FP16 is not exposed and is promoted to FP32 instead. At this point I don’t know of any reason why it needs to be this way (NVIDIA should be able to expose fast FP16 to Direct3D), but for the moment this is not the case. This may be an early driver issue. Alternatively, if NVIDIA goes the same route with consumer Volta cards as it did with Pascal cards, those cards may not support fast FP16 at all, in which case there’s little point in enabling fast FP16 support for pixel shaders on the Titan V.
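
The precision trade-off behind FP16 shaders can be seen by rounding values to half and single precision. This sketch uses Python's standard `struct` module, whose `'e'` and `'f'` format codes pack IEEE 754 binary16 and binary32; it illustrates number formats only and says nothing about how NVIDIA's driver performs the promotion.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to IEEE 754 half precision and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to IEEE 754 single precision and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# 2049 needs 12 significant bits; FP16 has 11 (10 stored plus an
# implicit leading bit), so it rounds to 2048. FP32 keeps it exact.
print(to_fp16(2049.0))  # 2048.0
print(to_fp32(2049.0))  # 2049.0
# A color-channel-sized value keeps only about 3 decimal digits in FP16.
print(to_fp16(0.7))     # 0.7001953125
```

For many pixel shader values that loss is tolerable, which is why fast FP16 is attractive. Promoting everything to FP32 avoids the loss but gives up the throughput advantage.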

The Test

For gaming, we've opted for 4K-only for this preview, running a subset of our games. Since this is the first Volta card we are benching, we tested both DX11 and DX12 modes for Deus Ex: Mankind Divided and Total War: Warhammer on the Titan V. Load power consumption was measured in Battlefield 1 (DX11) at 1440p for consistency with past results, while average game clockspeeds were taken at 4K.

And as for our surprise entry at the end, we utilized the venerable Crysis Warhead, using the 'frost' benchmark with the 64-bit executable. SSAA was enabled in the NVIDIA driver, outside of the game.

For our preview of the NVIDIA Titan V, we are using NVIDIA’s 388.59 launch driver for all of our Titan cards. Meanwhile, unless explicitly running an FP64 workload, the original GTX Titan was benchmarked with full speed FP64 disabled, as is the default for this card.

CPU:	Intel Core i7-7820X @ 4.3GHz
Motherboard:	Gigabyte X299 AORUS Gaming 7 (BIOS version F7)
Power Supply:	Corsair AX860i
Hard Disk:	OCZ Toshiba RD400 (1TB)
Memory:	G.Skill TridentZ DDR4-3200 4 x 8GB (16-18-18-38)
Case:	NZXT Phantom 630 Windowed Edition
Monitor:	LG 27UD68P-B
Video Cards:	NVIDIA Titan V, NVIDIA Titan Xp, NVIDIA GeForce GTX Titan X (Maxwell), NVIDIA GeForce GTX Titan
Video Drivers:	NVIDIA Release 388.59
OS:	Windows 10 Pro (Creators Update)


111 Comments


mode_13h - Wednesday, December 27, 2017 - link
I don't know if you've heard of OpenCL, but there's no reason why a GPU needs to be programmed in a proprietary language.

It's true that OpenCL has some minor issues with performance portability, but the main problem is Nvidia's stubborn refusal to support anything past version 1.2.

Anyway, lots of businesses know about vendor lock-in and would rather avoid it, so it sounds like you have some growing up to do if you don't understand that.
CiccioB - Monday, January 1, 2018 - link
Grow up.
I repeat. No one is wasting millions on libraries that aren't certified and supported, let alone entire frameworks.
If you think that researchers with budgets of millions are nerds working in a garage whose first thought in the morning is avoiding lock-in, well, grow up, kid.
Nvidia provides the resources that let them exploit their expensive HW to its full potential, reducing time and other associated costs, including when upgrading to better HW. That's what counts when investing millions in a job.
For your homemade AI joke, kid, you can use whatever alpha library with zero support and certification. Others have already grown up.
mode_13h - Friday, January 5, 2018 - link
No kid here. I've shipped deep-learning based products to paying customers for a major corporation.

I've no doubt you're some sort of Nvidia shill. Employee? Maybe you bought a bunch of their stock? Certainly sounds like you've drunk their Kool-Aid.

Your line of reasoning reminds me of how people used to say businesses would never adopt Linux. Now, it overwhelmingly dominates cloud, embedded, and underpins the Android OS running on most of the world's handsets. Not to mention it's what most "researchers with budgets of millions" use.
tuxRoller - Wednesday, December 20, 2017 - link
"The integer units have now graduated their own set of dedicates cores within the GPU design, meaning that they can be used alongside the FP32 cores much more freely."

Yay! Nvidia caught up to gcn 1.0!
Seriously, this goes to show how good the gcn arch was. It was probably too ambitious for its time; those old GPUs have aged really well, and it took a long time for games to catch up.
CiccioB - Thursday, December 21, 2017 - link
"Nvidia caught up to gcn 1.0!"
Yeah! It is known to the entire universe that it is Nvidia that trails AMD's performance.
Luckily they managed to get this Volta out in time before the bankruptcy.
tuxRoller - Wednesday, December 27, 2017 - link
I'm speaking about architecture not performance.
CiccioB - Monday, January 1, 2018 - link
New bigger costlier architectures with lower performance = fail
tuxRoller - Monday, January 1, 2018 - link
Ah, troll.
CiccioB - Wednesday, December 20, 2017 - link
Useless card
Vega = #poorvolta
StrangerGuy - Thursday, December 21, 2017 - link
AMD can pay me half their marketing budget and I will still do better than them...by doing exactly nothing. Their marketing is worse than being in a state of non-existence.
