NVIDIA Ships First Volta-based DGX Systems
by Nate Oh on September 7, 2017 10:00 AM EST
Posted in: GPUs, Tesla, NVIDIA, Volta, Machine Learning, GV100, Deep Learning
This Wednesday, NVIDIA announced that it has shipped its first commercial Volta-based DGX-1 system to the MGH & BWH Center for Clinical Data Science (CCDS), a Massachusetts-based research group focusing on AI and machine learning applications in healthcare. In a sense, this serves as a generational upgrade, as CCDS was one of the first research institutions to receive a Pascal-based first-generation DGX-1 last December. In addition, NVIDIA is shipping a DGX Station to CCDS later this month.
At CCDS, these AI supercomputers will continue to be used to train deep neural networks that evaluate medical images and scans, using Massachusetts General Hospital’s collection of phenotypic, genetic, and imaging data. In turn, this can assist doctors and medical practitioners in making faster and more accurate diagnoses and treatment plans.
First announced at GTC 2017, the DGX-1V server is powered by 8 Tesla V100s and priced at $149,000. The original iteration of the DGX-1 was priced at $129,000 with a 2P 16-core Haswell-EP configuration, but has since been updated to the same 20-core Broadwell-EP CPUs found in the DGX-1V, allowing for easy P100 to V100 drop-in upgrades. The DGX Station was also unveiled at GTC 2017, and is essentially a full-tower, 1P workstation version of the DGX-1 with 4 Tesla V100s. This water-cooled DGX Station is priced at $69,000.
Selected NVIDIA DGX Systems Specifications
| | DGX-1 (Volta) | DGX-1 (Pascal) | DGX-1 (Pascal, Original) | DGX Station |
|---|---|---|---|---|
| GPU Configuration | 8x Tesla V100 | 8x Tesla P100 | 8x Tesla P100 | 4x Tesla V100 |
| GPU FP16 Compute (General Purpose) | 240 TFLOPS | 170 TFLOPS | 170 TFLOPS | 120 TFLOPS |
| GPU FP16 Compute (Deep Learning) | 960 TFLOPS | - | - | 480 TFLOPS |
| CPU Configuration | 2x Intel Xeon E5-2698 v4 (20-core, Broadwell-EP) | 2x Intel Xeon E5-2698 v4 (20-core, Broadwell-EP) | 2x Intel Xeon E5-2698 v3 (16-core, Haswell-EP) | 1x Intel Xeon E5-2698 v4 (20-core, Broadwell-EP) |
| System Memory | 512 GB DDR4-2133 (LRDIMM) | 512 GB DDR4-2133 (LRDIMM) | 512 GB DDR4-2133 (LRDIMM) | 256 GB DDR4 (LRDIMM) |
| Total GPU Memory | 128 GB HBM2 (8x 16GB) | 128 GB HBM2 (8x 16GB) | 128 GB HBM2 (8x 16GB) | 64 GB HBM2 (4x 16GB) |
| Storage | 4x 1.92 TB SSD RAID 0 | 4x 1.92 TB SSD RAID 0 | 4x 1.92 TB SSD RAID 0 | OS: 1x 1.92 TB SSD; Data: 3x 1.92 TB SSD RAID 0 |
| Networking | Dual 10GbE, 4 InfiniBand EDR | Dual 10GbE, 4 InfiniBand EDR | Dual 10GbE, 4 InfiniBand EDR | Dual 10Gb LAN |
| Max Power | 3200W | 3200W | 3200W | 1500W |
| Dimensions | 866mm x 444mm x 131mm (3U Rackmount) | 866mm x 444mm x 131mm (3U Rackmount) | 866mm x 444mm x 131mm (3U Rackmount) | 518mm x 256mm x 639mm (Tower) |
| Other Features | Ubuntu Linux Host OS; DGX Software Stack (see Datasheet) | Ubuntu Linux Host OS; DGX Software Stack (see Datasheet) | Ubuntu Linux Host OS; DGX Software Stack (see Datasheet) | Ubuntu Desktop Linux OS; DGX Software Stack (see Datasheet); 3x DisplayPort |
| Price | $149,000 | Varies | $129,000 | $69,000 |
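To make the split between general purpose and deep learning FP16 throughput easier to see, the system-level figures in the table can be broken down per GPU. The following is only a rough sketch: the TFLOPS values and GPU counts are copied from the table, while the dictionary layout and the function names (`per_gpu_tflops`, `deep_learning_ratio`, `summarize`) are illustrative assumptions, and the even per-GPU division is a simplification.

```python
# Figures copied from the specifications table. The data layout and the
# function names below are illustrative assumptions, not anything NVIDIA ships.
SYSTEMS = {
    "DGX-1 (Volta)": {"gpus": 8, "general": 240.0, "deep_learning": 960.0},
    "DGX-1 (Pascal)": {"gpus": 8, "general": 170.0, "deep_learning": None},
    "DGX Station": {"gpus": 4, "general": 120.0, "deep_learning": 480.0},
}


def per_gpu_tflops(total_tflops, gpus):
    """Divide a system-level TFLOPS figure evenly across its GPUs."""
    return total_tflops / gpus


def deep_learning_ratio(spec):
    """Return deep learning FP16 / general purpose FP16, or None if not listed."""
    if spec["deep_learning"] is None:
        return None
    return spec["deep_learning"] / spec["general"]


def summarize(systems):
    """Build (name, general FP16 per GPU, deep learning ratio) rows."""
    rows = []
    for name, spec in systems.items():
        general = per_gpu_tflops(spec["general"], spec["gpus"])
        rows.append((name, general, deep_learning_ratio(spec)))
    return rows


if __name__ == "__main__":
    for name, general, ratio in summarize(SYSTEMS):
        ratio_text = "not listed" if ratio is None else f"{ratio:.1f}x"
        print(f"{name}: {general:.2f} TFLOPS general FP16 per GPU, "
              f"deep learning ratio {ratio_text}")
```

Under these assumptions, both Volta-based systems work out to the same per-GPU general purpose figure and the same ratio between the two FP16 rows, which is why the table lists deep learning compute separately rather than as a single FP16 number.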
Taking a step back, this is a continuation of NVIDIA’s rollout of Volta-based professional/server products, with DGX Volta meeting its Q3 launch date, and OEM Volta targeted at Q4. In the past months, the first Tesla V100 GPU accelerators were given out to researchers at the 2017 Conference on Computer Vision and Pattern Recognition (CVPR) in July, while a PCIe version of the Tesla V100 was formally announced during ISC 2017 in June.
Source: NVIDIA
48 Comments
etobler - Friday, September 8, 2017 - link
Thank you, but as far as I know, Robot Structural Analysis Professional doesn't have mesh manipulation tools.
johnnycanadian - Friday, September 8, 2017 - link
ReMake and ReCap. They work brilliantly with comparatively large data sets and are surprisingly simple to learn and use once you get the hang of the Autodesk implementation of mesh interpolation.
And as for visuals? Stunning. It may not seem important, but when an Engineering department is begging for funding, a little "ooh" and "ah" goes a long, long way with folks who don't necessarily understand what is being done but they like pretty pictures. :-)
notashill - Thursday, September 7, 2017 - link
AMD is really stuck between a rock and a hard place for the GPU market. Their architecture is way more competitive in compute than gaming but most of the compute stuff is CUDA so they're buying NVIDIA no matter what.
Yojimbo - Thursday, September 7, 2017 - link
"That's pretty pathetic. Full pre-built systems with AMD Instinct hardware will give you 3 PFlops for the same price."
Link?
Yojimbo - Thursday, September 7, 2017 - link
I say "link" meaning link to us this magical system that doesn't exist. Just suppose for the moment the MI25 was released and available for purchase in shipping systems (it isn't). One MI25 has 24.6 TFLOPS of FP16. To get 3 PFLOPS you'd need 122 of them. Put 8 of them in a node. So you need 16 nodes. Now if you can find us a "full pre-built" cluster of 16 nodes containing at least 122 MI25s and all the processors, power supplies, cooling, and interconnect necessary for less than $150K I'm gonna be impressed. Such a thing would be in the several millions of dollars, once the MI25 becomes available from server makers.
jordanclock - Thursday, September 7, 2017 - link
You mean the AMD Instinct MI25 that isn't even available for purchase? I mean, since there isn't a listed price, I suppose you could pretend the MI25 cost less than 1% of a DGX-1 and get a 3PFlops system.
Shadow7037932 - Monday, September 11, 2017 - link
Oh look, a product that doesn't even exist
Drumsticks - Thursday, September 7, 2017 - link
Volta is really shaping up to be interesting. I've got a lower end Pascal GPU to get me through the year, but I'm really going to step it up when GV104 lands.
I think a distinction of some kind should be made regarding the Server's FP16 performance. DGX-1V has only 240 TF of general purpose FP16 compute, right? It's deep learning workloads only where that shoots up 4x to 960. It seems like there'd be a useful distinction between deep learning FLOPS and general purpose ones, unless I'm mistaken.
Nate Oh - Thursday, September 7, 2017 - link
You are absolutely correct, and the chart has been updated to reflect that. I did note specialized tensor performance separately when PCIe V100 was announced, but the NVIDIA DGX product pages and datasheet charts simply use 'FP16 TFLOPS' in comparing Volta-based DGX vs Pascal. While DGX systems are focused on deep learning, at a glance that label makes it appear that 960 TFLOPS is general FP16 performance, and so that deep learning compute clarification should definitely be made clear.
Nate Oh - Thursday, September 7, 2017 - link
Oops, I forgot to say thanks, but thanks Drumsticks :)