While a good deal of NVIDIA’s success in servers over the last decade has of course come from their proficient GPUs, as a business NVIDIA these days is much more than a fabless GPU designer. With more software engineers than hardware engineers on staff, it’s software and ecosystem plays that have really cemented NVIDIA’s position as the top GPU manufacturer, and created a larger market for their GPUs. At the same time, it’s these ecosystem plays that have allowed NVIDIA to build a profit-printing machine, diversifying beyond just GPU sales and moving into systems, software, support, and other avenues.

To that end, NVIDIA this morning is formally rolling out a new ecosystem play aimed at high-end deep learning servers, which the company is branding as NVIDIA-Certified Systems. The program was soft-launched back in the fall; today the company is giving it a more proper introduction, detailing how it works and announcing some of the partners. Under NVIDIA’s plan, going forward customers can opt to buy NVIDIA-Certified Systems if they want an extra guarantee on system performance and reliability, as well as opt in to buying support contracts to get access to direct, full-stack technical support from NVIDIA.

Conceptually, the certification program is rather straightforward, due in large part to its hardware requirements. Systems first need to be using NVIDIA’s A100 accelerators, along with Mellanox Ethernet adapters and DPUs. Or in other words, the servers already need to be using NVIDIA silicon where available. OEMs can then submit systems meeting these hardware requirements to NVIDIA, who will test the systems across multiple metrics, including multi-GPU and multi-node DL performance, network performance, storage performance, and security (secure boot/root of trust). Systems that pass these tests can then be labeled as NVIDIA-Certified.

Those certified systems, in turn, are eligible for additional full-stack technical support through NVIDIA and the OEM. Customers can opt to buy multi-year support contracts, which entitle them to support through both the OEM and NVIDIA. NVIDIA essentially assumes responsibility for all software support above the OS, including their hardware drivers, CUDA, their wide collection of frameworks and libraries, and even major open source libraries like TensorFlow. That last item is what makes NVIDIA’s support proposition particularly valuable, as they’re essentially committing to helping customers with any kind of GPU or deep learning-related software issue.

Of course, that support won’t come for free: this is where NVIDIA will be making their money. While NVIDIA is not charging OEMs for certification (so there’s no additional certification tax baked into the hardware), support contracts are priced based on the number of GPUs. In one example, NVIDIA has stated that a three-year support contract for a dual-A100 system would be $4,299, or about $717 per year per GPU. So one can imagine how quickly this ratchets up for larger 4- and 8-way A100 systems, and then again for multiple nodes.
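To put rough numbers on that, here is a minimal back-of-the-envelope sketch in Python. It takes the one example NVIDIA has quoted ($4,299 for three years on a dual-A100 system) and derives the implied rate per GPU per year, which works out to roughly $716.50. The extrapolation to larger systems is purely an assumption: it supposes pricing scales linearly with GPU count and node count, which NVIDIA has not confirmed, and the function names are made up for illustration.

```python
# Figures quoted by NVIDIA for a single example configuration.
CONTRACT_PRICE_USD = 4299   # 3-year support contract, dual-A100 system
CONTRACT_YEARS = 3
GPUS_IN_EXAMPLE = 2


def per_gpu_year_cost(price_usd: float, years: int, gpus: int) -> float:
    """Implied support cost per GPU per year for a given contract."""
    return price_usd / (years * gpus)


def estimated_contract_cost(gpus_per_node: int, nodes: int = 1,
                            years: int = CONTRACT_YEARS) -> float:
    """Hypothetical extrapolation to other configurations.

    ASSUMPTION: pricing scales linearly with GPU count and node count.
    This is not published NVIDIA pricing, only an illustration.
    """
    rate = per_gpu_year_cost(CONTRACT_PRICE_USD, CONTRACT_YEARS, GPUS_IN_EXAMPLE)
    return rate * gpus_per_node * nodes * years


if __name__ == "__main__":
    rate = per_gpu_year_cost(CONTRACT_PRICE_USD, CONTRACT_YEARS, GPUS_IN_EXAMPLE)
    print(f"Implied rate: ${rate:,.2f} per GPU per year")
    for gpus in (2, 4, 8):
        cost = estimated_contract_cost(gpus)
        print(f"{gpus}-GPU node, {CONTRACT_YEARS} years: ${cost:,.2f}")
```

If pricing really were linear, a single 8-way A100 node would come to $17,196 over three years, and a four-node cluster of such systems to $68,784. Actual quotes may well differ, for instance through volume discounts.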

For NVIDIA and its OEM partners, the creation of a certification program is a straightforward way to try to further grow the market for deep learning servers, especially for mid-sized businesses. The market for AI hardware has been booming, and NVIDIA wants to keep it that way by making it easier for potential customers to use their wares. NVIDIA already has the top-end of the market covered in this respect with their direct relationships with the hyperscalers – and by extension their small-cap cloud computing customers – so a hardware certification program fills the middle tier for organizations that are going to run their own servers, but aren’t going to be a massive customer that gets personalized attention.

As for those customers, NVIDIA’s server certification and support programs are designed to eliminate (or at least mitigate) the risks of making significant investments into NVIDIA hardware. That means being able to buy a system where the vendor (in this case the duo of NVIDIA and the OEM) can vouch for the performance of the system, as well as guarantee it will be able to properly run various AI packages, such as NVIDIA’s NGC catalog of GPU-optimized and containerized software.

Altogether, NVIDIA is launching with 14 certified systems, with the promise of more to come. For the first wave of systems, participating OEMs include Dell, Gigabyte, HPE, Inspur, and Supermicro, all of whom are frequent participants in new NVIDIA server initiatives.

With all that said, NVIDIA’s server certification program is unlikely to significantly change how things work for most of the company’s customers, but it’s a program that seems primed to address a specific niche for NVIDIA and its OEM partners. For companies that are interested in GPU computing but are looking for a greater degree of support and certainty, this would address those needs. And, to bring things full circle, it’s exactly by addressing those sorts of needs with ecosystem plays like server certification that NVIDIA has been so successful in the server GPU market over the last decade.

Source: NVIDIA

6 Comments

  • abufrejoval - Tuesday, January 26, 2021 - link

    Hmm, from what I understand, traditionally 3rd party vendors had to pay the OEMs to have their hardware certified, e.g. Broadcom would have to pay HP.

    Since Nvidia needs to pay for the ARM acquisition and their new position on the planet, I'd hazard it's now the OEMs that have to pay up to Nvidia so they are permitted a slice of the ML cake.

    Either way the buck will be passed on to the end user eventually.
  • Yojimbo - Tuesday, January 26, 2021 - link

    NVIDIA isn't having its hardware certified. NVIDIA is certifying that the hardware works with its software libraries the way it is intended to. People looking to buy these certified systems are people who are presumably looking to use the NVIDIA software stack.

    And yes, of course the buck will be passed on to the end user. It is a service the end user is looking to pay for. The certification is going to be advertised as if it adds value to the server. Whether it does or not, well, that's what IT divisions are paid to figure out, isn't it? NVIDIA is not demanding that all servers be certified. It's optional. NVIDIA is offering a certification program so that the users who want to make sure that things are working correctly can have the certification. And if they buy certified server they can sign up with direct software support from NVIDIA.

    NVIDIA doesn't need more money to pay for the ARM acquisition. It already has the money.

    Anytime there's something with NVIDIA on any site there's going to be people in the comments looking to spin a tale of how NVIDIA is fleecing the world with whatever new is going on. It's tiresome.
  • nadim.kahwaji - Wednesday, January 27, 2021 - link

    Too bad, is AnandTech now only reporting news concerning GPUs??? Just a small comparison between CPU reviews and everything else, including GPUs, and you realize the unexplainable huge gap. Hope that CPU review quality will be the norm for all categories, including GPUs!!!!!
  • Yojimbo - Thursday, January 28, 2021 - link

    There have been at least 7 CPU reviews/tests since the last GPU review/test from what I have seen, and that's only going back to October. I haven't checked to see when the actual last GPU review or test was. (I'm not counting SoC reviews that have both CPU and GPU elements). That despite there being multiple major GPU launches over the past 4 months. If you go to the GPUs tab and click on the 5th page of news from the site you get back to April 27, 2020. If you go to the CPUs tab and click on the 5th page of news you get to July 20, 2020. If you click on the 10th page for each, for GPUs you get to July 16, 2019 and for CPUs you get to December 13, 2019. I'm really not sure what you're talking about.
  • GreenReaper - Sunday, January 31, 2021 - link

    NVIDIA has wares... if you have coin. And conveniently, they're the only ones who can truly deal with issues in their proprietary drivers.
  • Samlucaslincon - Friday, February 5, 2021 - link

    Amazing blog and everything is explained in a well-mannered way. https://rivipedia.com/
