Global Advancements in AI and High-Performance Computing

GOOVER DAILY REPORT 6/3/2024

TABLE OF CONTENTS

  1. Introduction
  2. AMD and Microsoft Collaboration
  3. NVIDIA's Expansion in AI Hardware
  4. NVIDIA's GTC 2024 Announcements
  5. Enhanced Memory Grace Hopper Superchip
  6. Recent AI Developments and Industry News
  7. AMD Ryzen Threadripper 7980X Review
  8. Glossary
  9. Conclusion
  10. Source Documents

1. Introduction

  • This report surveys the latest advancements in AI technologies and accelerated computing hardware, and their applications across industries, based on the collected source documents.

2. AMD and Microsoft Collaboration

  • 2-1. Introduction of AMD Instinct MI300X Accelerator

  • The AMD Instinct MI300X accelerator plays a crucial role in expanding computing capability for AI. High-bandwidth memory (HBM), specialized data formats, and raw compute are essential for advancing AI models, which are becoming increasingly complex and accurate. At Microsoft Build 2024, Microsoft's annual developer conference, AMD showcased the MI300X accelerator alongside other computing and software offerings aimed at a wide range of markets.

  • 2-2. Integration with Azure ND MI300X V5 Virtual Machines

  • Microsoft has officially launched the new Azure ND MI300X v5 virtual machines (VMs), designed to handle demanding AI workloads. These VMs pair the AMD Instinct MI300X accelerator with the ROCm open software stack, making them suitable for complex AI applications. Early customers such as Hugging Face, which ported its models to these VMs in just one month, report strong performance and cost efficiency. The Azure ND MI300X v5 VMs are now available in the Canada Central region and support the GPT-3.5 and GPT-4 models behind Azure OpenAI services.

  • 2-3. Performance Capabilities in AI Workloads

  • The AMD Instinct MI300X and the ROCm software stack support some of the most demanding AI workloads in production, including the Azure OpenAI chat services built on GPT-3.5 and GPT-4. The VMs' exceptional HBM capacity and memory bandwidth let users load larger models onto a single GPU or use fewer GPUs overall, reducing power consumption, cost, and time to deployment. The result is a strong price-to-performance ratio, as demonstrated by customers like Hugging Face.

  • 2-4. Adoption by Hugging Face and Its Advantages

  • Hugging Face was one of the early adopters of the ND MI300X v5 VMs, porting its models within a month and achieving remarkable performance and cost efficiency. The partnership lets Hugging Face users run their models on AMD Instinct GPUs directly from the Hugging Face Hub on Azure without code modifications, offering an easier and more efficient way to build and deploy NLP applications on these VMs. Hugging Face Chief Evangelist Julien Simon highlighted the close collaboration between Microsoft, AMD, and Hugging Face, which makes it simpler for Azure customers to adopt AI built on open models and open source.
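
  • As a concrete illustration of the "no code modifications" claim, here is a minimal sketch assuming a ROCm build of PyTorch on the VM; the model name and prompt are illustrative placeholders, not details from the report:

```python
# Minimal sketch: standard Hugging Face code running unmodified on an AMD
# Instinct MI300X GPU. ROCm builds of PyTorch expose AMD GPUs through the
# familiar "cuda" device alias, so no source changes are required.
import torch
from transformers import pipeline

# On ROCm systems, torch.cuda.is_available() reports the AMD GPU.
device = 0 if torch.cuda.is_available() else -1

generator = pipeline(
    "text-generation",
    model="gpt2",  # placeholder; any causal LM from the Hub works the same way
    device=device,
)

print(generator("High-bandwidth memory matters for LLM inference because",
                max_new_tokens=40)[0]["generated_text"])
```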

3. NVIDIA's Expansion in AI Hardware

  • 3-1. Introduction of NVIDIA H100 GPUs

  • NVIDIA introduced the H100 Tensor Core GPU, based on the NVIDIA Hopper GPU computing architecture. The new GPU delivers significant advances for developing, training, and deploying generative AI, large language models (LLMs), and recommender systems. Notably, the H100 provides up to 9x faster AI training and up to 30x faster AI inference on LLMs compared with the prior-generation A100.

  • 3-2. Deployment by Major Cloud Providers (OCI, AWS, Azure)

  • Oracle Cloud Infrastructure (OCI) has announced the limited availability of new OCI Compute bare-metal GPU instances featuring H100 GPUs. Amazon Web Services (AWS) is planning to introduce EC2 UltraClusters of Amazon EC2 P5 instances, scaling to 20,000 interconnected H100 GPUs. Microsoft Azure has launched a private preview of their H100 virtual machine, ND H100 v5, to support the growing demand for generative AI training and inference.

  • 3-3. Meta's Utilization of H100 in Grand Teton AI Supercomputer

  • Meta has deployed the H100-powered Grand Teton AI supercomputer for its AI production and research teams. This system offers significant performance enhancements, including 4x the host-to-GPU bandwidth, 2x the compute and data network bandwidth, and 2x the power envelope over its predecessor, Zion. The Grand Teton AI supercomputer is being used for training and production inference of deep learning models and content understanding.

  • 3-4. Case Studies: OpenAI, Stability AI, Twelve Labs, Anlatan

  • Several pioneers in generative AI are leveraging the H100 for their significant projects. OpenAI will use H100 on its Azure supercomputer, continuing its AI research. Stability AI, an early access customer on AWS, plans to use H100 to accelerate its video, 3D, and multimodal models. Twelve Labs intends to use H100 instances on an OCI Supercluster for enhanced video search capabilities. Anlatan is utilizing H100 on CoreWeave’s cloud platform for its AI-assisted story writing and text-to-image synthesis app, NovelAI.

4. NVIDIA's GTC 2024 Announcements

  • 4-1. Launch of Blackwell GPU Series

  • NVIDIA unveiled its new Blackwell GPU series, the successor to the H100 and H200 GPUs, billing it as the most powerful chip for AI workloads to date. A notable product introduced was the GB200 Superchip, which combines two Blackwell GPUs with an NVIDIA Grace CPU. NVIDIA claims a 30x performance increase for large language model (LLM) inference workloads and up to 25x better power efficiency compared with the H100 GPU.

  • 4-2. Details on DGX GB200 Superchips and Systems

  • NVIDIA introduced the DGX GB200 system, which features 36 NVIDIA GB200 Superchips, comprising 36 NVIDIA Grace CPUs and 72 Blackwell GPUs connected via fifth-generation NVIDIA NVLink. The system offers a 30x performance increase over the NVIDIA H100 Tensor Core GPU for LLM inference workloads. Additionally, the DGX SuperPOD, powered by Grace Blackwell, includes eight or more DGX GB200 systems, scaling to tens of thousands of GB200 Superchips connected via NVIDIA Quantum InfiniBand. This configuration can connect 576 Blackwell GPUs within eight DGX GB200 systems.
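
  • The headline figures follow from simple multiplication (one Grace CPU and two Blackwell GPUs per GB200 Superchip); the short check below just reproduces the counts quoted above:

```python
# Reproducing the DGX GB200 SuperPOD component counts quoted in the text.
SUPERCHIPS_PER_SYSTEM = 36   # GB200 Superchips in one DGX GB200 system
CPUS_PER_SUPERCHIP = 1       # one Grace CPU per Superchip
GPUS_PER_SUPERCHIP = 2       # two Blackwell GPUs per Superchip
SYSTEMS_PER_SUPERPOD = 8     # base DGX SuperPOD configuration

gpus_per_system = SUPERCHIPS_PER_SYSTEM * GPUS_PER_SUPERCHIP  # 72
cpus_per_system = SUPERCHIPS_PER_SYSTEM * CPUS_PER_SUPERCHIP  # 36
superpod_gpus = SYSTEMS_PER_SUPERPOD * gpus_per_system        # 576
print(f"{superpod_gpus} Blackwell GPUs across {SYSTEMS_PER_SUPERPOD} systems")
```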

  • 4-3. Enhanced Enterprise AI Software and Applications

  • NVIDIA has extended its enterprise AI software stack with tools natively optimized for generative AI scenarios. New offerings include NVIDIA NIM, a set of optimized cloud-native microservices for deploying generative AI models. In transportation, companies including BYD, Hyper, and XPENG are adopting NVIDIA DRIVE Thor™ for next-generation fleets. In healthcare, NVIDIA announced more than two dozen new microservices for advanced imaging, natural language and speech recognition, and digital biology.
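
  • NIM microservices expose OpenAI-compatible HTTP endpoints, so calling one can look like the hedged sketch below; the base URL, placeholder API key, and model identifier are illustrative assumptions, not details from the report:

```python
# Hedged sketch: querying a self-hosted NVIDIA NIM microservice through its
# OpenAI-compatible REST API. The endpoint, key, and model id below are
# illustrative assumptions for a hypothetical local deployment.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local NIM endpoint
    api_key="not-needed-locally",         # placeholder; local NIMs may ignore it
)

response = client.chat.completions.create(
    model="meta/llama3-8b-instruct",      # illustrative NIM model identifier
    messages=[{"role": "user", "content": "What does NVIDIA NIM provide?"}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```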

  • 4-4. Strategic Partnerships with Hyperscalers

  • NVIDIA expanded its partnerships with major hyperscalers, including AWS, Google Cloud, and Oracle Cloud (though notably not Alibaba Cloud). For example, Amazon SageMaker will integrate with NVIDIA NIM to improve the price-performance of foundation models running on GPUs. The partnerships aim to accelerate digital transformation across domains such as healthcare, industrial design, and digital sovereignty.

5. Enhanced Memory Grace Hopper Superchip

  • 5-1. High Bandwidth Memory (HBM) Developments

  • The new high-bandwidth memory (HBM3e) version is available exclusively with NVIDIA's Grace Hopper CPU-GPU Superchip. Fast memory (HBM) capacity has emerged as a critical cost driver for large language model (LLM) inference processing. By upgrading the Grace Hopper Superchip's memory first, NVIDIA aims to grow its market share and boost adoption at the expense of x86 systems.

  • 5-2. Impact on Large Language Model (LLM) Inference

  • Inference processing for large models, such as GPT-3, GPT-4, and ChatGPT, demands high memory bandwidth and capacity. NVIDIA's new GH200 Superchip and dual GH200 system address this with 70% more memory capacity and 50% more bandwidth per GPU, an advance intended to lower the substantial cost of deploying large models while improving performance.
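
  • To see why capacity matters so much, consider the floor that weight storage alone puts on GPU count. The back-of-the-envelope sketch below assumes FP16 weights (2 bytes per parameter), an illustrative 175-billion-parameter model, and per-GPU capacities assumed from public GH200 specifications; KV cache and activations, which raise real-world requirements, are ignored:

```python
# Back-of-the-envelope: minimum GPUs needed just to hold FP16 weights.
# Ignores KV cache, activations, and framework overhead, all of which
# raise the real requirement. Figures are illustrative assumptions.
import math

def min_gpus_for_weights(params_billion: float, gpu_mem_gb: float,
                         bytes_per_param: int = 2) -> int:
    weights_gb = params_billion * bytes_per_param  # 1e9 params * bytes -> GB
    return math.ceil(weights_gb / gpu_mem_gb)

MODEL_B = 175  # illustrative 175B-parameter model
for label, mem_gb in [("GH200 with HBM3 (assumed 96 GB/GPU)", 96),
                      ("GH200 with HBM3e (assumed 141 GB/GPU)", 141)]:
    print(f"{label}: at least {min_gpus_for_weights(MODEL_B, mem_gb)} GPUs")
```

  • More capacity per GPU means fewer GPUs per model instance, which is exactly the cost lever described above.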

  • 5-3. Dual GH200 System Specifications and Benefits

  • The dual GH200 board, expected to be available in Q2 2024, combines two Grace Hopper Superchips connected by NVLink on a single board. Its fast LPDDR5X memory costs less and draws less power than the DDR memory in a typical x86 server. The configuration can scale to 256 GPUs over NVLink, reducing the number of GPUs needed for inference by roughly 60%. The dual MGX configuration is a single server with 144 Arm Neoverse cores, eight petaflops of AI performance, and 282GB of HBM3e memory, delivering up to 3.5x more memory capacity and 3x more bandwidth than current-generation offerings.

  • 5-4. Market Implications and Competitive Edge

  • NVIDIA’s new products are expected to significantly impact the market by driving more adoption of the Grace CPU and reducing dependence on x86 systems. This development addresses a major industry challenge, closing a competitive gap with AMD and increasing demand for NVIDIA’s AI hardware. Customers can save on capital and operational expenditures by using fewer GPUs and taking advantage of the Grace CPU's efficiency and performance improvements.

6. Recent AI Developments and Industry News

  • 6-1. Latest Innovations in AI-Powered Robots

  • A collaboration between NVIDIA and the University of Pennsylvania has released DrEureka, an open-source model that uses an LLM agent to write the code for training robots in simulation and deploying them in the real world. Tesla also shared progress on its Optimus robot, showing robots being trained on various tasks via human teleoperation. The neural net driving Optimus runs end-to-end, from camera and sensor inputs to joint-control outputs, and two units are already in early testing at real factory workstations.

  • 6-2. Deepseek v2: New Mixture of Experts Model

  • Deepseek, a Chinese company, introduced Deepseek v2, a 236-billion-parameter open-source Mixture of Experts (MoE) model trained on 8.1 trillion tokens. It activates 21 billion parameters per token, scores 77.8 on the MMLU benchmark and 81.1 on HumanEval, and is offered via API at $0.14 per million input tokens and $0.28 per million output tokens. Deepseek v2 benefits from advances in multi-head latent attention and a novel sparse architecture, which reduce training costs and make inference efficient.
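
  • The efficiency comes from sparse activation: a router sends each token to only a few experts, so most parameters stay idle on any given token. Below is a generic top-k gated MoE layer for illustration only; it is a textbook sketch, not DeepSeek's actual design (which additionally relies on multi-head latent attention and its own sparse layout):

```python
# Generic top-k gated Mixture of Experts layer: many parameters in total,
# but only k experts run per token. Textbook sketch, not DeepSeek v2.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, dim: int, num_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(dim, num_experts)  # learned router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                          nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, dim). Pick each token's top-k experts and mix them.
        scores, idx = self.gate(x).topk(self.k, dim=-1)   # (tokens, k)
        weights = F.softmax(scores, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e          # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

layer = TopKMoE(dim=64)
print(layer(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```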

  • 6-3. Speculations on OpenAI’s Smaller Model Tests

  • There has been speculation that OpenAI is testing new models on chat.lmsys.org under names like 'gpt2-chatbot', a theory encouraged by cryptic tweets from Sam Altman. The original GPT-2 had 1.5 billion parameters, and the name may hint at models much smaller than GPT-3 or GPT-4. Such smaller models are typically used to test new ideas and to predict performance at larger scales via scaling laws.
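
  • A hedged illustration of that practice: fit a power law to losses measured at small scales and extrapolate upward. The parameter counts and losses below are synthetic, chosen only to show the mechanics:

```python
# Illustrative only: fit L(N) = a * N**(-b) in log-log space to synthetic
# small-model losses, then extrapolate to a larger parameter count.
import numpy as np

n_params = np.array([1e8, 3e8, 1e9, 3e9])    # small-model sizes (synthetic)
losses = np.array([3.10, 2.85, 2.60, 2.42])  # measured eval losses (synthetic)

# log L = log a - b * log N is a straight line in log-log coordinates.
slope, intercept = np.polyfit(np.log(n_params), np.log(losses), 1)
a, b = np.exp(intercept), -slope

predicted = a * 1e11 ** (-b)  # extrapolate to an (assumed) 100B-param model
print(f"L(N) = {a:.2f} * N^(-{b:.4f}); predicted loss at 1e11: {predicted:.2f}")
```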

  • 6-4. Significant AI Industry Updates Including GitHub Copilot and Amazon Q

  • Key industry updates include GitHub's launch of Copilot Workspace, an integrated developer environment for coding through natural language commands. Amazon introduced Amazon Q, an AI-powered assistant for AWS users that learns from company data and workflows to answer business-related questions. Another notable development is the appearance of 'gpt2-chatbot' on the LMSYS leaderboard, speculated to be an unofficial test by OpenAI to benchmark a new model.

7. AMD Ryzen Threadripper 7980X Review

  • 7-1. Performance Evaluation in High-Load Tasks

  • The AMD Ryzen Threadripper 7980X processor excels in high-load tasks, making it a preferred choice for professionals working with high-resolution video and 3D data. It handles the large datasets typical of AI, data science, and big-data environments with remarkable efficiency. In a range of workload tests, it significantly outperforms standard desktop CPUs, highlighting its superior multi-core performance.

  • 7-2. Technical Specifications and System Setup

  • The test system for evaluating the Ryzen Threadripper 7980X comprised an ASUS Pro WS TRX50-SAGE WIFI motherboard, four G.SKILL Zeta R5 Neo DDR5-6400 ECC 32GB modules, and a Radeon RX 7900 XTX graphics card, providing a robust platform to fully exercise the processor. The Threadripper 7980X has 64 cores and 128 threads, built on AMD's Zen 4 architecture, and sits on the new TRX50 platform with PCIe 5.0 and quad-channel memory support.

  • 7-3. Benchmarks: Cinebench, Corona, Handbrake, UL Procyon

  • In Cinebench 2024, the Ryzen Threadripper 7980X scored an impressive 5,380 points in the multi-core test, ahead of any other CPU on the market. In the Corona 10 rendering benchmark, it achieved 32,296,312 points, demonstrating how effectively its many cores are utilized. Upscaling a 2-minute FHD video to 4K in Handbrake took roughly 2 minutes and 40 seconds, far faster than mainstream desktop processors. In the UL Procyon AI inference benchmark at FP16 settings, it delivered excellent results, even surpassing mobile versions of the GeForce RTX 4090.

  • 7-4. Market Position and Value Proposition

  • The Ryzen Threadripper 7980X remains a top choice for experts who need extreme computational power. Despite its high price of around 7,000,000 KRW, far above the Ryzen 9 7950X, its value lies in the substantial time it saves on high-demand tasks. For professionals in rendering, video production, and AI computation, the 7980X is effectively unrivaled, maintaining its status as a processor built for specialists.

8. Glossary

  • 8-1. AMD Instinct MI300X [AI Accelerator]

  • The AMD Instinct MI300X accelerator is designed to enhance AI workloads by offering high bandwidth memory and specialized computing capabilities, playing a crucial role in AI model training and inference.

  • 8-2. NVIDIA H100 GPU [GPU]

  • The NVIDIA H100, based on the Hopper architecture, provides unparalleled performance for AI training and inference, significantly advancing generative AI and large language models (LLMs).

  • 8-3. Azure ND MI300X v5 [Virtual Machine]

  • Microsoft Azure’s ND MI300X v5 VM leverages the AMD Instinct MI300X accelerator to offer high-performance computing solutions for demanding AI workloads, enhancing efficiency and scalability.

  • 8-4. Grace Hopper Superchip [CPU/GPU Combination]

  • The Grace Hopper Superchip combines NVIDIA’s Grace CPU and Hopper GPU, offering high bandwidth memory for efficient AI inference processing, aimed at reducing costs and increasing performance.

  • 8-5. Deepseek v2 [AI Model]

  • Deepseek v2 is an advanced Mixture of Experts model with 236 billion parameters, optimized for tasks requiring significant computational efficiency and providing impressive benchmark performances.

  • 8-6. AMD Ryzen Threadripper 7980X [CPU]

  • The AMD Ryzen Threadripper 7980X, with up to 64 cores, is designed for high-load professional environments, offering unparalleled performance in data-intensive tasks and AI inferencing.

9. Conclusion

  • The report highlights pivotal developments in AI and high-performance computing, showcasing the competitive landscape and technological innovations that are driving the industry forward. Entities like AMD and NVIDIA continue to push the boundaries in AI hardware and software, impacting various sectors and leading the transformation towards next-generation computing solutions.

10. Source Documents