A large language model, the technology at the heart of the recent AI enthusiasm, has been added to MLPerf, a collection of neural-network training benchmarks sometimes called the machine learning Olympics. The initial tests measured how quickly Nvidia’s H100 GPUs and Intel’s Habana Gaudi2 chips could train a modified version of GPT-3, the language model powering ChatGPT.
A collaborative effort between Nvidia and CoreWeave used a 3,584-GPU computer to complete the task in just under 11 minutes, while the smallest system, equipped with 256 Gaudi2 chips, took a little over 7 hours. On a per-chip basis, H100 systems were 3.6 times faster than Gaudi2. However, Gaudi2’s showing was held back because its software had not yet enabled the chips’ FP8 mixed-precision capability.
Nvidia and CoreWeave’s result, estimated to correspond to about two days for a full-scale training run, demonstrates how much mixed-precision hardware accelerates training for transformer networks like GPT-3. Habana engineers are expected to enable Gaudi2’s FP8 capabilities for GPT-3 training in September, which should make it competitive with the H100.
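To give a rough sense of what mixed-precision training looks like in practice, here is a minimal PyTorch sketch using automatic mixed precision (AMP) with FP16. The vendor submissions use their own FP8 software stacks; the model, shapes, and hyperparameters below are purely illustrative and not drawn from any MLPerf submission, and the snippet assumes a CUDA-capable GPU.

```python
# Illustrative sketch only: generic mixed-precision training with PyTorch AMP.
# Matmuls run in FP16 on tensor cores while master weights stay in FP32;
# loss scaling guards against FP16 gradient underflow.
import torch
import torch.nn as nn

model = nn.TransformerEncoderLayer(d_model=512, nhead=8).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()  # scales the loss before backward

x = torch.randn(16, 128, 512, device="cuda")       # hypothetical input batch
target = torch.randn(16, 128, 512, device="cuda")  # hypothetical target

for step in range(10):
    optimizer.zero_grad(set_to_none=True)
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        out = model(x)                  # forward pass in reduced precision
        loss = nn.functional.mse_loss(out, target)
    scaler.scale(loss).backward()       # backward on the scaled loss
    scaler.step(optimizer)              # unscales grads, skips step on inf/nan
    scaler.update()
```

The pattern is the same one FP8 hardware exploits, just one precision step further down: lower-precision math where it is safe, higher precision where numerical stability demands it.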
Although the time and cost of a complete training run make GPT-3 difficult to use as an industry benchmark, MLPerf addressed this by carving out a representative portion of the training task and measuring how quickly systems converge on it. The competition encompassed multiple benchmark tests beyond the LLM addition, including image recognition, medical-imaging segmentation, object detection, speech recognition, natural language processing, and recommendation, with systems evaluated on the time needed to train to a target accuracy.
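To make the “time to convergence” idea concrete, here is a hedged sketch of that style of measurement: train until a quality target is reached and report the elapsed wall-clock time. The function names, the target, and the epoch cap are all hypothetical; the actual MLPerf rules pin down the exact dataset slice, quality metric, and target values.

```python
# Minimal sketch of an MLPerf-style "time to train" measurement.
# `train_one_epoch` and `evaluate` are hypothetical callables supplied
# by the benchmark harness; names and thresholds are illustrative.
import time

def time_to_target(train_one_epoch, evaluate, target_quality, max_epochs=100):
    """Train until evaluate() meets target_quality; return (seconds, epochs)."""
    start = time.perf_counter()
    for epoch in range(max_epochs):
        train_one_epoch()
        quality = evaluate()            # e.g., validation accuracy or perplexity-based score
        if quality >= target_quality:   # convergence check against the benchmark target
            return time.perf_counter() - start, epoch + 1
    raise RuntimeError("did not reach the quality target within max_epochs")
```

Scoring on time-to-target rather than raw throughput is what lets a shortened, representative slice of GPT-3 training stand in for the full multi-week run.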
Nvidia’s GPUs dominated the competition, with the H100 setting records in all eight categories. MLPerf also introduced an upgraded recommender-system benchmark, DLRM DCN-V2, to better align with industry practice; it involves more memory operations and computational work and uses a larger training dataset.
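For readers curious what the “DCN-V2” in the benchmark’s name refers to: it is the cross network from the DCNv2 architecture, which models explicit feature interactions layer by layer as x_{l+1} = x0 * (W x_l + b) + x_l. Below is a minimal sketch of that cross layer; the dimensions and layer count are illustrative, not the MLPerf reference configuration.

```python
# Hedged sketch of a DCNv2 cross layer (the "DCN-V2" in DLRM DCN-V2).
# Each layer crosses the current features with the original input x0,
# plus a residual connection so layers stack cleanly.
import torch
import torch.nn as nn

class CrossLayerV2(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.linear = nn.Linear(dim, dim)  # full-rank W_l and bias b_l

    def forward(self, x0: torch.Tensor, xl: torch.Tensor) -> torch.Tensor:
        # x_{l+1} = x0 * (W xl + b) + xl  -- element-wise feature crossing
        return x0 * self.linear(xl) + xl

# Usage: stack a few cross layers over concatenated dense and embedded
# sparse features (sizes below are hypothetical).
dim = 64
layers = nn.ModuleList(CrossLayerV2(dim) for _ in range(3))
x0 = torch.randn(32, dim)  # hypothetical batch of feature vectors
x = x0
for layer in layers:
    x = layer(x0, x)
print(x.shape)  # torch.Size([32, 64])
```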
The whytry.ai article you just read is a brief synopsis; the original article can be found here: Read the Full Article…