What is an HGX server?
HGX is a design framework created by NVIDIA. An HGX server is a system built to the NVIDIA HGX reference architecture: a versatile, scalable platform for data centers running demanding AI and HPC workloads. It integrates NVIDIA GPUs with the system's other hardware components.
What are the seven key features of an HGX server?
- NVIDIA GPU support
The primary purpose of an HGX server is to host NVIDIA GPUs, which are widely used for AI and HPC workloads. The server is designed to accommodate multiple GPUs for parallel processing tasks.
- NVLink interconnect
HGX servers incorporate NVIDIA's NVLink technology, a high-speed interconnect that enables fast GPU-to-GPU communication. NVLink raises data transfer rates and improves performance for parallel computing workloads.
- Modular design
HGX servers typically feature a modular design that allows flexibility in configuring hardware components. This modularity lets data centers customize the server to their specific requirements, including the type and number of GPUs.
- Scalability
Designed to scale, HGX servers let data centers build powerful computing clusters by connecting multiple HGX-based servers. This scalability is crucial for meeting the growing demands of AI and HPC applications.
- Industry standardization
HGX is a reference architecture aimed at industry standardization: NVIDIA provides the design to hardware partners, such as 2CRSi, who implement it in their own systems.
- Software compatibility
HGX servers are compatible with the software frameworks and tools commonly used in AI and HPC, so they integrate into existing workflows and support a wide range of applications.
- Versatility in applications
The modular design and compatibility with different GPUs make HGX servers versatile, supporting workloads ranging from deep learning and AI training to scientific simulations and data analytics.
HGX servers based on the SXM5 platform
The defining characteristic of the SXM socket is its design: the GPU and its memory are mounted directly onto the server's baseboard rather than plugged into an expansion slot.
Our HGX servers, the Godì 1.8 range, are based on the latest SXM platform, the SXM5.
The H100 SXM5, launched in March 2023, is built on NVIDIA's Hopper architecture and manufactured on TSMC's 4 nm process. It incorporates 80 billion transistors, 16,896 CUDA cores, and 80 GB of HBM3 memory, along with a 50 MB L2 cache. It offers a theoretical performance of 66.91 TFLOPS within a 700 W power budget.
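The quoted 66.91 TFLOPS figure can be reproduced from the core count with the usual peak-throughput formula (CUDA cores × 2 FLOPs per cycle for a fused multiply-add × clock rate). A minimal sketch, assuming a 1980 MHz boost clock, a figure not stated above:

```python
# Theoretical peak FP32 throughput = CUDA cores x 2 FLOPs/cycle (FMA) x clock.
# The 1980 MHz boost clock is an assumption; the core count and the resulting
# ~66.91 TFLOPS match the specifications quoted above.
cuda_cores = 16_896
flops_per_cycle = 2          # one fused multiply-add = 2 FLOPs per core per cycle
boost_clock_hz = 1.98e9      # assumed 1980 MHz boost clock

peak_tflops = cuda_cores * flops_per_cycle * boost_clock_hz / 1e12
print(f"{peak_tflops:.2f} TFLOPS")  # prints "66.91 TFLOPS"
```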
PCIe vs. SXM5
NVIDIA H100/H200 GPUs come in two form factors: PCIe and SXM5.
PCIe GPUs slot into standard PCIe slots on a motherboard, while SXM5 GPUs use a distinct form factor that is not compatible with standard PCIe slots.
Moreover, replacing or upgrading a PCIe GPU is straightforward, as it only involves removing the card. In contrast, SXM5 GPUs are mounted to the baseboard and thermally bonded to their heatsinks with paste, making replacement trickier.
However, owing to NVLink and NVSwitch, SXM5 GPUs deliver far higher GPU-to-GPU bandwidth than PCIe GPUs, making them well suited for data-intensive workloads spanning multiple GPUs and nodes.
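On a Linux host with the NVIDIA driver installed, you can verify which interconnect actually links each GPU pair. A sketch of the relevant commands (these require NVIDIA hardware and drivers, so the output shapes below are illustrative):

```shell
# Print the GPU interconnect matrix. Entries such as NV# indicate NVLink
# connections between a GPU pair; PIX/PHB/SYS indicate PCIe paths.
nvidia-smi topo -m

# Query the per-link NVLink status (link state and speed) for GPU 0.
nvidia-smi nvlink --status -i 0
```

On an SXM5-based HGX system the matrix shows NVLink entries between all GPU pairs, whereas a PCIe-only system shows only PCIe path types.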
2CRSi HGX H100/H200 Servers
H100/H200 Applications
AI/ML
HPC