NVIDIA Mellanox MQM8790-HS2F InfiniBand Switch Technical Reference
May 27, 2026
This technical white paper is intended for network architects, pre‑sales engineers, and operations leads. Centered on the NVIDIA Mellanox MQM8790-HS2F InfiniBand switch, it provides a detailed reference for designing, deploying, and operating high‑performance fabrics that serve large‑scale AI training and HPC simulations — with a focus on RDMA acceleration and deterministic low‑latency communication.
Modern AI and HPC workloads are driving GPU clusters toward tens of thousands of nodes. Traditional Ethernet fabrics, even with RoCE, exhibit performance inconsistencies under incast patterns and suffer from higher CPU overhead. Key requirements for next‑generation cluster interconnects include:
- Sub‑microsecond point‑to‑point latency for synchronized collective operations
- Lossless transport with congestion management, so that drops and retransmission timeouts do not create tail‑latency stragglers
- Full offload of communication processing from CPU/GPU cores
- Scalable fat‑tree or dragonfly+ topologies with non‑blocking bandwidth
- Cost‑effective migration path from existing 100G/200G optics
Addressing these demands, the MQM8790-HS2F InfiniBand switch delivers 200Gb/s HDR per port, 40 QSFP56 ports per unit, and native support for RDMA and in‑network computing. This solution is purpose‑built for environments where interconnect efficiency directly translates to job completion time and TCO.
The recommended physical topology for AI/HPC clusters using the NVIDIA Mellanox MQM8790-HS2F is a two‑tier or three‑tier fat‑tree (also known as leaf‑spine). Each leaf switch connects to GPU servers via HDR or HDR100 links, while spine switches provide full‑mesh connectivity between leaves. A typical 800‑GPU configuration employs:
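- 20 leaf switches: each MQM8790-HS2F (40 QSFP56 ports at 200Gb/s HDR) uses 20 ports as downlinks, serving up to 20 servers in a dual‑rail layout where each server attaches to two different leaves
- 10 spine switches: the 20 leaves contribute 20 uplinks each (400 uplinks at 200Gb/s), which fill the 40 ports of ten spines; deploying only 4 spines would leave 8 uplinks per leaf and a 2.5:1 oversubscription
- Non‑blocking ratio of 1:1 at each tier (equal downlink and uplink bandwidth on every leaf)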
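The port arithmetic behind a two-tier layout like this can be checked with a short calculation. The sketch below is illustrative only: the function name `size_two_tier` and its parameters are assumptions made for this example, and it assumes identical 40-port switches at both tiers with each leaf splitting its ports between downlinks and uplinks.

```python
import math


def size_two_tier(endpoint_ports: int, radix: int = 40, oversub: float = 1.0) -> dict:
    """Estimate leaf and spine counts for a two-tier fat-tree.

    endpoint_ports: number of host-facing links (HCA ports) to attach.
    radix: ports per switch (40 for the MQM8790-HS2F).
    oversub: downlinks per uplink on each leaf (1.0 means non-blocking).
    """
    # Split each leaf's ports so that downlinks / uplinks <= oversub.
    uplinks = math.ceil(radix / (1 + oversub))
    downlinks = radix - uplinks
    leaves = math.ceil(endpoint_ports / downlinks)
    # Every leaf uplink terminates on one spine port.
    total_uplinks = leaves * uplinks
    spines = math.ceil(total_uplinks / radix)
    return {
        "downlinks_per_leaf": downlinks,
        "uplinks_per_leaf": uplinks,
        "leaves": leaves,
        "spines": spines,
    }


if __name__ == "__main__":
    # 20 leaves x 20 downlinks = 400 host-facing ports.
    print(size_two_tier(400))
```

Under these assumptions, 400 host-facing ports work out to 20 leaves and 400 uplinks, which occupy ten 40-port spines. Using fewer spines than this implies an oversubscribed fabric.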
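For larger clusters (2,000+ GPUs), a three‑tier architecture adds a super‑spine layer. All fabric management is handled by the Subnet Manager (SM). Because the MQM8790-HS2F is an externally managed switch, the SM runs outside the switch, typically as OpenSM on a dedicated host or as part of NVIDIA UFM; an SM embedded in the switch is an option only on managed models such as the MQM8700.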
Within this architecture, the MQM8790-HS2F serves as both leaf and spine element. Its core differentiators include:
- 200Gb/s HDR per port: Full bidirectional throughput with 16Tb/s aggregate non‑blocking switching capacity
- Adaptive routing (AR): Dynamically distributes traffic across multiple paths to avoid hotspots, critical for fat‑tree efficiency
- SHARPv2 (Scalable Hierarchical Aggregation and Reduction Protocol): Offloads collective operations (All‑Reduce, Reduce‑Scatter) directly into the switch network, reducing GPU idle time
- Congestion control: Credit‑based link‑level flow control keeps the fabric lossless, while FECN/BECN congestion notification throttles the sources that cause congestion
- High port density: 40 QSFP56 ports per 1U chassis, simplifying rack layout and reducing inter‑rack cabling complexity
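The 16Tb/s aggregate figure in the list above follows directly from the port count and the per-port rate. A minimal check, assuming the figure counts both directions of every port (the constant and function names are chosen for this example):

```python
PORTS = 40            # QSFP56 ports per MQM8790-HS2F
PORT_RATE_GBPS = 200  # HDR line rate per direction


def aggregate_capacity_tbps(ports: int = PORTS, rate_gbps: int = PORT_RATE_GBPS) -> float:
    """Aggregate switching capacity in Tb/s, counting both directions."""
    return ports * rate_gbps * 2 / 1000


if __name__ == "__main__":
    print(aggregate_capacity_tbps())  # 16.0
```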
Engineers can refer to the MQM8790-HS2F datasheet for detailed power, thermal, and latency specifications. The switch is also compatible with a wide range of HDR, HDR100, and EDR cables and optics, enabling gradual upgrades.
For a balanced 1,000‑GPU AI cluster, the following deployment sequence is recommended:
- Step 1 – Leaf layer: Install 25x MQM8790-HS2F units as leaf switches. Connect each GPU server (8 GPUs per server) via two HDR ports to two different leaves for redundancy and bandwidth.
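- Step 2 – Spine layer: Deploy 8x NVIDIA Mellanox MQM8790-HS2F as spine switches. Connect every leaf to every spine using 200Gb/s links. A 40‑port leaf can use only the ports left over after its server downlinks as uplinks, so provision at least as many uplinks as downlinks on each leaf to keep the oversubscription ratio at or below 1:1.
- Step 3 – Subnet Manager placement: Run redundant SM instances (for example OpenSM or UFM) on two lightweight servers. The MQM8790-HS2F is externally managed and has no embedded SM; an embedded SM for smaller fabrics is available only on managed models such as the MQM8700.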
- Step 4 – Partitioning: Use InfiniBand partitions to isolate production, development, and storage traffic flows over the same physical fabric.
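The partitioning in Step 4 comes down to assigning a 15-bit partition key (P_Key) per traffic class, with the top bit of the 16-bit value marking full membership. The sketch below derives both values and renders lines in the style of an OpenSM `partitions.conf` file. The partition names, key values, and port GUIDs are made-up placeholders, and the exact line syntax should be checked against the OpenSM documentation for the deployed version.

```python
FULL_MEMBER_BIT = 0x8000  # top bit of the 16-bit P_Key marks full membership


def pkey_values(base: int):
    """Return (limited, full) 16-bit P_Key values for a 15-bit base key."""
    if not 0 < base <= 0x7FFF:
        raise ValueError("base P_Key must fit in 15 bits and be non-zero")
    return base, base | FULL_MEMBER_BIT


def render_partition(name: str, base: int, members, ipoib: bool = False) -> str:
    """Render one partition line; members is a list of (guid, membership)."""
    flags = ", ipoib" if ipoib else ""
    member_text = ", ".join(f"{guid}={role}" for guid, role in members)
    return f"{name}=0x{base:04x}{flags} : {member_text} ;"


if __name__ == "__main__":
    # Hypothetical GUIDs and keys, for illustration only.
    plan = {
        "production": (0x0010, [("0x0002c90300000001", "full")]),
        "development": (0x0020, [("0x0002c90300000002", "full")]),
        "storage": (0x0030, [("0x0002c90300000003", "full"),
                             ("0x0002c90300000001", "limited")]),
    }
    for name, (base, members) in plan.items():
        print(render_partition(name, base, members))
```

Limited members can communicate only with full members of the same partition, which is one way to let compute nodes reach storage targets without seeing each other on the storage partition.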
For expansion to 2,500+ GPUs, add a super‑spine layer using additional MQM8790-HS2F units and increase leaf‑to‑spine uplink density. Because every tier uses the same switch model, the fabric can grow tier by tier without replacing the existing leaf and spine hardware.
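Whether a given cluster needs the extra tier can be estimated from the switch radix. In a non-blocking fat-tree built from identical k-port switches, the standard capacity bound is 2·(k/2)^n host ports for n tiers. The sketch below applies that bound; the function names are assumptions for this example, and it counts host-facing HCA ports rather than GPUs, so the result depends on how many ports each server exposes.

```python
def max_endpoints(radix: int, tiers: int) -> int:
    """Maximum host ports in a non-blocking fat-tree of identical switches."""
    if tiers < 1:
        raise ValueError("tiers must be at least 1")
    # Every tier below the top gives half its ports to downlinks;
    # the top tier uses all of its ports as downlinks.
    return 2 * (radix // 2) ** tiers


def tiers_needed(endpoint_ports: int, radix: int = 40, max_tiers: int = 4) -> int:
    """Smallest number of tiers whose capacity covers endpoint_ports."""
    for tiers in range(1, max_tiers + 1):
        if max_endpoints(radix, tiers) >= endpoint_ports:
            return tiers
    raise ValueError("fabric too large for the given radix and tier limit")


if __name__ == "__main__":
    for ports in (250, 800, 2500):
        print(ports, "host ports ->", tiers_needed(ports), "tiers")
```

With 40-port switches the bound is 800 host ports for two tiers and 16,000 for three, so a super-spine layer becomes necessary once the number of host-facing ports exceeds 800.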
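Daily operation of a fabric built on MQM8790-HS2F switches is managed through an external Subnet Manager and NVIDIA UFM (Unified Fabric Manager), together with the standard InfiniBand diagnostic tools; MLNX‑OS applies only to managed switch models. Key practices include: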
- Monitoring: Use `ibdiagnet` for topology validation, `perfquery` for port counters, and Grafana dashboards with Prometheus exporters for real‑time telemetry (link utilization, symbol errors, congestion markers).
- Firmware & software: Maintain consistent firmware across all switches. Refer to the latest firmware release notes for version compatibility.
- Troubleshooting: Common issues—link flapping, mismatched MTU, or partition misconfiguration—are quickly isolated using `ibstatus`, `smpquery`, and switch syslog. Adaptive routing logs can pinpoint congestion sources.
- Optimization: Tune adaptive routing thresholds and congestion control parameters based on workload profiling. For large‑scale AI training, enable SHARP in the collective communication libraries that support it (e.g., an MPI library with SHARP support, or NCCL with the SHARP plugin).
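Port counters such as those reported by `perfquery` are easy to feed into a monitoring pipeline once they are parsed into numbers. The sketch below parses text in the `Name:.....value` layout and flags non-zero error counters. The sample text is hand-written for illustration rather than captured from a real switch, the set of watched counters is an assumption, and the output layout may differ between infiniband-diags versions.

```python
import re

# Hand-written sample in the "Name:.....value" layout; not real switch output.
SAMPLE = """\
# Port counters: Lid 12 port 1 (CapMask: 0x5A00)
PortSelect:......................1
CounterSelect:...................0x0000
SymbolErrorCounter:..............3
LinkErrorRecoveryCounter:........0
LinkDownedCounter:...............1
PortRcvErrors:...................0
PortXmitDiscards:................0
PortXmitWait:....................1024
"""

LINE = re.compile(r"^(\w+):\.*(\S+)$")

# Counters treated as error indicators in this example (an assumption).
WATCHED = (
    "SymbolErrorCounter",
    "LinkErrorRecoveryCounter",
    "LinkDownedCounter",
    "PortRcvErrors",
    "PortXmitDiscards",
)


def parse_counters(text: str) -> dict:
    """Parse 'Name:....value' lines into a dict of integers."""
    counters = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if not match:
            continue  # skips the '#' header and blank lines
        try:
            counters[match.group(1)] = int(match.group(2), 0)  # handles 0x.. too
        except ValueError:
            continue  # ignore non-numeric values
    return counters


def nonzero_errors(counters: dict, watched=WATCHED) -> dict:
    """Return the watched counters whose value is greater than zero."""
    return {name: counters[name] for name in watched if counters.get(name, 0) > 0}


if __name__ == "__main__":
    print(nonzero_errors(parse_counters(SAMPLE)))
```

In practice the text would come from running the tool against each port and the flagged counters would be exported as metrics, with alert thresholds chosen per site.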
When planning budget or expansion, request price quotes for the MQM8790-HS2F from authorized partners and verify availability. The platform’s compatibility with existing QSFP56 cabling reduces procurement risk.
The MQM8790-HS2F is not just a switch — it is a foundational building block for deterministic, low‑latency HPC and AI fabrics. Its combination of 200Gb/s HDR, 40‑port density, adaptive routing, and SHARP in‑network compute directly addresses the key pain points of modern cluster interconnects. Compared to alternative 200G Ethernet solutions, the InfiniBand ecosystem offers:
- Lower CPU overhead (true RDMA with hardware transport offload)
- Deterministic microsecond‑scale latency under real load
- Simplified fabric management with centralized Subnet Manager
- Proven scalability to tens of thousands of endpoints
For organizations building or expanding GPU clusters, adopting the NVIDIA Mellanox MQM8790-HS2F can mean higher effective FLOPS per dollar, shorter job completion times, and an interconnect design that prepares for a later move to next‑generation NDR (400Gb/s) InfiniBand. Detailed reference designs, including complete MQM8790-HS2F specifications and cabling matrices, are available through the NVIDIA partner portal.

