InfiniBand Network Configuration

An InfiniBand network combines a topology with a set of devices. Fat Tree and Dragonfly are commonly used topologies, and the devices are Host Channel Adapters (HCAs), switches, and cables. The Subnet Manager (SM) is responsible for managing these devices efficiently.

The Subnet Manager (SM) is a key component of an InfiniBand network and is responsible for managing and controlling the topology of the entire network. When the subnet manager is functioning properly, the nodes in an InfiniBand network can communicate with each other efficiently.

Role of the Subnet Manager

Topology Discovery and Construction

  • The subnet manager discovers all the nodes and switches connected to the InfiniBand network and builds its physical topology.
  • It understands the relationships of each device (node, switch, etc.) and how they are connected.
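The discovery step can be pictured as a breadth-first sweep that starts at the subnet manager's own node and follows every cable it finds. The sketch below is only an illustration: the `LINKS` cabling map, the device names, and the `neighbors` and `discover` helpers are all hypothetical, and a real SM learns its neighbors by sending management packets to each device rather than reading a hard-coded table.

```python
from collections import deque

# Hypothetical cabling map: (device, port) -> (neighbor device, neighbor port).
# A real SM learns this by querying each device; here it is hard-coded.
LINKS = {
    ("hca1", 1): ("sw1", 1),
    ("hca2", 1): ("sw1", 2),
    ("sw1", 3): ("sw2", 3),
    ("hca3", 1): ("sw2", 1),
}

def neighbors(device):
    """Return (local_port, peer_device, peer_port) for every cable on device."""
    found = []
    for (dev, port), (peer, peer_port) in LINKS.items():
        if dev == device:
            found.append((port, peer, peer_port))
        elif peer == device:
            found.append((peer_port, dev, port))
    return sorted(found)

def discover(start):
    """Breadth-first sweep from the start node; returns device -> its links."""
    topology = {}
    queue = deque([start])
    while queue:
        device = queue.popleft()
        if device in topology:
            continue  # already visited through another path
        topology[device] = neighbors(device)
        for _port, peer, _peer_port in topology[device]:
            if peer not in topology:
                queue.append(peer)
    return topology

topology = discover("hca1")
for device, links in topology.items():
    print(device, links)
```

Starting from `hca1`, the sweep reaches both switches and all three HCAs, and the resulting map records which port of each device connects to which peer.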

Local Identifier (LID) Assignment

  • The subnet manager assigns a LID (Local Identifier) to each end port and switch on the InfiniBand network. The LID is a 16-bit address, unique within the subnet, that is used as the source and destination address of packets.
  • The LID allows nodes on the network to accurately transfer data to each other.
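As a rough illustration of LID assignment, the sketch below hands out unique unicast LIDs and keeps any LID a port already holds. The port names and the `assign_lids` helper are hypothetical, and the lowest-free-number policy is an assumption made for the example; the unicast range 0x0001 to 0xBFFF is the one defined for InfiniBand.

```python
# Unicast LIDs occupy 0x0001-0xBFFF; 0x0000 is reserved, and values from
# 0xC000 upward are used for multicast and the permissive LID.
UNICAST_MIN = 0x0001
UNICAST_MAX = 0xBFFF

def assign_lids(ports, existing=None):
    """Give every port a unique unicast LID, keeping any LID it already has."""
    lids = dict(existing or {})
    used = set(lids.values())
    next_lid = UNICAST_MIN
    for port in ports:
        if port in lids:
            continue  # keep the existing assignment stable
        while next_lid in used:
            next_lid += 1
        if next_lid > UNICAST_MAX:
            raise RuntimeError("unicast LID space exhausted")
        lids[port] = next_lid
        used.add(next_lid)
    return lids

# Hypothetical port names; a real SM identifies ports by their GUIDs.
ports = ["sw1", "sw2", "hca1/1", "hca2/1", "hca3/1"]
lids = assign_lids(ports, existing={"hca2/1": 2})
print(lids)
```

Preserving existing assignments matters in practice because changing the LID of a running port would disrupt the connections that already use it.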

Routing Configuration

  • The subnet manager computes a route between every pair of nodes and programs the resulting forwarding tables into the switches.
  • Well-chosen routes use the network bandwidth effectively and minimize bottlenecks.
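One simple way to picture route computation is shortest-path (minimum hop) routing: for each destination, every switch forwards on a port that leads one hop closer. The sketch below assumes a tiny hypothetical fabric (`SWITCH_PORTS`) and hypothetical LIDs, and always picks the lowest-numbered qualifying port. Production subnet managers also balance load across equal-cost ports and offer other routing algorithms, which this example leaves out.

```python
from collections import deque

# Hypothetical fabric: switch -> {port number: neighbor}. Neighbors that are
# not keys of this dict are end nodes.
SWITCH_PORTS = {
    "sw1": {1: "hca1", 2: "hca2", 3: "sw2"},
    "sw2": {1: "hca3", 3: "sw1"},
}
# Hypothetical LIDs, as if the SM had already assigned them.
LIDS = {"hca1": 4, "hca2": 5, "hca3": 6}

def hops_to(dest):
    """BFS outward from the destination: switch -> hop count to dest."""
    hops = {}
    queue = deque()
    for sw, ports in SWITCH_PORTS.items():
        if dest in ports.values():
            hops[sw] = 1  # destination is attached directly to this switch
            queue.append(sw)
    while queue:
        sw = queue.popleft()
        for peer in SWITCH_PORTS[sw].values():
            if peer in SWITCH_PORTS and peer not in hops:
                hops[peer] = hops[sw] + 1
                queue.append(peer)
    return hops

def build_forwarding_tables():
    """For each switch, map destination LID -> output port on a shortest path."""
    tables = {sw: {} for sw in SWITCH_PORTS}
    for node, lid in LIDS.items():
        hops = hops_to(node)
        for sw, ports in SWITCH_PORTS.items():
            if sw not in hops:
                continue  # destination unreachable from this switch
            for port, peer in sorted(ports.items()):
                direct = peer == node
                closer = peer in hops and hops[peer] == hops[sw] - 1
                if direct or closer:
                    tables[sw][lid] = port
                    break
    return tables

tables = build_forwarding_tables()
print(tables)
```

In this fabric, `sw2` sends traffic for `hca1` and `hca2` out of port 3 toward `sw1`, while `sw1` delivers it on the directly attached ports.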

Network Monitoring

  • The subnet manager monitors link status and overall network health.
  • If a link or device fails, it rediscovers the topology and recomputes routes around the failure to ensure service continuity.

Subnet Manager Deployment and Operating Environment

  • Built-in Subnet Manager: Some managed InfiniBand switches have a built-in (embedded) subnet manager. This provides network management suitable for simple, small configurations.
  • Standalone Subnet Manager: For larger environments, subnet manager software is installed on a dedicated server. This provides more granular control over complex network configurations.

Major Subnet Manager Software

OpenSM (Open Source)

OpenSM is an open source subnet manager that is widely used in InfiniBand networks.

It is freely available and runs on Linux. It is useful for managing small InfiniBand networks.

Unified Fabric Manager (UFM)

UFM is a commercial subnet manager provided by NVIDIA (formerly Mellanox).

UFM provides network monitoring, optimization, and performance management capabilities and is used in large HPC clusters and data centers.

Partition Key (PKey) Virtualization Support

The Subnet Manager also manages PKeys (Partition Keys), which divide the network into virtual partitions.

With PKeys, multiple users and applications can share the same InfiniBand network while each still gets a secure, isolated environment.
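The isolation rule can be sketched in a few lines. A PKey is a 16-bit value: the low 15 bits identify the partition and the top bit marks full (1) or limited (0) membership. Two ports can exchange packets only if they are in the same partition and at least one of them is a full member. The `can_communicate` helper and the sample key values below are hypothetical names chosen for this example.

```python
MEMBERSHIP_BIT = 0x8000  # top bit: 1 = full member, 0 = limited member
KEY_MASK = 0x7FFF        # low 15 bits identify the partition

def can_communicate(pkey_a, pkey_b):
    """Return True if two ports holding these PKeys may exchange packets."""
    same_partition = (pkey_a & KEY_MASK) == (pkey_b & KEY_MASK)
    one_is_full = bool((pkey_a | pkey_b) & MEMBERSHIP_BIT)
    return same_partition and one_is_full

# Hypothetical partition 0x0001 with one full and one limited member.
FULL = 0x8001
LIMITED = 0x0001
OTHER_PARTITION = 0x8002

print(can_communicate(FULL, FULL))             # True
print(can_communicate(FULL, LIMITED))          # True
print(can_communicate(LIMITED, LIMITED))       # False
print(can_communicate(FULL, OTHER_PARTITION))  # False
```

Limited membership is useful for a shared service: many clients can each talk to a full-member server without being able to talk to one another.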

Redundancy and Failover

Subnet Manager redundancy is used to improve the availability of the InfiniBand network. When multiple SMs run in the network, one acts as the master and the others stand by. If the master fails, a standby SM automatically takes over and keeps the network running. This minimizes service interruptions.
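The takeover can be illustrated with a simplified election: among the SMs that are still alive, the one with the highest priority becomes master, and a tie is broken in favor of the lowest GUID. The `SubnetManager` class, the `elect_master` helper, and the sample values are hypothetical, and the sketch ignores the intermediate states and handover messages that real SMs exchange.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class SubnetManager:
    name: str
    priority: int       # 0-15, higher wins
    guid: int           # port GUID, lower wins a priority tie
    alive: bool = True

def elect_master(sms):
    """Pick the master among live SMs: highest priority, then lowest GUID."""
    candidates = [sm for sm in sms if sm.alive]
    if not candidates:
        return None
    return min(candidates, key=lambda sm: (-sm.priority, sm.guid))

# Hypothetical SM instances on three servers.
sms = [
    SubnetManager("sm-a", priority=10, guid=0x0002),
    SubnetManager("sm-b", priority=10, guid=0x0001),
    SubnetManager("sm-c", priority=5, guid=0x0003),
]
master = elect_master(sms)
print(master.name)

# Simulate a failure of the current master and elect again.
sms_after_failure = [
    replace(sm, alive=False) if sm.name == master.name else sm for sm in sms
]
new_master = elect_master(sms_after_failure)
print(new_master.name)
```

Here `sm-b` wins the first election on the GUID tie-break, and `sm-a` takes over once `sm-b` is marked as failed.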

Starting and Operating the Subnet Manager

The Subnet Manager starts when the network is initialized. At startup, it scans the entire network, collects topology information, assigns LIDs to each device, and configures routing. This ensures that each node can accurately communicate with other nodes.

The CLI (command line interface) or GUI (graphical user interface) can be used to configure the subnet manager, view topology information, and monitor the network.

Conclusion

Subnet managers range from open source tools such as OpenSM to commercial tools with advanced management features such as NVIDIA's UFM. Choose the one that fits the size and requirements of your network.

Translated with AI.
