Modern networking platform built for Distributed AI
ACE-AI delivers a unified fabric for Distributed AI across the network, from Datacenter to Edge to Multi-cloud.
ACE-AI employs IP Clos and Virtual Distributed Router (VDR) architectures for scalable GPU connectivity. It delivers high-performance, lossless connectivity through RoCEv2 support, Priority Flow Control (PFC), and Adaptive Routing, while maintaining low latency and high availability.
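To give a rough sense of the scale an IP Clos fabric provides, the sketch below sizes a two-tier leaf/spine fabric from the switch radix. The radix, oversubscription ratios, and resulting port counts are illustrative assumptions for the example only, not ACE-AI parameters.

```python
# Back-of-the-envelope sizing for a 2-tier (leaf/spine) IP Clos GPU fabric.
# The switch radix and oversubscription ratio below are illustrative
# assumptions, not ACE-AI defaults.

def clos_capacity(radix: int, oversub: float = 1.0) -> dict:
    """Return leaf/spine counts and host-facing ports for a 2-tier Clos.

    radix    -- ports per switch (e.g. 64 x 400G)
    oversub  -- downlink:uplink ratio at the leaf (1.0 = non-blocking)
    """
    uplinks_per_leaf = int(radix / (1 + oversub))   # ports toward spines
    downlinks_per_leaf = radix - uplinks_per_leaf   # ports toward GPUs/NICs
    spines = uplinks_per_leaf                       # one uplink per spine
    leaves = radix                                  # each spine port feeds one leaf
    return {
        "spines": spines,
        "leaves": leaves,
        "gpu_ports": leaves * downlinks_per_leaf,
    }

if __name__ == "__main__":
    # 64-port switches, non-blocking: 32 spines, 64 leaves, 2048 GPU-facing ports.
    print(clos_capacity(radix=64, oversub=1.0))
    # 2:1 oversubscription trades fabric bandwidth for more GPU-facing ports per leaf.
    print(clos_capacity(radix=64, oversub=2.0))
```

Adding a third (super-spine) tier multiplies this capacity again, which is how Clos fabrics scale to tens of thousands of GPU-facing ports.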
ACE-AI supports SmartNICs such as NVIDIA BlueField-3, enhancing inferencing capabilities at the Edge. This support enables security, traffic engineering, and efficient multi-cloud networking for smooth model operations.
ACE-AI provides seamless access to AI workloads across various locations. Its Egress Cost Control (ECC) reduces costs associated with large transfers of AI data, optimizing resource use across clouds.
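As a rough illustration of where egress cost control pays off, the toy calculation below compares the daily egress bill for naively re-syncing a model checkpoint between clouds against syncing only changed shards. The per-GB rates, checkpoint size, and sync policy are hypothetical placeholders, not published cloud prices and not a description of ECC internals.

```python
# Toy illustration of multi-cloud egress cost: moving AI checkpoints or
# datasets between clouds is billed per GB transferred out of the source.
# All rates and sizes below are hypothetical placeholders.

EGRESS_RATE_PER_GB = {  # hypothetical $/GB egress rates
    "cloud_a": 0.09,
    "cloud_b": 0.08,
}

def egress_cost(gb: float, source: str) -> float:
    """Cost of transferring `gb` gigabytes out of `source`."""
    return gb * EGRESS_RATE_PER_GB[source]

if __name__ == "__main__":
    checkpoint_gb = 500      # one model checkpoint
    syncs_per_day = 24       # naive policy: re-sync the full checkpoint hourly
    naive = egress_cost(checkpoint_gb * syncs_per_day, "cloud_a")

    # Transferring only changed shards (say 5% of the checkpoint) each sync
    # is one way an egress-aware policy could cut the bill.
    optimized = egress_cost(checkpoint_gb * 0.05 * syncs_per_day, "cloud_a")

    print(f"naive daily egress:     ${naive:,.2f}")      # $1,080.00
    print(f"optimized daily egress: ${optimized:,.2f}")  # $54.00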
Distributed AI offers significant computational efficiency, scalability, security, and latency benefits.
AI workloads are increasingly distributed across the network. Distributed Model Training spreads AI/ML training across multiple nodes, improving efficiency and performance for large, complex models. Federated Learning trains AI/ML models on data that stays distributed across the network and across device types such as smartphones, tablets, and wearables. Inferencing at the Edge deploys inference models at the edge of the network, closest to end users, reducing latency and improving application performance.

Networks supporting distributed AI must provide high performance and lossless connectivity, predictable latency, high availability and resiliency with zero-impact failover, and fabric-wide visibility.
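To make the federated learning pattern above concrete, the sketch below simulates a few edge devices that train locally and share only model weights, which a coordinator averages each round. It is a generic FedAvg-style illustration using simulated NumPy data, not ACE-AI-specific code.

```python
# Minimal FedAvg-style sketch: each edge device trains locally and only
# model weights, never raw data, traverse the network. Devices and their
# data are simulated here with NumPy arrays.

import numpy as np

rng = np.random.default_rng(0)

def local_update(weights, data, labels, lr=0.1, epochs=5):
    """One device's local training: linear regression via gradient descent."""
    w = weights.copy()
    for _ in range(epochs):
        grad = data.T @ (data @ w - labels) / len(labels)
        w -= lr * grad
    return w

def federated_round(global_w, devices):
    """Average locally trained weights, weighted by each device's sample count."""
    updates = [local_update(global_w, X, y) for X, y in devices]
    sizes = np.array([len(y) for _, y in devices], dtype=float)
    return np.average(updates, axis=0, weights=sizes)

# Simulate three edge devices holding different slices of data for y = 2x.
devices = []
for n in (50, 80, 30):
    X = rng.normal(size=(n, 1))
    y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=n)
    devices.append((X, y))

w = np.zeros(1)
for _ in range(10):
    w = federated_round(w, devices)
print("learned weight after 10 rounds:", w)  # approaches ~2.0
```

Only the one-dimensional weight vector crosses the network each round; the raw samples never leave their device, which is the property that makes federated learning attractive for distributed, privacy-sensitive edge data.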