Menu
Cisco sets a foundation for AI network infrastructure

Cisco sets a foundation for AI network infrastructure

Cisco adds two new high-end programmable Silicon One devices that can support massive GPU clusters for AI/ML workloads.

Credit: Cisco

Cisco is taking the wraps off new high-end programmable Silicon One processors aimed at underpinning large-scale Artificial Intelligence (AI)/Machine Learning (ML) infrastructure for enterprises and hyperscalers.

The company has added the 5nm 51.2Tbps Silicon One G200 and 25.6Tbps G202 to its now 13-member Silicon One family that can be customised for routing or switching  from a single chipset, eliminating the need for different silicon architectures for each network function. This is accomplished with a common operating system, P4 programmable forwarding code, and an SDK.

The new devices, positioned at the top of the Silicon One family, bring networking enhancements that make them ideal for demanding AI/ML deployments or other highly distributed applications, according to Rakesh Chopra, a Cisco Fellow in the vendor’s Common Hardware Group.

“We are going through this huge shift in the industry where we used to build these sorts of reasonably small high-performance compute clusters that seemed large at the time but nothing compared to the absolutely huge deployments required for AI/ML,” Chopra said. AI/ML models have grown from needing a few GPUs to needing tens of thousands linked in parallel and in series. “The number of GPUs and the scale of the network is unheard of.”

The new Silcon One enhancements include a P4-programmable parallel-packet processor capable of launching more than 435 billion lookups per second.

“We have a fully shared packet buffer where every port has full access to the packet buffer regardless of what’s going on,” Chopra said. This is in contrast with allocating buffers to individual input and output ports, which means the buffer you get depends on which port the packets go to. “That means that you’re less capable of writing through traffic bursts and more likely to drop a packets which really decreases AI/ML performance,” he said.

In addition, each silicon device can support 512 Ethernet ports letting customers build a 32K 400G GPU AI/ML cluster requiring 40% fewer switches than other silicon devices needed to support that cluster, Chopra said.

Core to the Silicon One system is its support for enhanced Ethernet features such as improved flow control, congestion awareness, and  avoidance.

The system also includes advanced load-balancing capabilities and “packet-spraying” that spreads traffic across multiple GPUs or switches to avoid congestion and improve latency. Hardware-based link-failure recovery also helps ensure the network operates at peak efficiency, the company stated.

Combining these enhanced Ethernet technologies and taking them a step further ultimately lets customers set up what Cisco calls a fully Scheduled Fabric. 

In a Scheduled Fabric, the physical components—chips, optics, switches—are tied together like one big modular chassis and communicate with each other to provide optimal scheduling behavior, Chopra said. "Ultimately what it translates to is much higher bandwidth throughput, especially for flows like AI/ML, which lets you get much lower job-completion time, which means that your GPUs run much more efficiently.”

With Silicon One devices and software, customers can deploy as many or as few of these features as they need, Chopra said.

Cisco is part of a growing AI networking market that includes Broadcom, Marvell, Arista and others that is expected to hit $10B by 2027, up from the $2B it is worth today, according to a recent blog from the 650 Group.

“AI networks have already been thriving for the past two years. In fact, we have been tracking AI/ML networking for nearly two years and see AI/ML as a massive opportunity for networking and one of the main drivers for data-center networking growth in our forecasts,” the 650 blog stated. “The key to AI/ML’s impact on networking is the tremendous amount of bandwidth AI models need to train, new workloads, and the powerful inference solutions that appear in the market. In addition, many verticals will go through multiple digitisation efforts because of AI during the next 10 years.”

The Cisco Silicon One G200 and G202 are being tested by unidentified customers now and are available on a sampled basis, according to Chopra.  


Follow Us

Join the newsletter!

Or

Sign up to gain exclusive access to email subscriptions, event invitations, competitions, giveaways, and much more.

Membership is free, and your security and privacy remain protected. View our privacy policy before signing up.

Error: Please check your email address.

Events

EDGE 2024

Register your interest now for EDGE 2024!

Featured

Slideshows

How MSPs can capitalise on integrating AI into existing services

How MSPs can capitalise on integrating AI into existing services

​Given the pace of change, scale of digitalisation and evolution of generative AI, partners must get ahead of the trends to capture the best use of innovative AI solutions to develop new service opportunities. For MSPs, integrating AI capabilities into existing service portfolios can unlock enhancements in key areas including managed hosting, cloud computing and data centre management. This exclusive Reseller News roundtable in association with rhipe, a Crayon company and VMware, focused on how partners can integrate generative AI solutions into existing service offerings and unlocking new revenue streams.

How MSPs can capitalise on integrating AI into existing services
Access4 holds inaugural A/NZ Annual Conference

Access4 holds inaugural A/NZ Annual Conference

​Access4 held its inaugural Annual Conference in Port Douglass, Queensland, for Australia and New Zealand from 9-11 October, hosting partners from across the region with presentations on Access4 product updates, its 2023 Partner of the Year awards and more.

Access4 holds inaugural A/NZ Annual Conference
Show Comments