
Rachel Berkowitz of IEEE Spectrum reports that Cornelis Networks has introduced the CN5000 networking fabric, a new architecture designed to optimize AI and supercomputing performance by coordinating up to 500,000 processors with minimal latency. Unlike Ethernet and InfiniBand, the CN5000 is designed specifically for large-scale parallel computing, and the company says it delivers congestion-free data transfer and higher message rates. Its advanced routing and memory-control technologies reduce communication delays and system failures during AI training. As demand for massive AI models grows, Cornelis aims to provide the high-speed, scalable infrastructure essential for efficient, next-generation computing. They write:
In the good old days, networks were all about connecting a small number of local computers. But times have changed. In an AI-dominated world, the trick is coordinating the activity of tens of thousands of servers to train a large language model—without any delay in communication. Now there’s an architecture optimized to do just that. Cornelis Networks says its CN5000 networking fabric maximizes AI performance, supporting deployments with up to 500,000 computers or processors—an order of magnitude higher than today—and no added latency.
The new technology brings a third major product to the networking world, alongside Ethernet and InfiniBand. It’s designed to enable AI and high-performance computing (HPC, or supercomputing) systems to achieve faster and more predictable completion times with greater efficiency. For HPC, Cornelis claims its technology outperforms InfiniBand NDR—the version introduced in 2022—passing twice as many messages per second with 35 percent lower latency. For AI applications, it delivers six-fold faster communication compared to Ethernet-based protocols. […]
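Those two figures compound. A minimal back-of-the-envelope sketch in Python, assuming a hypothetical baseline message rate and latency for InfiniBand NDR (the absolute numbers below are invented for illustration; only the 2x and 35 percent ratios come from the article), shows how a higher message rate and lower latency jointly shrink the time to push a burst of small messages through one endpoint:

```python
# Illustrative pipeline model, not vendor data: injection time for N messages
# plus the latency of the last message in flight. Baseline figures are
# hypothetical; only the 2x rate and 35% latency ratios are from the article.

def completion_time(n_messages, msgs_per_sec, latency_s):
    """Time to inject n_messages at a given rate, plus one network latency."""
    return n_messages / msgs_per_sec + latency_s

N = 1_000_000                # messages sent by one endpoint (arbitrary)
baseline_rate = 400e6        # assumed NDR message rate, messages/second
baseline_latency = 2e-6      # assumed end-to-end latency, seconds

t_baseline = completion_time(N, baseline_rate, baseline_latency)
t_claimed = completion_time(N, 2 * baseline_rate, 0.65 * baseline_latency)

print(f"baseline: {t_baseline * 1e3:.3f} ms, with claimed ratios: {t_claimed * 1e3:.3f} ms")
```

For large bursts the message rate dominates, so the doubled rate roughly halves completion time; the latency reduction matters most for small, synchronization-heavy exchanges typical of AI training.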
Physically, the CN5000 product is a network card built around a custom chip. The network cards plug into every server, “like you plug an Ethernet card into your PC at home,” explains Murphy. A top-of-rack switch is cabled to each server and to other switches, and a director-class switch comes with 48 or 576 ports to link to the rack switches. “Each server has cards plugged in, so you can build multi-thousand endpoint clusters,” says Murphy. […]
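For a sense of scale, here is a small sizing sketch under stated assumptions: a two-tier topology using the article's 48- and 576-port figures, with a 50/50 split of rack-switch ports between servers and uplinks. That split, and the assignment of 48 ports to the rack switch and 576 to the director, are illustrative assumptions, not a published Cornelis design:

```python
# Rough two-tier fabric sizing: leaf (top-of-rack) switches connect down to
# servers and up to director-class switches. Port counts are from the
# article; the 50/50 leaf-port split is an assumption for illustration.

LEAF_PORTS = 48        # assumed top-of-rack switch radix
DIRECTOR_PORTS = 576   # director-class switch radix cited in the article

down = LEAF_PORTS // 2         # leaf ports facing servers
up = LEAF_PORTS - down         # leaf ports facing director switches

# Non-blocking two-tier limit: each director port terminates one leaf uplink,
# so at most DIRECTOR_PORTS leaves can attach across a group of `up` directors.
max_leaves = DIRECTOR_PORTS
max_endpoints = max_leaves * down

print(f"{max_leaves} leaf switches x {down} servers each = {max_endpoints} endpoints")
# -> 576 x 24 = 13,824 endpoints in a single two-tier fabric
```

Under those assumptions a single two-tier fabric tops out near 13,800 servers, consistent with the "multi-thousand endpoint clusters" Murphy describes; reaching the 500,000-endpoint figure would require additional switching tiers.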
Until recently, training a neural network model was a one-time deal. But now, training a multitrillion-parameter AI model means repeatedly fine-tuning or updating. Cornelis expects to take advantage of that. “If you don’t adopt AI, you’re going out of business. If you use AI inefficiently, you’ll still go out of business,” Murphy says. “Our customers want to adopt AI in the most efficient way possible.”
Read more here.