Meta has completed the first phase of a new AI supercomputer. Once the AI Research SuperCluster (RSC) is fully built later this year, the company believes it will be the fastest AI supercomputer on the planet, capable of nearly 5 exaflops of mixed-precision computing.
The company says RSC will help researchers develop better AI models that can learn from billions of examples. Among other things, the models could help build better augmented reality tools and “seamlessly analyze text, images, and video together,” according to Meta. Much of this work is in service of the company's vision for the metaverse, in which it says AI-powered apps and products will play a key role.
“We hope RSC will help us build entirely new AI systems that can, for example, provide real-time voice translations to large groups of people, each speaking a different language, so they can collaborate meaningfully on a research project or play an AR game together,” technical program manager Kevin Lee and software engineer Shubho Sengupta wrote in a blog post.
RSC currently has 760 Nvidia DGX A100 systems with a total of 6,080 GPUs. Meta believes the current iteration is already among the fastest AI supercomputers on the planet. Based on early benchmarks against its previous infrastructure, the company claims that RSC can run computer vision workflows up to 20 times faster and the NVIDIA Collective Communication Library (NCCL) more than nine times faster.
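The GPU total follows from the hardware: each NVIDIA DGX A100 system houses eight A100 GPUs. A minimal arithmetic check in Python (the constant names are illustrative, not Meta's):

```python
# Each NVIDIA DGX A100 system contains eight A100 GPUs.
GPUS_PER_DGX_A100 = 8
dgx_systems = 760  # RSC phase-one system count reported by Meta

total_gpus = dgx_systems * GPUS_PER_DGX_A100
print(total_gpus)  # 6080
```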
Meta says RSC can also train large-scale natural language processing models three times faster. That means AI models that determine whether “an action, sound or image is harmful or benign” (for example, to root out hate speech) can be trained faster. According to the company, this research will help protect people on current services like Facebook and Instagram, as well as in the metaverse.
In addition to building the physical infrastructure and systems to run RSC, Meta said it needs to ensure security and privacy controls are in place to protect the real-world training data it uses. It says that by using data from its production systems, instead of publicly available datasets, it can put its research to more effective use, for example, in identifying harmful content.
This year, Meta plans to increase the number of GPUs in RSC to 16,000, which it says will increase AI training performance by more than 2.5 times. The company, which began work on the project in early 2020, wants RSC to train AI models on datasets as large as an exabyte (the equivalent of 36,000 years of high-quality video).
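As a rough sanity check on these figures, the sketch below computes the average video bitrate implied by equating an exabyte with 36,000 years of video, plus the ratio of planned to current GPUs. It assumes a decimal exabyte (10^18 bytes) and 365.25-day years; the function name `implied_bitrate_mbps` is hypothetical. The GPU ratio is a raw count, not a performance measure, since training speed also depends on networking, storage and software.

```python
# Assumptions (not from Meta): decimal SI exabyte and a 365.25-day year.
EXABYTE_BYTES = 10**18
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def implied_bitrate_mbps(total_bytes: float, years: float) -> float:
    """Average bitrate (Mbit/s) if total_bytes of video lasted `years`."""
    seconds = years * SECONDS_PER_YEAR
    return total_bytes * 8 / seconds / 1e6


bitrate = implied_bitrate_mbps(EXABYTE_BYTES, 36_000)
gpu_ratio = 16_000 / 6_080  # planned GPUs vs. phase-one GPUs

print(f"{bitrate:.1f} Mbit/s")  # 7.0 Mbit/s
print(f"{gpu_ratio:.2f}x GPUs")  # 2.63x GPUs
```

Under these assumptions an exabyte works out to about 7 Mbit/s of continuous video, a plausible rate for high-quality HD, and 16,000 GPUs is roughly 2.6 times the current 6,080.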
“We expect that such a step-function change in computing capacity will not only allow us to create more accurate AI models for our existing services, but also enable completely new user experiences, especially in the metaverse,” Lee and Sengupta wrote.