“AI training is an HPC-like problem,” said Microsoft’s Nidhi Chappell at ISC 2024 in May. The convergence of high-performance computing (HPC), artificial intelligence, and machine learning continues, and is the driver of a renewed thirst for HPC infrastructure and services. TBT Marketing was in attendance in Hamburg to find out more. 

Last month, we joined more than 3,500 HPC practitioners, scientists and researchers to learn about the current trends and developments in HPC. With AI dominating the conversation, hot topics included which GPU technologies best suit customer workloads, and how new users can be onboarded to HPC and AI facilities quickly and easily as they look to enhance their results with AI. With all that demand, however, a challenge remains around the facilities that house HPC infrastructure. Enterprises are paying close attention here, as customers weigh the costs and benefits of hosting an HPC or AI cluster in dedicated data centres equipped with the latest sustainable power and cooling technologies.

Reinventing HPC 

The theme of ISC in recent years has been the ‘reinvention’ of HPC: moving beyond traditional academic niches into mainstream research and development across commercial as well as academic sectors. As the surge in demand for generative AI applications has shown, businesses and other organisations benefit from integrating AI into their R&D, from automotive and aerospace to life sciences and financial services. The massive computational requirements of today’s demanding workloads have put HPC back in the limelight as the AI race heats up. For training, HPC infrastructure provides the powerful GPUs and high-speed interconnects that efficient model training depends on. Good examples include the Kubernetes-based HPC cluster OpenAI used to train the models behind ChatGPT, and DeepMind’s training of AlphaFold on its own large-scale compute infrastructure.

Parallel deployment is critical for success in AI

There’s no getting away from the fact that all this AI development hinges on a huge number of GPUs tailored to make AI workloads sing. And with GPUs in such high demand, almost everyone has to wait in line for the latest and greatest. So, when it comes to the data centre, the issue of parallel deployment came up a number of times at ISC. Businesses and institutions cannot afford to over-invest in the data centre while they wait for GPUs to arrive, yet the GPUs are wasted money without everything else in place. Getting that balancing act right lowers TCO and gets workloads running quickly. Keeping pace with what HPC practitioners need, and what customers are demanding, only works if organisations can scale quickly and efficiently. In response to the scarcity issue, a recent survey found that 52% of respondents are looking for cost-effective alternatives to GPUs for inference-based compute. That’s likely to remain a central theme in the data centre conversation for the foreseeable future, as the pace of AI training and inference shows little sign of slowing.

At TBT Marketing, we’ve been at the forefront of marketing for some of the world’s most ambitious technology companies for the last 25 years. As we ride the AI wave, we’re right there with them again. Get in touch for more details, and follow us on LinkedIn for our latest news and insights.