Google open sources tools to support AI model development

Updated on June 03, 2024

In a typical year, Cloud Next - one of Google's two major annual developer conferences, the other being I/O - almost exclusively showcases products and services that are managed, closed source and locked behind APIs. But this year, whether to foster developer goodwill or to advance its ecosystem ambitions (or both), Google introduced a number of open source tools, primarily aimed at supporting generative AI projects and infrastructure.

The first, MaxDiffusion, which Google quietly released in February, is a collection of reference implementations of various diffusion models - such as the image generator Stable Diffusion - that run on XLA devices. "XLA" stands for Accelerated Linear Algebra, an admittedly awkward acronym for a technique that optimizes and speeds up specific types of AI workloads, including fine-tuning and serving.

Google's proprietary Tensor Processing Units (TPUs) are XLA devices, as are Nvidia's recent GPUs.

In addition to MaxDiffusion, Google is launching JetStream, a new engine for running generative AI models - specifically text-generating models (so not Stable Diffusion). JetStream currently supports only TPUs, with GPU compatibility expected in the future. Google says JetStream offers 3x the "performance per dollar" for models like Google's own Gemma 7B and Meta's Llama 2.

"As customers bring their AI workloads into production, there is a growing need for a cost-effective compute stack that delivers high performance," wrote Mark Lohmeyer, head of compute and machine learning infrastructure at Google Cloud, in a blog post provided to TechCrunch. "JetStream helps fulfill this need ... and includes optimizations for popular open source models like Llama 2 and Gemma."

A 3x improvement is quite a claim, and it's not exactly clear how Google arrived at that figure. Using which generation of TPU? Compared to which baseline engine? And how is "performance" even defined?

I've asked Google all of these questions and will update this post if I get an answer.
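
To see why those details matter, here is a minimal sketch of one way "performance per dollar" could be computed. Everything in it is an assumption for illustration: the definition (generated tokens per dollar of accelerator time), the function names and all of the numbers are hypothetical and do not come from Google.

```python
def perf_per_dollar(tokens_per_second: float, dollars_per_hour: float) -> float:
    """Hypothetical metric: tokens generated per dollar of accelerator time."""
    if dollars_per_hour <= 0:
        raise ValueError("dollars_per_hour must be positive")
    return tokens_per_second * 3600 / dollars_per_hour


def improvement(candidate: float, baseline: float) -> float:
    """Ratio of a candidate's perf-per-dollar to a baseline's."""
    return candidate / baseline


# All figures below are made up purely to illustrate the arithmetic.
baseline = perf_per_dollar(tokens_per_second=1000.0, dollars_per_hour=4.0)

# Scenario A: triple the throughput on the same hardware at the same price.
faster_engine = perf_per_dollar(tokens_per_second=3000.0, dollars_per_hour=4.0)

# Scenario B: a modest throughput gain on hardware that costs half as much.
cheaper_hardware = perf_per_dollar(tokens_per_second=1500.0, dollars_per_hour=2.0)

ratio_a = improvement(faster_engine, baseline)
ratio_b = improvement(cheaper_hardware, baseline)
print(ratio_a, ratio_b)
```

Under these made-up numbers, both scenarios report the same ratio even though one triples throughput on identical hardware and the other pairs a smaller speedup with a cheaper accelerator. That is why the TPU generation, the baseline and the definition of "performance" all matter when reading a headline figure.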

Second to last on Google's list of open source contributions are new additions to MaxText, its collection of text-generating AI models targeting TPUs and Nvidia GPUs in the cloud. MaxText now includes Gemma 7B, OpenAI's GPT-3 (the predecessor to GPT-4), Llama 2 and models from AI startup Mistral - all of which Google says can be customized to meet developers' needs.

"We have significantly optimized the performance of [models] on TPUs, and have also worked closely with Nvidia to optimize performance on large GPU clusters," said Lohmeyer. "These improvements maximize GPU and TPU utilization, leading to improved power efficiency and cost optimization."

Finally, Google has partnered with Hugging Face, an AI startup, to create Optimum TPU, which provides tools to port certain AI workloads to TPUs. According to Google, the goal is to lower the barrier to deploying generative AI models on TPU hardware - particularly text-generating models.

But Optimum TPU is fairly limited at the moment. The only model it works with is Gemma 7B, and it does not yet support training generative models on TPUs - only running them.
