Meet Web Stable Diffusion: an AI project that brings stable diffusion models to web browsers

Updated on July 11, 2023

Artificial intelligence (AI) models have improved significantly in recent years, and the open-source movement has made it easier for developers to combine different open models into new applications.

Stable diffusion models automatically generate photorealistic and other styles of images from textual input. Because these models tend to be large and computationally intensive, web applications that use them have traditionally had to shift all computation to GPU servers. In addition, most workloads require a specific family of GPUs on which popular deep learning frameworks can run.

The Machine Learning Compilation (MLC) team presents the project as an attempt to change this situation and increase the diversity of the ecosystem. In their view, moving computation to the client offers many advantages, such as lower provider costs, a more personalized experience, and better security.

According to the team, it should be possible to deploy ML models directly on client devices without the GPU-accelerated Python frameworks they normally depend on. AI frameworks typically rely heavily on optimized compute libraries from hardware vendors, so environments those libraries do not cover require generating code from scratch. To maximize performance, unique variants also need to be generated and tuned for each client's specific hardware.

The proposed Web Stable Diffusion puts a full stable diffusion model in the browser and runs it directly on the client GPU of the user's laptop. Everything happens locally in the browser and never touches a server. According to the team, this is the world's first browser-based stable diffusion.

Machine learning compilation (MLC) technology plays a central role here. PyTorch, Hugging Face diffusers and tokenizers, Rust, WebAssembly (wasm), and WebGPU are some of the open-source technologies the solution is built on. Apache TVM Unity, an exciting ongoing development within Apache TVM, is the foundation on which the main flow is built.

The team used Runway's stable diffusion v1-5 models from the Hugging Face diffusers library.

The key components of the model are captured in a TVM IRModule using TorchDynamo and Torch FX. TVM's IRModule can generate executable code for each function, allowing the components to be deployed in any environment that can run at least a minimal TVM runtime (JavaScript being one of them).
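To make the capture step concrete, here is a minimal sketch of tracing a PyTorch module with Torch FX and importing it into a Relax IRModule through TVM Unity's torch frontend. The toy TinyMLP module and the input shape are illustrative stand-ins, not the actual diffusion components, and the exact frontend API (tvm.relax.frontend.torch.from_fx) may differ between TVM versions.

```python
import torch
from tvm.relax.frontend.torch import from_fx


# Illustrative stand-in for one model component (not the real UNet/CLIP/VAE).
class TinyMLP(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(768, 768)

    def forward(self, x):
        return torch.nn.functional.relu(self.fc(x))


# Trace the PyTorch module into a Torch FX graph.
graph_module = torch.fx.symbolic_trace(TinyMLP())

# Import the FX graph into a TVM Relax IRModule; one (shape, dtype) pair per input.
mod = from_fx(graph_module, [((1, 77, 768), "float32")])

mod.show()  # prints the captured IRModule as TVMScript
```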

They use TensorIR and MetaSchedule to build automated pipelines that generate efficient code. These transformations are tuned on the local device to produce optimized GPU shaders that use the device's native GPU runtime. They provide a database of these tuning records, so future builds can be reproduced without re-tuning.
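As a rough illustration of what this tuning produces, the hand-written TensorIR schedule below splits a loop and binds it to GPU blocks and threads; MetaSchedule searches for this kind of transformation automatically and stores the results in its database. The element-wise kernel is a toy example, not taken from the actual model.

```python
import tvm
from tvm.script import ir_module, tir as T


# A toy element-wise kernel written in TVMScript (TensorIR).
@ir_module
class AddOne:
    @T.prim_func
    def main(A: T.Buffer((1024,), "float32"), B: T.Buffer((1024,), "float32")):
        for i in range(1024):
            with T.block("compute"):
                vi = T.axis.spatial(1024, i)
                B[vi] = A[vi] + T.float32(1)


# The kind of schedule MetaSchedule discovers automatically: split the loop
# and bind the pieces to GPU block/thread indices so the kernel runs as a shader.
sch = tvm.tir.Schedule(AddOne)
block = sch.get_block("compute")
(i,) = sch.get_loops(block)
bx, tx = sch.split(i, factors=[None, 128])
sch.bind(bx, "blockIdx.x")
sch.bind(tx, "threadIdx.x")

sch.mod.show()  # prints the transformed TensorIR
```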

They apply static memory planning optimizations to improve memory reuse across multiple stages of the pipeline. The TVM web runtime uses Emscripten and TypeScript to facilitate deployment of the generated modules.
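The sketch below shows roughly how a captured module could be compiled for WebGPU and packed into a WebAssembly file with Emscripten. It assumes a TVM Unity build with the web runtime available; the pass and helper names used here (LegalizeOps, DefaultGPUSchedule, relax.build, tvm.contrib.emcc.create_tvmjs_wasm) and the target string are assumptions that may vary across versions, and the toy network again stands in for the real pipeline.

```python
import torch
import tvm
from tvm import relax
from tvm.contrib import emcc
from tvm.relax.frontend.torch import from_fx

# Capture a toy module (stand-in for the real pipeline) as in the earlier step.
net = torch.nn.Sequential(torch.nn.Linear(768, 768), torch.nn.ReLU())
mod = from_fx(torch.fx.symbolic_trace(net), [((1, 77, 768), "float32")])

# Lower high-level ops to TensorIR and apply a minimal GPU thread binding
# (in the real project, MetaSchedule tuning records replace this fallback).
mod = relax.transform.LegalizeOps()(mod)
mod = tvm.tir.transform.DefaultGPUSchedule()(mod)

# Target WebGPU with a wasm host so the result runs in the browser.
target = tvm.target.Target("webgpu", host="llvm -mtriple=wasm32-unknown-unknown-wasm")
ex = relax.build(mod, target)

# Emscripten links the generated code against the minimal TVM web runtime
# and emits a single .wasm artifact that the TypeScript frontend can load.
ex.export_library("toy_webgpu.wasm", fcompile=emcc.create_tvmjs_wasm)
```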

They also use the WebAssembly port of Hugging Face's Rust tokenizers library.
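For illustration, the snippet below runs the same tokenization step through the Python bindings of the Rust tokenizers library; in the browser, the equivalent call goes through the WebAssembly port. The tokenizer identifier is an assumption, chosen because Stable Diffusion v1-5 uses the CLIP text encoder.

```python
from tokenizers import Tokenizer

# CLIP tokenizer used by Stable Diffusion v1-5 (assumed identifier).
tokenizer = Tokenizer.from_pretrained("openai/clip-vit-large-patch14")

encoding = tokenizer.encode("a photograph of an astronaut riding a horse")
print(encoding.ids[:8])  # token ids that are fed to the text encoder
```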

With the exception of the final stage, where a roughly 400-line JavaScript application ties everything together, the entire workflow is done in Python. The ease of introducing new models is an interesting by-product of this style of development.

All of this is possible thanks to the open-source community. In particular, the team relies on TVM Unity, the most recent and exciting addition to the TVM project, which provides an interactive Python-based MLC development experience that lets developers write additional optimizations in Python and incrementally bring the application to the web. TVM Unity also facilitates the rapid creation of new solutions within the ecosystem.
