Stability AI launches SVD 1.1, a diffusion model for more consistent AI videos

Updated on June 16, 2024


Stability AI, known for its growing suite of open-source AI models for content creation and coding, today announced an update to its still-young image-to-video diffusion model, Stable Video Diffusion (SVD).

The updated model, dubbed SVD 1.1, is a refined version of SVD 1.0, optimized for creating short AI videos with better fluidity and consistency.

In a post announcing the update, Tom Mason, CTO of Stability AI, confirmed that the new model is available for public use and can be downloaded via Hugging Face.

He also noted that the model will be available as part of the Stability subscription, which has tiers for individual and enterprise users ranging from free to $20 per month and higher. Users who want to deploy SVD 1.1 for commercial purposes will need a membership.

What to expect from Stability AI's SVD 1.1?

Back in November 2023, Stability released two models for AI video: SVD and SVD-XT. The first was a base model that took a still image as a conditioning frame and generated a four-second video with 14 frames from it. The second was a more advanced version that worked in a similar way but generated up to 25 frames.

Now, by fine-tuning SVD-XT, Stability has introduced SVD 1.1. This model, the company says, also generates four-second videos with 25 frames, but at 1024×576 resolution given a context frame of the same size.

Moreover, this upgrade is expected to provide a more stable video output compared to the original model.

For example, in many cases SVD and SVD-XT fell short of photorealism, produced videos with no motion or only very slow camera pans, and did not render faces and people as users expected. All of this is expected to go away with SVD 1.1, which promises better motion and more consistent output.

"Fine tuning (for SVD 1.1) was done with fixed conditions at 6FPS and Motion Bucket Id 127 to improve consistency of results without the need to adjust hyperparameters. These conditions are still customizable and have not been removed. Performance outside of fixed conditioning settings may vary compared to SVD 1.0," the company notes on the new Hugging Face model page.

Real-world AI video performance remains to be seen

While Stability claims improved performance for SVD 1.1, it remains to be seen how well the model works in practice. The Hugging Face page for the model notes that it is intended for research purposes, and warns that some of the original issues may still appear.
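
For readers who want to try the model from Hugging Face themselves, the following is a minimal sketch that sets the fixed conditions quoted from the model page (6 FPS, Motion Bucket Id 127) explicitly, using the `diffusers` library's `StableVideoDiffusionPipeline`. The repository id, the helper names, and the output file name are assumptions made for illustration and do not come from Stability's announcement. Downloading the weights may require accepting the license on Hugging Face, and the generation step assumes a CUDA GPU.

```python
# Assumed defaults mirroring the fine-tuning conditions quoted from the model page.
SVD_1_1_CONDITIONING = {
    "fps": 6,
    "motion_bucket_id": 127,
    "num_frames": 25,
    "width": 1024,
    "height": 576,
}


def build_call_kwargs(**overrides):
    """Return pipeline keyword arguments, letting callers override the defaults."""
    unknown = set(overrides) - set(SVD_1_1_CONDITIONING)
    if unknown:
        raise ValueError(f"unsupported settings: {sorted(unknown)}")
    return {**SVD_1_1_CONDITIONING, **overrides}


def generate_clip(image_path, output_path="clip.mp4", **overrides):
    """Generate a short clip from one still image (hypothetical helper)."""
    # Heavy imports stay local so the helpers above work without a GPU stack.
    import torch
    from diffusers import StableVideoDiffusionPipeline
    from diffusers.utils import export_to_video, load_image

    # Assumption: repository id of the SVD 1.1 weights on Hugging Face.
    pipe = StableVideoDiffusionPipeline.from_pretrained(
        "stabilityai/stable-video-diffusion-img2vid-xt-1-1",
        torch_dtype=torch.float16,
    )
    pipe.to("cuda")

    kwargs = build_call_kwargs(**overrides)
    # Resize the context frame to match the output resolution.
    image = load_image(image_path).resize((kwargs["width"], kwargs["height"]))
    frames = pipe(image, decode_chunk_size=8, **kwargs).frames[0]
    export_to_video(frames, output_path, fps=kwargs["fps"])
    return output_path
```

Calling `generate_clip("input.png", motion_bucket_id=60)` would override a single setting. As the model page cautions, results outside the fixed conditions may vary.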

Notably, in addition to Hugging Face, Stable Video Diffusion models can also be used through the API on the Stability AI developer platform, which lets developers integrate video generation into their own products.

"...We've released the Stable Video Diffusion API, which generates 4 seconds of video at 24 frames per second in MP4 format, including 25 generated frames and the remaining interpolated frames. We support motion strength control and multiple layouts and resolutions including 1024×576, 768×768, and 576×1024," Mason said in a statement.

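The frame arithmetic in Mason's statement can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the function name is made up for illustration, and it assumes the clip is exactly 4 seconds long.

```python
def interpolated_frames(duration_s, playback_fps, generated_frames):
    """Return (total frames in the clip, frames that must be interpolated)."""
    total = duration_s * playback_fps
    if generated_frames > total:
        raise ValueError("more generated frames than the clip can hold")
    return total, total - generated_frames


total, interpolated = interpolated_frames(4, 24, 25)
print(total, interpolated)  # 96 71
```

Under that assumption, roughly three out of every four frames in the API's output would be interpolated rather than generated by the model.
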
Last year, Stability AI raised the bar for generative AI through frequent model releases, and 2024 appears to be heading the same way. The company was founded in 2019 and has raised significant funding, including a $101 million round announced in 2022. However, it is not the only one operating in this space. Competing offerings from Runway and Pika are also gaining momentum, especially with their customer-facing web platforms, which not only generate videos but also make it easy to customize and upscale them.

Recently, competitor Runway introduced Multi Motion Brush on its platform, allowing users to add motion to specific parts of their AI videos. Another AI video company, Pika, lets users modify specific regions of their videos, such as changing a cow's face to a duck's face. However, neither platform yet offers its models through an API, which prevents developers from integrating them into their apps.
