Unlocking peak generations: TensorRT accelerates AI on RTX PCs and workstations

Updated on July 15, 2024

As generative AI advances and spreads across industries, running generative AI applications on local PCs and workstations is becoming increasingly important. Local inference lets users reduce latency, eliminate network dependency, and keep greater control over their data.

NVIDIA GeForce and NVIDIA RTX GPUs feature Tensor Cores, dedicated AI hardware accelerators that provide the horsepower to run generative AI locally.

Stable Video Diffusion is now optimized for the NVIDIA TensorRT software development kit, which unlocks the power of high-performance generative AI on more than 100 million RTX GPU-powered Windows PCs and workstations.

The TensorRT extension for Automatic1111's popular Stable Diffusion WebUI now adds support for ControlNets, tools that give users more control over generative output by adding other images as guidance.

TensorRT acceleration can be tested in the new UL Procyon AI Image Generation benchmark, which internal tests have shown accurately replicates real-world performance. On a GeForce RTX 4080 SUPER GPU, it delivered a 50% speedup over the fastest implementation without TensorRT.

More efficient and accurate artificial intelligence

TensorRT gives developers access to fully optimized AI performance on the hardware. AI performance typically doubles compared with running the same application on other frameworks.

It also accelerates the most popular generative AI models, such as Stable Diffusion and SDXL. Stable Video Diffusion, Stability AI's image-to-video generative AI model, runs 40% faster with TensorRT.

The optimized Stable Video Diffusion 1.1 Image-to-Video model can be downloaded from the Hugging Face website.

In addition, the TensorRT extension for the Stable Diffusion WebUI boosts performance by up to 2x, which significantly streamlines Stable Diffusion workflows.

In the latest extension update, TensorRT optimization is extended to ControlNets, a set of AI models that help guide a diffusion model's output by adding extra conditions. With TensorRT, ControlNets run up to 40% faster.

Users can steer aspects of the output to match an input image, giving them more control over the final image. Multiple ControlNets can also be combined for even more control. A ControlNet can be a depth map, edge map, normal map, or keypoint detection model, among many others.
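To make the idea of an edge-map condition concrete, here is a minimal sketch of what such a guide image looks like. It is not the preprocessor used by the WebUI extension or by any ControlNet; it is a plain Sobel filter written in pure Python, and the function name `sobel_edge_map` and the test image are hypothetical, chosen only for illustration.

```python
# Illustrative only: a tiny Sobel edge detector in pure Python.
# Real ControlNet pipelines use dedicated preprocessors; this merely shows
# what an "edge map" guide image contains. All names here are hypothetical.

def sobel_edge_map(image):
    """Return the gradient-magnitude edge map of a 2D grayscale image.

    `image` is a list of equal-length rows of numbers. Border pixels are
    left at 0 because the 3x3 kernel does not fit there.
    """
    height, width = len(image), len(image[0])
    kx = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal gradient kernel
    ky = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical gradient kernel
    edges = [[0.0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gy = 0.0
            for dy in range(3):
                for dx in range(3):
                    pixel = image[y + dy - 1][x + dx - 1]
                    gx += kx[dy][dx] * pixel
                    gy += ky[dy][dx] * pixel
            edges[y][x] = (gx * gx + gy * gy) ** 0.5
    return edges


# A 6x6 test image: dark left half (0), bright right half (255).
test_image = [[0, 0, 0, 255, 255, 255] for _ in range(6)]
edge_map = sobel_edge_map(test_image)

# Print the edge map; large values mark the vertical boundary in the middle.
for row in edge_map:
    print(" ".join(f"{value:6.0f}" for value in row))
```

The output is near zero in the flat regions and large along the boundary between the dark and bright halves. An edge-conditioned ControlNet receives this kind of structural outline as its extra input.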

Blackmagic Design adopted NVIDIA TensorRT acceleration in update 18.6 of DaVinci Resolve. Its AI tools, such as Magic Mask, Speed Warp, and Super Scale, run more than 50% faster, and up to 2.3x faster on RTX GPUs compared with Macs.

In addition, with TensorRT integration, Topaz Labs has seen performance gains of up to 60% in its Photo AI and Video AI applications, covering features such as photo sharpening, photo super-resolution, slow-motion video, video super-resolution, and video stabilization, all running on RTX.

The combination of Tensor Cores with TensorRT software delivers unrivaled generative AI performance on local PCs and workstations. Local operation offers several advantages:
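The speedup figures quoted in this article mix percentages ("40% faster") and multipliers ("2x"). As a rough aid for comparing them, the sketch below converts each into a throughput factor and the corresponding reduction in time per job. It assumes that "X% faster" means throughput rises by X%, which the article does not state explicitly; the helper names are hypothetical.

```python
# Rough arithmetic only. Assumption: "X% faster" means throughput rises by X%,
# so run time shrinks to 1 / (1 + X/100) of the original.

def percent_faster_to_factor(percent):
    """Convert 'X% faster' into a throughput multiplier (50 -> 1.5)."""
    return 1.0 + percent / 100.0


def time_fraction(speedup_factor):
    """Fraction of the original run time that remains after a speedup."""
    if speedup_factor <= 0:
        raise ValueError("speedup_factor must be positive")
    return 1.0 / speedup_factor


# Figures quoted in the article, expressed as throughput multipliers.
quoted = {
    "UL Procyon on RTX 4080 SUPER (50% faster)": percent_faster_to_factor(50),
    "Stable Video Diffusion (40% faster)": percent_faster_to_factor(40),
    "Stable Diffusion WebUI extension (up to 2x)": 2.0,
    "DaVinci Resolve vs. Macs (up to 2.3x)": 2.3,
    "Topaz Labs apps (up to 60% faster)": percent_faster_to_factor(60),
}

for label, factor in quoted.items():
    saved = 1.0 - time_fraction(factor)
    print(f"{label}: {factor:.2f}x throughput, about {saved:.0%} less time per job")
```

Under this reading, a 2x speedup halves the time per job, while a 50% speedup cuts it by about a third, so the two are not interchangeable.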

  • Performance: Users experience lower latency because, with the entire model running locally, latency is independent of network quality. This can be important for real-time applications such as gaming or videoconferencing. NVIDIA RTX offers the fastest AI accelerators, scaling to more than 1,300 trillion AI operations per second, or TOPS.
  • Cost: Users do not need to pay for cloud services, cloud-hosted application programming interfaces, or infrastructure to run large language model inference.
  • Always available: Users can access LLM capabilities anywhere, without depending on a high-bandwidth network connection.
  • Data privacy: Private and proprietary data can always remain on the user's device.
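To put the 1,300 TOPS figure in the context of real-time applications, the following back-of-the-envelope sketch computes how many operations are theoretically available per frame at common frame rates. It uses only the peak figure quoted above; real workloads reach a fraction of peak, and the function name is hypothetical.

```python
# Back-of-the-envelope only: theoretical peak, not measured performance.
PEAK_TOPS = 1_300  # "more than 1,300 trillion AI operations per second" (from the text)


def ops_budget_per_frame(peak_tops, frames_per_second):
    """Theoretical peak operations available within one frame interval."""
    if frames_per_second <= 0:
        raise ValueError("frames_per_second must be positive")
    return peak_tops * 1e12 / frames_per_second


for fps in (30, 60, 120):
    budget = ops_budget_per_frame(PEAK_TOPS, fps)
    print(f"{fps:>3} fps: {budget / 1e12:.1f} trillion operations per frame")
```

At 60 frames per second this works out to roughly 21.7 trillion operations per frame at theoretical peak, and that budget is not reduced by network round trips when the model runs locally.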

Optimized for LLMs

What TensorRT brought to deep learning, NVIDIA TensorRT-LLM brings to the latest LLMs.

TensorRT-LLM, an open-source library that accelerates and optimizes LLM inference, includes support for popular community models including Phi-2, Llama 2, Gemma, Mistral, and Code Llama. Anyone, from developers and creators to enterprise employees and casual users, can experiment with TensorRT-LLM-optimized models in the NVIDIA AI Foundation models. In addition, with the NVIDIA ChatRTX demo, users can see the performance of various models running locally on a Windows PC. ChatRTX is built on TensorRT-LLM to optimize performance on RTX GPUs.

NVIDIA is working with the open-source community to develop native TensorRT-LLM connectors to popular frameworks, including LlamaIndex and LangChain.

With these innovations, developers can easily utilize TensorRT-LLM in their applications and enjoy the best LLM performance with RTX.
