AI development on a Copilot+ PC? Not yet

Updated May 09, 2024


Microsoft and its hardware partners recently released Copilot+ PCs, powered by Arm processors with embedded neural processing units. They represent an interesting departure from mainstream x64 platforms, with the first wave built around Qualcomm's Snapdragon X Arm processors and running Microsoft's latest builds of Windows on Arm. Buy one now and it will already be running build 24H2 of Windows 11, at least a couple of months before 24H2 arrives on other hardware.

Out of the box, a Copilot+ PC is fast, with all the features we've come to expect from a modern laptop. Battery life is excellent, and Arm-native benchmarks are as good as, or in some cases better than, those of most Intel- or AMD-based hardware, even outperforming Apple's M2 and M3 Arm processors. That makes these machines ideal for most development tasks using Visual Studio and Visual Studio Code. Both have Arm64 builds, so they don't need to run through the added complexity of Windows on Arm's Prism emulation layer.

Arm-based development PC

Using GitHub or another version control system for code management, developers working on Arm versions of applications can quickly clone a repository, create a new branch, build, test, and make local changes, then push the branch to the main repository and open a pull request to merge their changes. This approach should speed up the development of Arm versions of existing applications, since native hardware is now part of the software development lifecycle.

Truth be told, that's not much different from previous Windows on Arm hardware. If that's all you need, this new generation simply gives you a wider choice of suppliers. If you have a purchase agreement with Dell, HP, or Lenovo, you can quickly add Arm hardware to your fleet without being tied to Microsoft's Surface.

The most interesting feature of the new devices is the embedded neural processing unit (NPU). Offering at least 40 TOPS of additional compute, the NPU brings advanced local inference capabilities to the PC, supporting small language models and other machine learning workloads. Microsoft is initially showcasing these with live captioning and a set of real-time video filters in the device's camera processing pipeline. (The planned Recall AI indexing tool is being redesigned to address security concerns.)

Create your own AI on AI hardware

The bundled AI applications are interesting and potentially useful, but they're better viewed as pointers to the hardware's capabilities. As always, Microsoft is counting on developers to build more sophisticated applications that push the hardware to its limits. That's exactly what the Copilot Runtime is designed for, with support for the ONNX inference runtime and, if not yet in the shipping Windows release, a version of the DirectML inferencing API for Copilot+ PCs and their Qualcomm NPUs.

While DirectML support will make it easier to build and run AI applications, Microsoft has already started shipping some of the tools you need to build your own. Don't expect everything to be easy, though: many pieces are still missing, which makes the AI development workflow awkward.

Where to start? The most obvious place is the AI Toolkit for Visual Studio Code. It's designed to help you try out and tune small language models that can run on PCs and laptops using CPUs, GPUs, and NPUs. The latest builds support Arm64, so you can install the AI Toolkit and Visual Studio Code directly on your development devices.

Working with the AI Toolkit for Visual Studio Code

Installation is quick, using the built-in Marketplace tools. If you plan to build AI applications, it's worth installing tools for working with Python and C#, as well as tools for connecting to GitHub or other source code repositories. Other useful features worth adding include Azure support and the necessary extensions to work with the Windows Subsystem for Linux (WSL).

Once it's installed, you can use the AI Toolkit to evaluate a library of small language models intended to run on PCs and laptops. Five models are currently available: four different versions of Microsoft's own Phi-3 and an instance of Mistral 7B. They all download locally, and you can use the AI Toolkit's model playground to experiment with context instructions and user prompts.

Unfortunately, the model playground doesn't use the NPU, so you can't get a feel for how a model will perform on it. Even so, it's useful for experimenting with a context for your application and seeing how the model responds to user input. It would be nice to be able to build a more fully featured application around the model, for example using Prompt Flow or a similar AI orchestration tool to experiment with grounding a small language model in your own data.

Don't expect to be able to fine-tune models on a Copilot+ PC. They meet most of the requirements, including support for the correct Arm64 WSL Ubuntu builds, but the Qualcomm hardware doesn't include an Nvidia GPU. Its NPU is designed for inference only, so it doesn't provide the capabilities the fine-tuning algorithms need.

That doesn't stop you from using an Arm device as part of a fine-tuning workflow, since it can still work with a cloud-hosted virtual machine that has access to a whole or fractional GPU. Both Microsoft Dev Box and GitHub Codespaces offer GPU-enabled virtual machines, though they can be expensive if you're doing a lot of work. If you're working with sensitive data, a local PC with an Nvidia GPU is another option.

Once you have a model you're happy with, you can start building it into an application. This is where there's a big hole in the AI development workflow for Copilot+ PCs: You can't go directly from the AI Toolkit to code editing. Instead, you have to find the hidden directory that holds the local copy of the model you've been testing (or download a tuned version from the fine-tuning service of your choice), set up an ONNX runtime that supports the PC's NPU, and use that to start building and testing code.

Creating an AI runtime environment for Qualcomm NPUs

While you could build the Arm ONNX environment from source, all the pieces you need are already available, so you only have to assemble your own runtime environment. The AI Toolkit does include a basic web server endpoint for a loaded model, and you can use this with tools like Postman to see how it handles REST inputs and outputs, as if you were using it in a web application.
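If you'd rather run that sanity check from code than from Postman, here's a minimal Python sketch. It assumes the AI Toolkit's OpenAI-style chat completions route on its default localhost port (5272); the model name is a placeholder for whatever name the toolkit reports for the model you've loaded.

```python
# Minimal sketch: exercising the AI Toolkit's local REST endpoint from Python.
# The port (5272) and the model name are assumptions; check the endpoint
# details the AI Toolkit shows for your loaded model.
import requests

payload = {
    "model": "Phi-3-mini-4k-instruct",  # hypothetical; use the name the toolkit displays
    "messages": [
        {"role": "system", "content": "You are a concise technical assistant."},
        {"role": "user", "content": "Summarize what an NPU does in one sentence."},
    ],
    "max_tokens": 128,
}

# The toolkit exposes an OpenAI-style chat completions route on localhost.
response = requests.post(
    "http://127.0.0.1:5272/v1/chat/completions",
    json=payload,
    timeout=120,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```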

If you prefer to build your own code, there's an Arm64 build of Python 3 for Windows, as well as a prebuilt version of the ONNX runtime with an execution provider for Qualcomm's QNN NPUs. That lets you build and test Python code from Visual Studio Code once you've validated your model using CPU inference inside the AI Toolkit. Although it's not a perfect approach, it does give you a path to using a Copilot+ PC as your AI development environment. You can even pair it with the Python version of Microsoft's Semantic Kernel AI agent orchestration framework.
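As a rough sketch of what that looks like in practice, the following assumes you've installed the QNN-enabled ONNX runtime package (onnxruntime-qnn) and have a quantized model on disk; the model file name is a placeholder.

```python
# Minimal sketch: running a CPU-validated ONNX model on the Qualcomm NPU via
# the QNN execution provider. "model.qdq.onnx" is a hypothetical placeholder
# for your quantized model file.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "model.qdq.onnx",
    providers=["QNNExecutionProvider"],
    # QnnHtp.dll targets the Hexagon NPU; QnnCpu.dll is the CPU fallback backend.
    provider_options=[{"backend_path": "QnnHtp.dll"}],
)

# Inspect the model's expected input, then feed it a dummy tensor.
input_meta = session.get_inputs()[0]
print(input_meta.name, input_meta.shape, input_meta.type)

# Dynamic dimensions are replaced with 1; float32 input is assumed here,
# since QDQ models typically quantize and dequantize internally.
shape = [d if isinstance(d, int) else 1 for d in input_meta.shape]
dummy = np.zeros(shape, dtype=np.float32)
outputs = session.run(None, {input_meta.name: dummy})
print(outputs[0].shape)
```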

C# developers aren't left out. A .NET build of the QNN ONNX runtime is available on NuGet, so you can quickly take local models and include them in your code. You can use the AI Toolkit and Python to validate models before embedding them in .NET applications.

It's important to understand the limitations of the QNN ONNX runtime. It's designed only for quantized models, which means any model you use must be quantized to 8-bit or 16-bit integers. Before using an off-the-shelf model, check the documentation to see whether you need to make any changes before including it in your applications.
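If you need to quantize a model yourself, the ONNX runtime's quantization utilities can produce the integer-quantized, QDQ-format graphs the QNN provider consumes. Here's a minimal sketch under those assumptions; the file names are placeholders, and the random calibration data stands in for a representative sample of real inputs.

```python
# Minimal sketch: statically quantizing a float32 ONNX model to 8-bit integers
# for the QNN runtime. File names are hypothetical placeholders; replace the
# random calibration batches with real, representative input data.
import numpy as np
from onnxruntime.quantization import (
    CalibrationDataReader,
    QuantFormat,
    QuantType,
    quantize_static,
)

class DummyReader(CalibrationDataReader):
    """Yields a handful of calibration batches; swap in real data."""
    def __init__(self, input_name, shape, batches=8):
        self._batches = iter(
            [{input_name: np.random.rand(*shape).astype(np.float32)}
             for _ in range(batches)]
        )

    def get_next(self):
        return next(self._batches, None)

quantize_static(
    "model.onnx",      # float32 source model (placeholder name)
    "model.qdq.onnx",  # quantized output for the QNN execution provider
    DummyReader("input", (1, 3, 224, 224)),
    quant_format=QuantFormat.QDQ,      # QNN consumes quantize/dequantize graphs
    activation_type=QuantType.QUInt8,  # 16-bit (QuantType.QUInt16) is also an option
    weight_type=QuantType.QUInt8,
)
```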

So close, but so far away

Although the Copilot+ PC platform (and the associated Copilot Runtime) shows a lot of promise, the toolchain is still fragmented. As it stands, it's hard to go from model to code to application without leaving your IDE. However, it's easy to imagine how a future release of the AI Toolkit for Visual Studio Code could bundle the QNN ONNX runtimes and make them available for use through DirectML for .NET application development.

That future release needs to come sooner rather than later, as devices are already in developers' hands. Getting AI inference onto users' own devices is an important step toward reducing the load on Azure data centers.

Yes, the current state of AI development for Arm64 on Windows is frustrating, but that's more because you can see what it could be than because of any lack of tools. Many of the necessary pieces are already in place; what's needed is a way to bundle them into a comprehensive AI application development platform that gets the most out of the hardware.

For now, it's probably best to stick with the Copilot Runtime and the built-in Phi-Silica model with its ready-to-use APIs. After all, I've bought one of the new Arm-based Surface laptops, and I'd like to see it live up to its promise as the AI development hardware I was hoping to use. Hopefully, Microsoft (and Qualcomm) will fill in the gaps and give me the NPU coding experience I want.
