Vector databases are beginning to play an important role in Artificial Intelligence

Updated April 14, 2023

Vector databases came on the market a few years ago and became the basis for a new type of search engine based on neural networks rather than keywords. Companies like Home Depot have greatly improved their search quality with this new technology. Now vector databases are playing a new role in helping organizations implement chatbots and other applications based on large language models.

A vector database is a new type of database that is becoming popular in the world of machine learning and artificial intelligence. Vector databases are different from traditional relational databases such as PostgreSQL, which were originally designed to store tabular data in the form of rows and columns. They also differ significantly from newer NoSQL databases such as MongoDB, which store data as JSON documents.

This is because a vector database is designed to store and search one specific type of data - vector embeddings.

Vectors are numerical arrays representing various characteristics of an object. Vector embeddings, produced by the training part of the machine learning process, are distilled representations of the training data. In essence, they act as a filter through which new data is passed during inference, when the model produces its outputs.
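As a rough illustration (not tied to any vendor mentioned in this article), the sketch below uses the open-source sentence-transformers library to turn a couple of short product descriptions into embedding vectors; the model name and example texts are placeholders, not anything prescribed by the article.

```python
# A minimal sketch of producing vector embeddings from text.
# The sentence-transformers library and the "all-MiniLM-L6-v2" model are
# illustrative choices, not something specified in this article.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # maps text to 384-dimensional vectors

sentences = [
    "cordless drill with two batteries",
    "outdoor patio furniture set",
]
embeddings = model.encode(sentences)  # numpy array of shape (2, 384)

print(embeddings.shape)   # (2, 384)
print(embeddings[0][:5])  # first few components of the first vector
```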

The first major application of vector databases was their use in next-generation search engines, as well as in production recommendation systems. Home Depot significantly improved the accuracy and usability of its website search engine by augmenting traditional keyword search with vector search methods. Rather than requiring a perfect keyword match (or a database filled with common misspellings of Home Depot's 2 million products), vector search allows Home Depot to utilize machine learning capabilities to determine user intent.

But now vector databases are at the center of the hottest trend in technology: large language models (LLMs) such as OpenAI's GPT-4, Facebook's LLaMA, and Google's LaMDA.

In LLM systems, a vector database can be used to store vector embeddings resulting from LLM training. By storing potentially billions of vector embeddings representing extensive LLM training, the vector database performs an important similarity search that finds the best match between a user's query (the question they ask) and a particular vector embedding.
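To make "similarity search" concrete, here is a minimal, brute-force version using cosine similarity over a small in-memory batch of vectors; real vector databases use approximate nearest-neighbor indexes to do the same thing over billions of embeddings. The dimensions and data below are made up.

```python
# Brute-force sketch of the similarity search a vector database performs:
# find the stored embeddings closest to the query embedding.
import numpy as np

def cosine_similarity(query: np.ndarray, stored: np.ndarray) -> np.ndarray:
    """Cosine similarity between one query vector and a matrix of stored vectors."""
    q = query / np.linalg.norm(query)
    s = stored / np.linalg.norm(stored, axis=1, keepdims=True)
    return s @ q

stored = np.random.rand(10_000, 384).astype(np.float32)  # pretend database contents
query = np.random.rand(384).astype(np.float32)            # embedding of a user's question

scores = cosine_similarity(query, stored)
top_k = np.argsort(scores)[::-1][:5]  # indices of the 5 closest matches
print(top_k, scores[top_k])
```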

Although relational and NoSQL databases have been modified to store vector embeddings, none of them were originally designed to store and maintain this type of data. This gives some advantage to "native" vector databases that were designed from scratch to handle vector embeddings, such as Pinecone, Zilliz and others.

Zilliz is the primary developer of Milvus, an open source vector database first released in 2019. According to the Milvus website, the database was built with a modern, cloud-native design and is capable of providing "millisecond searches across trillions of vector data."

Last week at Nvidia's GPU Technology Conference, Zilliz announced the latest version of its Milvus 2.3 vector database. When paired with an Nvidia GPU, Milvus 2.3 can run 10 times faster than Milvus 2.0, the company said. The vector database can also run on a mix of GPUs and CPUs, which is claimed to be a first.

Nvidia also announced a new integration of its RAFT (Reusable Accelerated Functions and Tools) GPU acceleration library with Milvus. Nvidia CEO Jensen Huang spoke about the importance of vector databases during his GTC keynote.

"Recommender systems use vector databases to store, index, search, and retrieve huge amounts of unstructured data," says Huang. "An important new use case for vector databases is large language models for retrieving domain-specific or proprietary facts that can be queried during text generation.... Vector databases will be important for organizations creating their own big language models."

But vector databases can also be used by organizations that are content to use pre-trained LLMs through APIs provided by technology giants, says Greg Kogan, Pinecone's vice president of marketing.

LLMs such as ChatGPT, trained on huge data sets from the Internet, have shown themselves to be very good (though not perfect) at generating relevant answers to questions. Because these models come pre-trained, many organizations have begun investing in prompt engineering tools and techniques to make LLMs work better for their specific use cases.

GPT-4 users can query the model with up to 32,000 "tokens" (words or word fragments), which is about 50 pages of text. That is significantly more than GPT-3, which could handle about 4,000 tokens (or about 3,000 words). According to Kogan, while tokens matter a great deal for building prompts, the vector database also plays an important role by giving the LLM a persistent, long-term memory.

"Now you can fit 50 pages of context, which is pretty useful. But that's still a small fraction of all the context in the company," says Kogan. "You may not even want to fill the entire context window, because then you'll pay for the latency and cost.

"So companies need long-term memory, something to add to the model," he continues. "A model is something that knows language - it can interpret it. But it needs to be connected to a long-term memory that will store information about your company. That's what a vector database is."

According to Kogan, about half of Pinecone's customers today are working with LLMs. By populating their vector database with embeddings representing an entire knowledge base - whether it is retail inventory or corporate data - Pinecone customers have a long-term repository for their proprietary information.
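In vendor-neutral terms (the snippet below does not use Pinecone's actual client library), populating such a repository amounts to embedding every document in the knowledge base and storing the vector alongside an ID and the original text:

```python
# Sketch of indexing a small knowledge base into a vector store.
# The in-memory list stands in for a real vector database such as Pinecone;
# the document texts and IDs are invented for illustration.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

documents = {
    "doc-001": "Returns are accepted within 90 days with a receipt.",
    "doc-002": "The DX-20 cordless drill ships with two batteries and a charger.",
}

# Each entry: (id, embedding vector, original text kept as metadata).
index = [
    (doc_id, model.encode(text), {"text": text})
    for doc_id, text in documents.items()
]
```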

Because Pinecone acts as a long-term memory, the data flow works a little differently. Instead of sending the user's question directly to ChatGPT (or another LLM), the application first sends it to the vector database, which retrieves the 10 or 15 documents most relevant to that query, according to Kogan. Those documents are then combined with the user's original question, and the entire package is sent as a prompt to the LLM, which returns a response.
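A rough sketch of that flow, continuing the in-memory index from the previous snippet, might look like the following; the call_llm argument is a placeholder for whatever LLM API the application uses, not an OpenAI or Pinecone call.

```python
# Sketch of the retrieval flow described above: embed the question, pull the
# most relevant documents from the vector store, and send both to the LLM.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def retrieve(question: str, index, top_k: int = 10) -> list[str]:
    """Return the texts of the top_k stored documents most similar to the question."""
    q = model.encode(question)
    scored = []
    for doc_id, vec, meta in index:
        score = float(np.dot(q, vec) / (np.linalg.norm(q) * np.linalg.norm(vec)))
        scored.append((score, meta["text"]))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_k]]

def answer(question: str, index, call_llm) -> str:
    """call_llm is a placeholder: any function that takes a prompt and returns text."""
    context = "\n".join(retrieve(question, index))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return call_llm(prompt)
```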

According to Kogan, the results of this approach are superior to simply asking ChatGPT questions blindly, and it also helps with the vexing problem of hallucinations in LLMs. "We know this kind of workflow works very well, and we're trying to teach it to others," he says.
