Llama-index : Build your RAG with Mistral AI

2024-01-24, by seuf

Since last year and the release of ChatGPT, everyone has been talking about LLMs (Large Language Models) and how we can leverage them to help us in our daily work.

One of the coolest things we can do with LLMs is building a RAG (Retrieval Augmented Generation) pipeline, which uses your internal documentation or code as context for the LLM to answer questions.

Meanwhile, Mistral AI, a French startup, has released an open source model with very good performance compared to proprietary models like ChatGPT or Google's text-bison. They have also launched "la plateforme", an API endpoint compatible with the OpenAI API, so you can use it with OpenAI tooling. It is pay-as-you-go (no monthly pricing), and the cost depends on the model you choose.
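
As an illustration of that compatibility, here is a minimal sketch that points the standard openai Python client (an extra dependency, not used in the scripts below) at Mistral's endpoint. The https://api.mistral.ai/v1 base URL and the mistral-small model name are assumptions on my side, so check the official documentation if the call fails:

from openai import OpenAI

# Assumption: "la plateforme" exposes an OpenAI-compatible chat completions
# endpoint at this base URL, reachable with your Mistral API key.
client = OpenAI(
    api_key="YOUR_MISTRAL_API_KEY",
    base_url="https://api.mistral.ai/v1",
)

response = client.chat.completions.create(
    model="mistral-small",
    messages=[{"role": "user", "content": "What is a RAG?"}],
)
print(response.choices[0].message.content)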

RAG

A RAG consists of two parts: the embedding part and the prompt part.

The embedding part does the following tasks:

  • Split your documents into chunks
  • For each chunk, compute the corresponding embedding (a big vector of numbers)
  • Store each embedding in a vector database

The prompt part does the following (a tiny self-contained sketch of both parts follows this list):

  • Ask the user for a question
  • Compute the embedding of the question
  • Fetch the nearest neighbors of the question embedding from the vector database
  • Pass the retrieved chunks as context for the question
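
To make the flow concrete, here is a deliberately tiny, self-contained sketch (it only needs numpy). It uses a toy character-frequency "embedding" and an in-memory list instead of a real embedding model and vector database, purely to show the two parts side by side; the actual scripts later in this article use mistral-embed and Qdrant instead.

import numpy as np

def toy_embed(text: str) -> np.ndarray:
    # Toy embedding for illustration only: a normalized character histogram.
    # A real RAG would call an embedding model (e.g. mistral-embed) here.
    vec = np.zeros(256)
    for ch in text.lower():
        vec[ord(ch) % 256] += 1
    return vec / (np.linalg.norm(vec) + 1e-9)

# --- embedding part: split, embed, store ---
chunks = ["Qdrant is a vector database.", "Mistral AI provides an embedding API."]
store = [(toy_embed(chunk), chunk) for chunk in chunks]  # in-memory "vector DB"

# --- prompt part: embed the question, fetch the nearest chunk ---
question = "Which tool stores vectors?"
q_vec = toy_embed(question)
best = max(store, key=lambda item: float(np.dot(item[0], q_vec)))
print("Context passed to the LLM:", best[1])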

Llama-index

Llama-index is a Python framework that helps connect your custom data to large language models.

It provides tools to help you ingest, index, and query your data.
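
As a quick taste of the API, here is a minimal sketch of the default llama-index workflow. The ./data directory is a placeholder, and without further configuration llama-index falls back to OpenAI models, which is why the scripts below plug Mistral AI in through a ServiceContext.

from llama_index import VectorStoreIndex, SimpleDirectoryReader

# Load every file under ./data, build an in-memory vector index, and query it.
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
print(query_engine.query("What are these documents about?"))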

Llama-index embedding

Here is an example Python script to index all the documents contained in a directory, using:

  • Mistral AI embedding API
  • Qdrant as a vector DB
  • SimpleDirectoryReader as the llama-index document parser

The power of llama-index is that you can find open source Readers directly on llamahub.ai. For example, there are loaders to fetch documents from a SQL database or a WordPress blog, as in the sketch below.
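
Readers published on llamahub.ai can be pulled in at runtime with download_loader. The sketch below uses the WordPress loader; the WordpressReader name and its url/username/password arguments are taken from the llamahub.ai listing and may differ depending on the loader version, so treat them as an assumption.

from llama_index import download_loader

# Download the WordPress reader from llamahub.ai at runtime.
WordpressReader = download_loader("WordpressReader")

# Illustrative credentials only; replace with your own blog and account.
loader = WordpressReader(url="https://my-blog.example.com", username="admin", password="secret")
documents = loader.load_data()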

Requirements:

Before using this script, you need to install the following Python dependencies:

  • llama-index
  • qdrant-client
  • mistralai

You also need to run a Qdrant vector DB instance locally with Docker:

docker run -d --name qdrant -p 6333:6333 qdrant/qdrant
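
Once the container is up, you can optionally check that Qdrant answers before indexing anything (a small sketch using qdrant-client; on a fresh instance the collection list is simply empty):

import qdrant_client

# Sanity check: list the collections on the local instance started above.
client = qdrant_client.QdrantClient(url="http://localhost:6333")
print(client.get_collections())  # expect an empty list on a fresh instance

With Qdrant running, here is the full indexing script:
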
import logging
import sys
import os
import qdrant_client
from llama_index import VectorStoreIndex, ServiceContext, SimpleDirectoryReader, set_global_service_context
from llama_index.embeddings import MistralAIEmbedding
from llama_index.storage.storage_context import StorageContext
from llama_index.vector_stores.qdrant import QdrantVectorStore
from llama_index.llms import MistralAI

MISTRAL_API_KEY = os.getenv("MISTRAL_API_KEY")
DIRECTORY_PATH = "/path/to/index"
QDRANT_URL = "http://localhost:6333"
QDRANT_COLLECTION = "llama-index-mistral"

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

logging.info("Initilizing llm")
llm = MistralAI(api_key=MISTRAL_API_KEY, model="mistral-small")
embed_model = MistralAIEmbedding(model_name="mistral-embed", api_key=MISTRAL_API_KEY)
service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model)
set_global_service_context(service_context)

logging.info("Initializing vector store...")
client = qdrant_client.QdrantClient(
    url=QDRANT_URL,
)
vector_store = QdrantVectorStore(client=client, collection_name=QDRANT_COLLECTION)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

logging.info("Loading documents...")
documents = SimpleDirectoryReader(
    input_dir=DIRECTORY_PATH,
    recursive=True,
).load_data(show_progress=True)
logging.info(f"documents : {len(documents)}")

logging.info("Indexing...")
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context, service_context=service_context, show_progress=True
)

Now you can execute the Python script and browse your Qdrant database: go to http://localhost:6333/dashboard and you will see your new collection.
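
If you prefer checking from Python rather than the dashboard, here is a small sketch with qdrant-client; points_count should roughly match the number of chunks produced from your documents.

import qdrant_client

# Inspect the collection created by the indexing script.
client = qdrant_client.QdrantClient(url="http://localhost:6333")
info = client.get_collection("llama-index-mistral")
print(f"Indexed chunks: {info.points_count}")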

Llama-index prompt

Now that we have indexed all our documents in Qdrant, we can use this example prompt script to query them:

import os
import sys
import logging
import qdrant_client
from llama_index import (
    VectorStoreIndex,
    ServiceContext,
    get_response_synthesizer,
)
from llama_index.embeddings import MistralAIEmbedding
from llama_index.retrievers import VectorIndexRetriever
from llama_index.query_engine import RetrieverQueryEngine
from llama_index.postprocessor import SimilarityPostprocessor
from llama_index.vector_stores.qdrant import QdrantVectorStore
from llama_index.llms import MistralAI

QDRANT_URL = "http://localhost:6333"
QDRANT_COLLECTION = "llama-index-mistral"
MISTRAL_API_KEY = os.getenv("MISTRAL_API_KEY")

# llm
llm = MistralAI(api_key=MISTRAL_API_KEY, model="mistral-small")
embed_model = MistralAIEmbedding(model_name="mistral-embed", api_key=MISTRAL_API_KEY)
service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model)

# vector db
client = qdrant_client.QdrantClient(url=QDRANT_URL)
vector_store = QdrantVectorStore(client=client, collection_name=QDRANT_COLLECTION)
index = VectorStoreIndex.from_vector_store(vector_store=vector_store, service_context=service_context)

# configure retriever: fetch the 10 most similar chunks for each question
retriever = VectorIndexRetriever(index=index, similarity_top_k=10)

# configure response synthesizer (streams the answer as it is generated)
response_synthesizer = get_response_synthesizer(service_context=service_context, streaming=True, verbose=True)

# assemble query engine; chunks with a similarity score below 0.7 are dropped
query_engine = RetrieverQueryEngine(
    retriever=retriever,
    response_synthesizer=response_synthesizer,
    node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.7)],
)

# query
print("Welcome to malt-deltalake AI assistant.")
print("type 'exit' to quit this.")
while True:
    print("")
    print("")
    print("Enter your question :")
    query = input()
    if query == "exit":
        break
    query_engine.query(query).print_response_stream()

Finally, you can run the prompt script to ask questions about your documents.

Demo:

For example, I added this article to the indexed directory, and when I ask questions about it I get pretty nice responses:

Question: What is a RAG?

A RAG, in the context provided, refers to a Retriever Augmented Generation model. It is a system that combines a retriever, responsible for searching through a database, and a generator, which produces answers to questions based on the context provided by the retriever. This concept is often used in conjunction with Large Language Models (LLMs) to allow them to utilize internal documentation or code as context for answering questions.

Question: What are the python dependencies required to use llama index with Mistral?

The Python dependencies required to use Llama Index with Mistral are ‘llama-index’, ‘qdrant-client’, and ‘mistral’.

Question: How can I run qdrant locally?

To run Qdrant locally, you can use Docker. The command to run Qdrant locally using Docker is provided in the context information:

docker run -d --name qdrant -p 6333:6333 qdrant/qdrant

This command will download the Qdrant Docker image, create a new container named "qdrant", and start the Qdrant vector database instance on port 6333.

And voilà 🎉