RAG with LangChain and ChromaDB using Ollama¶
This notebook demonstrates building a RAG (Retrieval Augmented Generation) application using:
- Ollama as the LLM backend (OpenAI-compatible API)
- LangChain for RAG pipeline orchestration
- ChromaDB as the vector database
- llama3.2 for both embeddings and chat completions
How It Works¶
- Chunk the source document into smaller pieces
- Embed each chunk using the LLM's embedding capabilities
- Store embeddings in ChromaDB vector database
- Query: Use LangChain's retrieval chain to find relevant context
- Generate responses using the RAG chain with conversation history
The notebook will automatically pull required models if needed.
Bazzite-AI Setup Required
RunD0_00_Bazzite_AI_Setup.ipynbfirst to configure Ollama, pull models, and verify GPU access.
1. Setup & Configuration¶
import os
import requests
from textwrap import wrap
# === Configuration ===
OLLAMA_HOST = os.getenv("OLLAMA_HOST", "http://ollama:11434")
# === Model Configuration ===
OLLAMA_LLM_MODEL = "llama3.2:latest"
print(f"Ollama host: {OLLAMA_HOST}")
print(f"Model: {OLLAMA_LLM_MODEL}")
Ollama host: http://ollama:11434 Model: llama3.2:latest
2. Verify Models¶
Models should already be pulled by D0_00. If you see errors below, run D0_00_Bazzite_AI_Setup.ipynb first.
3. Load and Chunk Document¶
We embed a sample excerpt about COVID-19 variants directly in the notebook for a self-contained demo.
# Sample document: COVID-19 Omicron variant information
SAMPLE_TEXT = """
The Omicron variant of SARS-CoV-2, first identified in South Africa in November 2021,
rapidly spread across the globe and became the dominant variant in many countries by early 2022.
This variant exhibited significant mutations in the spike protein, raising concerns about
vaccine efficacy and therapeutic interventions.
In France, the emergence of Omicron led to a rapid replacement of the Delta variant during
the winter of 2021-2022. Epidemiological surveillance showed that Omicron cases doubled
approximately every two to three days during its initial spread, significantly faster than
previous variants.
The Omicron variant is characterized by approximately 30 mutations in the spike protein alone,
including mutations at positions K417N, N440K, G446S, S477N, T478K, E484A, Q493R, G496S,
Q498R, N501Y, and Y505H. Many of these mutations are located in the receptor-binding domain
(RBD), which is crucial for viral entry into host cells.
Studies in France demonstrated that while Omicron showed increased transmissibility compared
to Delta, it was associated with reduced severity of disease. Hospitalization rates and
intensive care unit admissions were lower per infection compared to the Delta wave, though
the sheer number of cases still strained healthcare systems.
The immune evasion properties of Omicron were substantial. Research showed reduced neutralization
by antibodies elicited by previous infection with earlier variants or by primary vaccination
series. However, booster doses significantly improved protection against severe disease.
Mathematical modeling of the Omicron invasion in France utilized multi-variant epidemiological
models to understand the dynamics of variant replacement. These models incorporated factors
such as cross-immunity between variants, vaccine coverage, and waning immunity over time.
The basic reproduction number (R0) of Omicron was estimated to be significantly higher than
Delta, with estimates ranging from 8 to 15 depending on the population and setting. This
high transmissibility was a key factor in its rapid global spread.
French public health authorities responded to the Omicron wave with enhanced testing capacity,
acceleration of booster vaccination campaigns, and implementation of sanitary passes requiring
up-to-date vaccination status for access to certain venues and activities.
Subsequent sub-lineages of Omicron, including BA.2, BA.4, BA.5, and later BQ and XBB variants,
continued to evolve with additional mutations conferring further immune evasion properties.
This ongoing evolution necessitated updates to vaccine formulations and continued surveillance.
The experience with Omicron in France and globally highlighted the importance of genomic
surveillance, rapid response capabilities, and adaptable public health strategies in managing
emerging variants of concern during a pandemic.
"""
wrapped_text = wrap(SAMPLE_TEXT.strip(), 1000)
print(f"Document chunked into {len(wrapped_text)} pieces")
Document chunked into 3 pieces
print(f"Number of chunks: {len(wrapped_text)}")
print(f"First chunk preview: {wrapped_text[0][:100]}...")
Number of chunks: 3 First chunk preview: The Omicron variant of SARS-CoV-2, first identified in South Africa in November 2021, rapidly sprea...
4. Initialize LangChain with Ollama¶
LangChain provides a clean interface to work with LLMs. Since Ollama exposes an OpenAI-compatible API, we can use the langchain_openai classes directly.
We configure both the LLM (for chat) and embeddings to use Ollama's OpenAI-compatible endpoint.
from langchain_openai import ChatOpenAI
from langchain_community.embeddings import OllamaEmbeddings
# LLM - Ollama via OpenAI-compatible API
llm = ChatOpenAI(
base_url=f"{OLLAMA_HOST}/v1",
api_key="ollama",
model=OLLAMA_LLM_MODEL,
temperature=0.7
)
# Embeddings - Use Ollama embeddings from langchain_community
embeddings = OllamaEmbeddings(
base_url=OLLAMA_HOST,
model=OLLAMA_LLM_MODEL
)
print("✓ LangChain configured with Ollama")
✓ LangChain configured with Ollama
/tmp/ipykernel_1539/1102485877.py:13: LangChainDeprecationWarning: The class `OllamaEmbeddings` was deprecated in LangChain 0.3.1 and will be removed in 1.0.0. An updated version of the class exists in the `langchain-ollama package and should be used instead. To use it run `pip install -U `langchain-ollama` and import as `from `langchain_ollama import OllamaEmbeddings``. embeddings = OllamaEmbeddings(
Let's test the LLM connection:
print(llm.invoke("Hello! What model are you?").content)
Hello! I'm an artificial intelligence model known as Llama. Llama stands for "Large Language Model Meta AI."
Test the embeddings:
test_embedding = embeddings.embed_query("test query")
print(f"Embedding dimensions: {len(test_embedding)}")
Embedding dimensions: 3072
5. Create Vector Database¶
We use ChromaDB as our vector store. The embeddings are created automatically when we add documents.
from langchain_community.vectorstores import Chroma
# Create in-memory vector store
vectordb = Chroma.from_texts(
texts=wrapped_text,
embedding=embeddings
)
retriever = vectordb.as_retriever(search_kwargs={"k": 5})
print(f"✓ ChromaDB initialized with {len(wrapped_text)} documents")
✓ ChromaDB initialized with 3 documents
6. Create Prompt Template¶
With the OpenAI-compatible API, we don't need to manually add model-specific tokens. LangChain handles the chat template through the messages format.
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
prompt = ChatPromptTemplate.from_messages([
("system", """You are a helpful AI assistant. Answer questions based on the provided context.
If the answer is not in the context, say so clearly. Be concise but thorough.
Context:
{context}"""),
MessagesPlaceholder("chat_history"),
("human", "{input}")
])
[No output generated]
7. Build RAG Chain¶
We use LangChain's retrieval chain to combine document retrieval with the LLM.
from langchain_classic.chains import create_retrieval_chain
from langchain_classic.chains.combine_documents import create_stuff_documents_chain
question_answer_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(retriever, question_answer_chain)
print("✓ RAG chain created")
✓ RAG chain created
8. Chat Function¶
The chat function handles conversation history using LangChain's message objects.
from langchain_core.messages import HumanMessage, AIMessage
chat_history = []
def chat(question):
"""Query the RAG chain with conversation history."""
result = rag_chain.invoke({
"input": question,
"chat_history": chat_history
})
# Update history with proper message objects
chat_history.extend([
HumanMessage(content=question),
AIMessage(content=result['answer'])
])
print(result['answer'])
return result
[No output generated]
9. Try It Out!¶
Now let's chat with our RAG-enabled assistant.
# First question
chat("What do you know about the Omicron variant in France?")
Based on the provided context, here's what I know about the Omicron variant in France: 1. **Rapid spread**: The Omicron variant rapidly spread across France and became the dominant variant in many countries by early 2022. 2. **Significant mutations**: The variant exhibited approximately 30 mutations in the spike protein, including those at positions K417N, N440K, G446S, S477N, T478K, E484A, Q493R, G496S, Q498R, N501Y, and Y505H. 3. **Increased transmissibility**: Omicron showed increased transmissibility compared to the Delta variant, with case doubling approximately every two to three days during its initial spread. 4. **Reduced severity of disease**: Despite increased transmissibility, hospitalization rates and intensive care unit admissions were lower per infection compared to the Delta wave. 5. **Immune evasion properties**: The Omicron variant had significant immune evasion properties, with reduced neutralization by antibodies elicited by previous infection with earlier variants or primary vaccination series. 6. **Boosters improved protection**: Booster doses significantly improved protection against severe disease. 7. **Enhanced testing capacity and vaccination efforts**: French public health authorities responded to the Omicron wave with enhanced testing capacity, acceleration of booster vaccination campaigns, and implementation of sanitary passes requiring up-to-date vaccination status for access to certain venues and activities. These findings highlight the importance of genomic surveillance, rapid response capabilities, and adaptable public health strategies in managing emerging variants of concern during a pandemic.
{'input': 'What do you know about the Omicron variant in France?',
'chat_history': [HumanMessage(content='What do you know about the Omicron variant in France?', additional_kwargs={}, response_metadata={}),
AIMessage(content="Based on the provided context, here's what I know about the Omicron variant in France:\n\n1. **Rapid spread**: The Omicron variant rapidly spread across France and became the dominant variant in many countries by early 2022.\n2. **Significant mutations**: The variant exhibited approximately 30 mutations in the spike protein, including those at positions K417N, N440K, G446S, S477N, T478K, E484A, Q493R, G496S, Q498R, N501Y, and Y505H.\n3. **Increased transmissibility**: Omicron showed increased transmissibility compared to the Delta variant, with case doubling approximately every two to three days during its initial spread.\n4. **Reduced severity of disease**: Despite increased transmissibility, hospitalization rates and intensive care unit admissions were lower per infection compared to the Delta wave.\n5. **Immune evasion properties**: The Omicron variant had significant immune evasion properties, with reduced neutralization by antibodies elicited by previous infection with earlier variants or primary vaccination series.\n6. **Boosters improved protection**: Booster doses significantly improved protection against severe disease.\n7. **Enhanced testing capacity and vaccination efforts**: French public health authorities responded to the Omicron wave with enhanced testing capacity, acceleration of booster vaccination campaigns, and implementation of sanitary passes requiring up-to-date vaccination status for access to certain venues and activities.\n\nThese findings highlight the importance of genomic surveillance, rapid response capabilities, and adaptable public health strategies in managing emerging variants of concern during a pandemic.", additional_kwargs={}, response_metadata={})],
'context': [Document(metadata={}, page_content='depending on the population and setting. This high transmissibility was a key factor in its rapid global spread. French public health authorities responded to the Omicron wave with enhanced testing capacity, acceleration of booster vaccination campaigns, and implementation of sanitary passes requiring up-to-date vaccination status for access to certain venues and activities. Subsequent sub-lineages of Omicron, including BA.2, BA.4, BA.5, and later BQ and XBB variants, continued to evolve with additional mutations conferring further immune evasion properties. This ongoing evolution necessitated updates to vaccine formulations and continued surveillance. The experience with Omicron in France and globally highlighted the importance of genomic surveillance, rapid response capabilities, and adaptable public health strategies in managing emerging variants of concern during a pandemic.'),
Document(metadata={}, page_content='The Omicron variant of SARS-CoV-2, first identified in South Africa in November 2021, rapidly spread across the globe and became the dominant variant in many countries by early 2022. This variant exhibited significant mutations in the spike protein, raising concerns about vaccine efficacy and therapeutic interventions. In France, the emergence of Omicron led to a rapid replacement of the Delta variant during the winter of 2021-2022. Epidemiological surveillance showed that Omicron cases doubled approximately every two to three days during its initial spread, significantly faster than previous variants. The Omicron variant is characterized by approximately 30 mutations in the spike protein alone, including mutations at positions K417N, N440K, G446S, S477N, T478K, E484A, Q493R, G496S, Q498R, N501Y, and Y505H. Many of these mutations are located in the receptor-binding domain (RBD), which is crucial for viral entry into host cells. Studies in France demonstrated that while'),
Document(metadata={}, page_content='Omicron showed increased transmissibility compared to Delta, it was associated with reduced severity of disease. Hospitalization rates and intensive care unit admissions were lower per infection compared to the Delta wave, though the sheer number of cases still strained healthcare systems. The immune evasion properties of Omicron were substantial. Research showed reduced neutralization by antibodies elicited by previous infection with earlier variants or by primary vaccination series. However, booster doses significantly improved protection against severe disease. Mathematical modeling of the Omicron invasion in France utilized multi-variant epidemiological models to understand the dynamics of variant replacement. These models incorporated factors such as cross-immunity between variants, vaccine coverage, and waning immunity over time. The basic reproduction number (R0) of Omicron was estimated to be significantly higher than Delta, with estimates ranging from 8 to 15')],
'answer': "Based on the provided context, here's what I know about the Omicron variant in France:\n\n1. **Rapid spread**: The Omicron variant rapidly spread across France and became the dominant variant in many countries by early 2022.\n2. **Significant mutations**: The variant exhibited approximately 30 mutations in the spike protein, including those at positions K417N, N440K, G446S, S477N, T478K, E484A, Q493R, G496S, Q498R, N501Y, and Y505H.\n3. **Increased transmissibility**: Omicron showed increased transmissibility compared to the Delta variant, with case doubling approximately every two to three days during its initial spread.\n4. **Reduced severity of disease**: Despite increased transmissibility, hospitalization rates and intensive care unit admissions were lower per infection compared to the Delta wave.\n5. **Immune evasion properties**: The Omicron variant had significant immune evasion properties, with reduced neutralization by antibodies elicited by previous infection with earlier variants or primary vaccination series.\n6. **Boosters improved protection**: Booster doses significantly improved protection against severe disease.\n7. **Enhanced testing capacity and vaccination efforts**: French public health authorities responded to the Omicron wave with enhanced testing capacity, acceleration of booster vaccination campaigns, and implementation of sanitary passes requiring up-to-date vaccination status for access to certain venues and activities.\n\nThese findings highlight the importance of genomic surveillance, rapid response capabilities, and adaptable public health strategies in managing emerging variants of concern during a pandemic."} # Follow-up question (uses conversation history)
chat("What mutations does it have?")
The Omicron variant has approximately 30 mutations in the spike protein, including: 1. K417N 2. N440K 3. G446S 4. S477N 5. T478K 6. E484A 7. Q493R 8. G496S 9. Q498R 10. N501Y 11. Y505H These mutations are located in the receptor-binding domain (RBD), which is crucial for viral entry into host cells.
{'input': 'What mutations does it have?',
'chat_history': [HumanMessage(content='What do you know about the Omicron variant in France?', additional_kwargs={}, response_metadata={}),
AIMessage(content="Based on the provided context, here's what I know about the Omicron variant in France:\n\n1. **Rapid spread**: The Omicron variant rapidly spread across France and became the dominant variant in many countries by early 2022.\n2. **Significant mutations**: The variant exhibited approximately 30 mutations in the spike protein, including those at positions K417N, N440K, G446S, S477N, T478K, E484A, Q493R, G496S, Q498R, N501Y, and Y505H.\n3. **Increased transmissibility**: Omicron showed increased transmissibility compared to the Delta variant, with case doubling approximately every two to three days during its initial spread.\n4. **Reduced severity of disease**: Despite increased transmissibility, hospitalization rates and intensive care unit admissions were lower per infection compared to the Delta wave.\n5. **Immune evasion properties**: The Omicron variant had significant immune evasion properties, with reduced neutralization by antibodies elicited by previous infection with earlier variants or primary vaccination series.\n6. **Boosters improved protection**: Booster doses significantly improved protection against severe disease.\n7. **Enhanced testing capacity and vaccination efforts**: French public health authorities responded to the Omicron wave with enhanced testing capacity, acceleration of booster vaccination campaigns, and implementation of sanitary passes requiring up-to-date vaccination status for access to certain venues and activities.\n\nThese findings highlight the importance of genomic surveillance, rapid response capabilities, and adaptable public health strategies in managing emerging variants of concern during a pandemic.", additional_kwargs={}, response_metadata={}),
HumanMessage(content='What mutations does it have?', additional_kwargs={}, response_metadata={}),
AIMessage(content='The Omicron variant has approximately 30 mutations in the spike protein, including:\n\n1. K417N\n2. N440K\n3. G446S\n4. S477N\n5. T478K\n6. E484A\n7. Q493R\n8. G496S\n9. Q498R\n10. N501Y\n11. Y505H\n\nThese mutations are located in the receptor-binding domain (RBD), which is crucial for viral entry into host cells.', additional_kwargs={}, response_metadata={})],
'context': [Document(metadata={}, page_content='depending on the population and setting. This high transmissibility was a key factor in its rapid global spread. French public health authorities responded to the Omicron wave with enhanced testing capacity, acceleration of booster vaccination campaigns, and implementation of sanitary passes requiring up-to-date vaccination status for access to certain venues and activities. Subsequent sub-lineages of Omicron, including BA.2, BA.4, BA.5, and later BQ and XBB variants, continued to evolve with additional mutations conferring further immune evasion properties. This ongoing evolution necessitated updates to vaccine formulations and continued surveillance. The experience with Omicron in France and globally highlighted the importance of genomic surveillance, rapid response capabilities, and adaptable public health strategies in managing emerging variants of concern during a pandemic.'),
Document(metadata={}, page_content='The Omicron variant of SARS-CoV-2, first identified in South Africa in November 2021, rapidly spread across the globe and became the dominant variant in many countries by early 2022. This variant exhibited significant mutations in the spike protein, raising concerns about vaccine efficacy and therapeutic interventions. In France, the emergence of Omicron led to a rapid replacement of the Delta variant during the winter of 2021-2022. Epidemiological surveillance showed that Omicron cases doubled approximately every two to three days during its initial spread, significantly faster than previous variants. The Omicron variant is characterized by approximately 30 mutations in the spike protein alone, including mutations at positions K417N, N440K, G446S, S477N, T478K, E484A, Q493R, G496S, Q498R, N501Y, and Y505H. Many of these mutations are located in the receptor-binding domain (RBD), which is crucial for viral entry into host cells. Studies in France demonstrated that while'),
Document(metadata={}, page_content='Omicron showed increased transmissibility compared to Delta, it was associated with reduced severity of disease. Hospitalization rates and intensive care unit admissions were lower per infection compared to the Delta wave, though the sheer number of cases still strained healthcare systems. The immune evasion properties of Omicron were substantial. Research showed reduced neutralization by antibodies elicited by previous infection with earlier variants or by primary vaccination series. However, booster doses significantly improved protection against severe disease. Mathematical modeling of the Omicron invasion in France utilized multi-variant epidemiological models to understand the dynamics of variant replacement. These models incorporated factors such as cross-immunity between variants, vaccine coverage, and waning immunity over time. The basic reproduction number (R0) of Omicron was estimated to be significantly higher than Delta, with estimates ranging from 8 to 15')],
'answer': 'The Omicron variant has approximately 30 mutations in the spike protein, including:\n\n1. K417N\n2. N440K\n3. G446S\n4. S477N\n5. T478K\n6. E484A\n7. Q493R\n8. G496S\n9. Q498R\n10. N501Y\n11. Y505H\n\nThese mutations are located in the receptor-binding domain (RBD), which is crucial for viral entry into host cells.'} ## 10. Utilities
def reset_conversation():
"""Reset conversation history to start fresh."""
global chat_history
chat_history = []
print("✓ Conversation history cleared")
# Uncomment to reset:
# reset_conversation()
[No output generated]
# View conversation history
print(f"Conversation has {len(chat_history)} messages")
for msg in chat_history:
role = "USER" if isinstance(msg, HumanMessage) else "ASSISTANT"
content = msg.content[:100] + "..." if len(msg.content) > 100 else msg.content
print(f"[{role}]: {content}")
Conversation has 4 messages [USER]: What do you know about the Omicron variant in France? [ASSISTANT]: Based on the provided context, here's what I know about the Omicron variant in France: 1. **Rapid s... [USER]: What mutations does it have? [ASSISTANT]: The Omicron variant has approximately 30 mutations in the spike protein, including: 1. K417N 2. N44...
# === Unload Ollama Model & Shutdown Kernel ===
# Unloads the model from GPU memory before shutting down
try:
import ollama
print(f"Unloading Ollama model: {OLLAMA_LLM_MODEL}")
ollama.generate(model=OLLAMA_LLM_MODEL, prompt="", keep_alive=0)
print("Model unloaded from GPU memory")
except Exception as e:
print(f"Model unload skipped: {e}")
# Shut down the kernel to fully release resources
import IPython
app = IPython.Application.instance()
app.kernel.do_shutdown(restart=False)