Ollama Python Library¶

This notebook demonstrates the official ollama Python library.

Features Covered¶

List models
Show model details
List running models
Generate response
Chat completion
Streaming responses
Generate embeddings
Copy and delete models

Prerequisites¶

Ollama pod running: ujust ollama start
Model pulled: ujust ollama pull llama3.2

1. Setup & Configuration¶

In [17]:

  Copied!     
 
import os
import time
import ollama

# === Configuration ===
OLLAMA_HOST = os.getenv("OLLAMA_HOST", "http://ollama:11434")
DEFAULT_MODEL = "llama3.2:latest"

print(f"Ollama host: {OLLAMA_HOST}")
print(f"Default model: {DEFAULT_MODEL}")
import os import time import ollama # === Configuration === OLLAMA_HOST = os.getenv("OLLAMA_HOST", "http://ollama:11434") DEFAULT_MODEL = "llama3.2:latest" print(f"Ollama host: {OLLAMA_HOST}") print(f"Default model: {DEFAULT_MODEL}")

Out[17]:

Ollama host: http://ollama:11434
Default model: llama3.2:latest

2. Connection Health Check¶

In [18]:

  Copied!     
 
def check_ollama_health() -> tuple[bool, bool]:
    """Check if Ollama server is running and model is available.
    
    Returns:
        tuple: (server_healthy, model_available)
    """
    try:
        models = ollama.list()
        print("✓ Ollama server is running!")
        model_names = [m.get("model", "") for m in models.get("models", [])]
        
        if DEFAULT_MODEL in model_names:
            print(f"✓ Model '{DEFAULT_MODEL}' is available")
            return True, True
        else:
            print(f"✗ Model '{DEFAULT_MODEL}' not found!")
            print()
            if model_names:
                print("Available models:")
                for name in model_names:
                    print(f"  - {name}")
            else:
                print("No models installed.")
            print()
            print("To fix this, run:")
            print(f"  ujust ollama pull {DEFAULT_MODEL.split(':')[0]}")
            return True, False
    except Exception as e:
        print(f"✗ Cannot connect to Ollama server!")
        print(f"Error: {e}")
        print("To fix this, run: ujust ollama start")
        return False, False

ollama_healthy, model_available = check_ollama_health()
def check_ollama_health() -> tuple[bool, bool]: """Check if Ollama server is running and model is available. Returns: tuple: (server_healthy, model_available) """ try: models = ollama.list() print("✓ Ollama server is running!") model_names = [m.get("model", "") for m in models.get("models", [])] if DEFAULT_MODEL in model_names: print(f"✓ Model '{DEFAULT_MODEL}' is available") return True, True else: print(f"✗ Model '{DEFAULT_MODEL}' not found!") print() if model_names: print("Available models:") for name in model_names: print(f" - {name}") else: print("No models installed.") print() print("To fix this, run:") print(f" ujust ollama pull {DEFAULT_MODEL.split(':')[0]}") return True, False except Exception as e: print(f"✗ Cannot connect to Ollama server!") print(f"Error: {e}") print("To fix this, run: ujust ollama start") return False, False ollama_healthy, model_available = check_ollama_health()

Out[18]:

✓ Ollama server is running!
✓ Model 'llama3.2:latest' is available

3. List Models¶

Function: ollama.list()

In [19]:

  Copied!     
 
print("=== List Available Models ===")

models = ollama.list()

if models.get("models"):
    for model in models["models"]:
        size_gb = model.get("size", 0) / (1024**3)
        print(f"  - {model['model']} ({size_gb:.2f} GB)")
else:
    print("  No models found.")
print("=== List Available Models ===") models = ollama.list() if models.get("models"): for model in models["models"]: size_gb = model.get("size", 0) / (1024**3) print(f" - {model['model']} ({size_gb:.2f} GB)") else: print(" No models found.")

Out[19]:

=== List Available Models ===
  - hf.co/NousResearch/Nous-Hermes-2-Mistral-7B-DPO-GGUF:Q4_K_M (4.07 GB)
  - llama3.2:latest (1.88 GB)

4. Show Model Details¶

Function: ollama.show()

In [20]:

  Copied!     
 
print("=== Show Model Details ===")

if not model_available:
    print()
    print("⚠ Skipping - model not available")
    print(f"  Run: ujust ollama pull {DEFAULT_MODEL.split(':')[0]}")
else:
    try:
        model_info = ollama.show(DEFAULT_MODEL)

        print(f"Model: {DEFAULT_MODEL}")
        print(f"\nDetails:")
        if "details" in model_info:
            details = model_info["details"]
            print(f"  Family: {details.get('family', 'N/A')}")
            print(f"  Parameter Size: {details.get('parameter_size', 'N/A')}")
            print(f"  Quantization: {details.get('quantization_level', 'N/A')}")

        print(f"\nModel file preview:")
        modelfile = model_info.get("modelfile", "N/A")
        print(f"  {modelfile[:300]}..." if len(modelfile) > 300 else f"  {modelfile}")
    except Exception as e:
        print(f"✗ Error: {e}")
print("=== Show Model Details ===") if not model_available: print() print("⚠ Skipping - model not available") print(f" Run: ujust ollama pull {DEFAULT_MODEL.split(':')[0]}") else: try: model_info = ollama.show(DEFAULT_MODEL) print(f"Model: {DEFAULT_MODEL}") print(f"\nDetails:") if "details" in model_info: details = model_info["details"] print(f" Family: {details.get('family', 'N/A')}") print(f" Parameter Size: {details.get('parameter_size', 'N/A')}") print(f" Quantization: {details.get('quantization_level', 'N/A')}") print(f"\nModel file preview:") modelfile = model_info.get("modelfile", "N/A") print(f" {modelfile[:300]}..." if len(modelfile) > 300 else f" {modelfile}") except Exception as e: print(f"✗ Error: {e}")

Out[20]:

=== Show Model Details ===
Model: llama3.2:latest

Details:
  Family: llama
  Parameter Size: 3.2B
  Quantization: Q4_K_M

Model file preview:
  # Modelfile generated by "ollama show"
# To build a new Modelfile based on this, replace FROM with:
# FROM llama3.2:latest

FROM /home/jovian/.ollama/models/blobs/sha256-dde5aa3fc5ffc17176b5e8bdc82f587b24b2678c6c66101bf7da77af9f7ccdff
TEMPLATE """<|start_header_id|>system<|end_header_id|>

Cutting K...

5. List Running Models¶

Function: ollama.ps()

In [21]:

  Copied!     
 
print("=== List Running Models ===")

running = ollama.ps()

if running.get("models"):
    for model in running["models"]:
        name = model.get("name", "Unknown")
        size = model.get("size", 0) / (1024**3)
        vram = model.get("size_vram", 0) / (1024**3)
        print(f"  - {name}")
        print(f"    Size: {size:.2f} GB | VRAM: {vram:.2f} GB")
else:
    print("  No models currently loaded in memory")
print("=== List Running Models ===") running = ollama.ps() if running.get("models"): for model in running["models"]: name = model.get("name", "Unknown") size = model.get("size", 0) / (1024**3) vram = model.get("size_vram", 0) / (1024**3) print(f" - {name}") print(f" Size: {size:.2f} GB | VRAM: {vram:.2f} GB") else: print(" No models currently loaded in memory")

Out[21]:

=== List Running Models ===
  - llama3.2:latest
    Size: 2.56 GB | VRAM: 2.56 GB

6. Generate Response¶

Function: ollama.generate()

In [22]:

  Copied!     
 
print("=== Generate Response ===")

if not model_available:
    print()
    print("⚠ Skipping - model not available")
    print(f"  Run: ujust ollama pull {DEFAULT_MODEL.split(':')[0]}")
else:
    try:
        prompt = "Why is the sky blue? Answer in one sentence."
        print(f"Prompt: {prompt}")
        print()

        start_time = time.perf_counter()
        result = ollama.generate(
            model=DEFAULT_MODEL,
            prompt=prompt
        )
        end_time = time.perf_counter()

        print(f"Response: {result['response']}")
        print()
        print(f"Latency: {end_time - start_time:.2f}s")
        print(f"Eval tokens: {result.get('eval_count', 'N/A')}")
    except Exception as e:
        print(f"✗ Error: {e}")
print("=== Generate Response ===") if not model_available: print() print("⚠ Skipping - model not available") print(f" Run: ujust ollama pull {DEFAULT_MODEL.split(':')[0]}") else: try: prompt = "Why is the sky blue? Answer in one sentence." print(f"Prompt: {prompt}") print() start_time = time.perf_counter() result = ollama.generate( model=DEFAULT_MODEL, prompt=prompt ) end_time = time.perf_counter() print(f"Response: {result['response']}") print() print(f"Latency: {end_time - start_time:.2f}s") print(f"Eval tokens: {result.get('eval_count', 'N/A')}") except Exception as e: print(f"✗ Error: {e}")

Out[22]:

=== Generate Response ===
Prompt: Why is the sky blue? Answer in one sentence.

Out[22]:

Response: The sky appears blue because of a phenomenon called Rayleigh scattering, where shorter wavelengths of light (such as blue and violet) are scattered more than longer wavelengths by the tiny molecules of gases in the Earth's atmosphere.

Latency: 0.27s
Eval tokens: 44

7. Chat Completion¶

Function: ollama.chat()

In [23]:

  Copied!     
 
print("=== Chat Completion ===")

if not model_available:
    print()
    print("⚠ Skipping - model not available")
    print(f"  Run: ujust ollama pull {DEFAULT_MODEL.split(':')[0]}")
else:
    try:
        response = ollama.chat(
            model=DEFAULT_MODEL,
            messages=[
                {"role": "system", "content": "You are a helpful assistant. Keep responses brief."},
                {"role": "user", "content": "What is Python?"}
            ]
        )

        print(f"Assistant: {response['message']['content']}")
    except Exception as e:
        print(f"✗ Error: {e}")
print("=== Chat Completion ===") if not model_available: print() print("⚠ Skipping - model not available") print(f" Run: ujust ollama pull {DEFAULT_MODEL.split(':')[0]}") else: try: response = ollama.chat( model=DEFAULT_MODEL, messages=[ {"role": "system", "content": "You are a helpful assistant. Keep responses brief."}, {"role": "user", "content": "What is Python?"} ] ) print(f"Assistant: {response['message']['content']}") except Exception as e: print(f"✗ Error: {e}")

Out[23]:

=== Chat Completion ===

Out[23]:

Assistant: Python is a high-level, interpreted programming language that's widely used for web development, data analysis, machine learning, and more. It's known for its simplicity, readability, and versatility.

8. Multi-turn Conversation¶

In [24]:

  Copied!     
 
print("=== Multi-turn Conversation ===")

if not model_available:
    print()
    print("⚠ Skipping - model not available")
    print(f"  Run: ujust ollama pull {DEFAULT_MODEL.split(':')[0]}")
else:
    try:
        # Turn 1
        messages = [
            {"role": "user", "content": "What is 2 + 2?"},
        ]

        response = ollama.chat(
            model=DEFAULT_MODEL,
            messages=messages
        )
        print(f"User: What is 2 + 2?")
        print(f"Assistant: {response['message']['content']}")

        # Continue conversation
        messages.append(response["message"])
        messages.append({"role": "user", "content": "And what is that multiplied by 3?"})

        response = ollama.chat(
            model=DEFAULT_MODEL,
            messages=messages
        )
        print(f"User: And what is that multiplied by 3?")
        print(f"Assistant: {response['message']['content']}")
    except Exception as e:
        print(f"✗ Error: {e}")
print("=== Multi-turn Conversation ===") if not model_available: print() print("⚠ Skipping - model not available") print(f" Run: ujust ollama pull {DEFAULT_MODEL.split(':')[0]}") else: try: # Turn 1 messages = [ {"role": "user", "content": "What is 2 + 2?"}, ] response = ollama.chat( model=DEFAULT_MODEL, messages=messages ) print(f"User: What is 2 + 2?") print(f"Assistant: {response['message']['content']}") # Continue conversation messages.append(response["message"]) messages.append({"role": "user", "content": "And what is that multiplied by 3?"}) response = ollama.chat( model=DEFAULT_MODEL, messages=messages ) print(f"User: And what is that multiplied by 3?") print(f"Assistant: {response['message']['content']}") except Exception as e: print(f"✗ Error: {e}")

Out[24]:

=== Multi-turn Conversation ===

Out[24]:

User: What is 2 + 2?
Assistant: 2 + 2 = 4.
User: And what is that multiplied by 3?
Assistant: 4 × 3 = 12.

9. Streaming Response¶

Function: ollama.generate(stream=True)

In [25]:

  Copied!     
 
print("=== Streaming Response ===")

if not model_available:
    print()
    print("⚠ Skipping - model not available")
    print(f"  Run: ujust ollama pull {DEFAULT_MODEL.split(':')[0]}")
else:
    try:
        print()

        stream = ollama.generate(
            model=DEFAULT_MODEL,
            prompt="Count from 1 to 5.",
            stream=True
        )

        collected = []
        for chunk in stream:
            collected.append(chunk["response"])

        print(f"Response: {''.join(collected)}")
    except Exception as e:
        print(f"✗ Error: {e}")
print("=== Streaming Response ===") if not model_available: print() print("⚠ Skipping - model not available") print(f" Run: ujust ollama pull {DEFAULT_MODEL.split(':')[0]}") else: try: print() stream = ollama.generate( model=DEFAULT_MODEL, prompt="Count from 1 to 5.", stream=True ) collected = [] for chunk in stream: collected.append(chunk["response"]) print(f"Response: {''.join(collected)}") except Exception as e: print(f"✗ Error: {e}")

Out[25]:

=== Streaming Response ===

Response: Here it goes:

1, 2, 3, 4, 5!

10. Generate Embeddings¶

Function: ollama.embed()

In [26]:

  Copied!     
 
print("=== Generate Embeddings ===")

if not model_available:
    print()
    print("⚠ Skipping - model not available")
    print(f"  Run: ujust ollama pull {DEFAULT_MODEL.split(':')[0]}")
else:
    try:
        test_text = "Ollama makes running LLMs locally easy and efficient."

        result = ollama.embed(
            model=DEFAULT_MODEL,
            input=test_text
        )

        embeddings = result.get("embeddings", [[]])[0]
        print(f"Input: '{test_text}'")
        print(f"Embedding dimensions: {len(embeddings)}")
        print(f"First 5 values: {embeddings[:5]}")
        print(f"Last 5 values: {embeddings[-5:]}")
    except Exception as e:
        print(f"✗ Error: {e}")
print("=== Generate Embeddings ===") if not model_available: print() print("⚠ Skipping - model not available") print(f" Run: ujust ollama pull {DEFAULT_MODEL.split(':')[0]}") else: try: test_text = "Ollama makes running LLMs locally easy and efficient." result = ollama.embed( model=DEFAULT_MODEL, input=test_text ) embeddings = result.get("embeddings", [[]])[0] print(f"Input: '{test_text}'") print(f"Embedding dimensions: {len(embeddings)}") print(f"First 5 values: {embeddings[:5]}") print(f"Last 5 values: {embeddings[-5:]}") except Exception as e: print(f"✗ Error: {e}")

Out[26]:

=== Generate Embeddings ===
Input: 'Ollama makes running LLMs locally easy and efficient.'
Embedding dimensions: 3072
First 5 values: [-0.026683128, -0.0028091324, -0.027384995, -0.009667068, -0.017405545]
Last 5 values: [-0.028065814, 0.010568945, -0.028453464, 0.014874469, -0.029712567]

11. Copy and Delete Model¶

Functions: ollama.copy(), ollama.delete()

Warning: Delete is permanent! We safely demonstrate by copying first.

In [27]:

  Copied!     
 
print("=== Copy and Delete Model ===")

if not model_available:
    print()
    print("⚠ Skipping - model not available")
    print(f"  Run: ujust ollama pull {DEFAULT_MODEL.split(':')[0]}")
else:
    COPY_NAME = f"{DEFAULT_MODEL.split(':')[0]}-test-copy:latest"

    # Step 1: Copy the model
    print(f"\n1. Copying '{DEFAULT_MODEL}' to '{COPY_NAME}'...")
    try:
        ollama.copy(source=DEFAULT_MODEL, destination=COPY_NAME)
        print(f"   Copy successful!")
    except Exception as e:
        print(f"   Copy failed: {e}")

    # Step 2: Verify the copy exists
    print(f"\n2. Verifying '{COPY_NAME}' exists...")
    models = ollama.list()
    model_names = [m["model"] for m in models.get("models", [])]
    if COPY_NAME in model_names:
        print(f"   Found '{COPY_NAME}' in model list")
    else:
        print(f"   '{COPY_NAME}' not found")

    # Step 3: Delete the copy
    print(f"\n3. Deleting '{COPY_NAME}'...")
    try:
        ollama.delete(COPY_NAME)
        print(f"   Delete successful!")
    except Exception as e:
        print(f"   Delete failed: {e}")

    # Step 4: Verify deletion
    print(f"\n4. Verifying '{COPY_NAME}' is deleted...")
    models = ollama.list()
    model_names = [m["model"] for m in models.get("models", [])]
    if COPY_NAME not in model_names:
        print(f"   '{COPY_NAME}' successfully removed")
    else:
        print(f"   '{COPY_NAME}' still exists")
print("=== Copy and Delete Model ===") if not model_available: print() print("⚠ Skipping - model not available") print(f" Run: ujust ollama pull {DEFAULT_MODEL.split(':')[0]}") else: COPY_NAME = f"{DEFAULT_MODEL.split(':')[0]}-test-copy:latest" # Step 1: Copy the model print(f"\n1. Copying '{DEFAULT_MODEL}' to '{COPY_NAME}'...") try: ollama.copy(source=DEFAULT_MODEL, destination=COPY_NAME) print(f" Copy successful!") except Exception as e: print(f" Copy failed: {e}") # Step 2: Verify the copy exists print(f"\n2. Verifying '{COPY_NAME}' exists...") models = ollama.list() model_names = [m["model"] for m in models.get("models", [])] if COPY_NAME in model_names: print(f" Found '{COPY_NAME}' in model list") else: print(f" '{COPY_NAME}' not found") # Step 3: Delete the copy print(f"\n3. Deleting '{COPY_NAME}'...") try: ollama.delete(COPY_NAME) print(f" Delete successful!") except Exception as e: print(f" Delete failed: {e}") # Step 4: Verify deletion print(f"\n4. Verifying '{COPY_NAME}' is deleted...") models = ollama.list() model_names = [m["model"] for m in models.get("models", [])] if COPY_NAME not in model_names: print(f" '{COPY_NAME}' successfully removed") else: print(f" '{COPY_NAME}' still exists")

Out[27]:

=== Copy and Delete Model ===

1. Copying 'llama3.2:latest' to 'llama3.2-test-copy:latest'...
   Copy successful!

2. Verifying 'llama3.2-test-copy:latest' exists...
   Found 'llama3.2-test-copy:latest' in model list

3. Deleting 'llama3.2-test-copy:latest'...
   Delete successful!

4. Verifying 'llama3.2-test-copy:latest' is deleted...
   'llama3.2-test-copy:latest' successfully removed

12. Error Handling¶

In [28]:

  Copied!     
 
print("=== Error Handling ===")

# Test: Non-existent model
print("\n1. Testing non-existent model...")
try:
    result = ollama.generate(
        model="nonexistent-model-xyz",
        prompt="Hello"
    )
    print(f"   Unexpected success: {result}")
except Exception as e:
    print(f"   Expected error: {type(e).__name__}: {e}")

# Test: Empty prompt
print("\n2. Testing empty prompt...")
try:
    result = ollama.generate(
        model=DEFAULT_MODEL,
        prompt=""
    )
    print(f"   Empty prompts allowed")
except Exception as e:
    print(f"   Error: {type(e).__name__}: {e}")

print("\nError handling tests completed!")
print("=== Error Handling ===") # Test: Non-existent model print("\n1. Testing non-existent model...") try: result = ollama.generate( model="nonexistent-model-xyz", prompt="Hello" ) print(f" Unexpected success: {result}") except Exception as e: print(f" Expected error: {type(e).__name__}: {e}") # Test: Empty prompt print("\n2. Testing empty prompt...") try: result = ollama.generate( model=DEFAULT_MODEL, prompt="" ) print(f" Empty prompts allowed") except Exception as e: print(f" Error: {type(e).__name__}: {e}") print("\nError handling tests completed!")

Out[28]:

=== Error Handling ===

1. Testing non-existent model...
   Expected error: ResponseError: model 'nonexistent-model-xyz' not found (status code: 404)

2. Testing empty prompt...

Out[28]:

   Empty prompts allowed

Error handling tests completed!

Summary¶

This notebook demonstrated the official ollama Python library.

Functions Used¶

Function	Purpose
`ollama.list()`	List available models
`ollama.show()`	Show model details
`ollama.ps()`	List running models
`ollama.generate()`	Generate text
`ollama.chat()`	Chat completion
`ollama.embed()`	Generate embeddings
`ollama.copy()`	Copy a model
`ollama.delete()`	Delete a model

Quick Reference¶

import ollama

# Generate
result = ollama.generate(model="llama3.2:latest", prompt="...")

# Chat
response = ollama.chat(
    model="llama3.2:latest",
    messages=[{"role": "user", "content": "Hello!"}]
)

Why Use the Ollama Library?¶

Clean API - Pythonic interface
Full features - Access to all Ollama endpoints
Type hints - IDE support and autocompletion