Ollama Anthropic API¶
Ollama provides an Anthropic-compatible API, letting you use the official anthropic Python SDK with local models.
What You'll Learn¶
- Setup - Configure the Anthropic client to use Ollama
- Messages - Basic chat completions
- System Prompts - Customize model behavior
- Multi-turn - Maintain conversation context
- Streaming - Real-time token output
- Tool Calling - Function calling with full roundtrip
- Vision - Image analysis with base64 encoding
Limitations vs Official API¶
- No token counting endpoint
- No prompt caching
- No PDF support
- Base64 images only (no URLs)
Prerequisites¶
- Ollama pod running: ujust ollama start
- Ollama version >= 0.14.0 (a quick check is sketched below)
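To confirm the running server meets the version requirement, you can query Ollama's /api/version endpoint. A minimal standard-library sketch; the host below matches the base_url used in Setup and may differ on your machine:
import json
from urllib.request import urlopen
# Ask the Ollama server for its version (host is an assumption; adjust to your setup)
with urlopen("http://ollama:11434/api/version") as resp:
    print("Ollama version:", json.load(resp)["version"])  # expect >= 0.14.0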
1. Setup¶
The Anthropic client requires two parameters:
- base_url - Points to your Ollama server instead of Anthropic's API
- api_key - Required by the SDK but ignored by Ollama (use any value)
import ollama
import anthropic
MODEL = "ministral-3:3b-instruct-2512-q4_K_M"
client = anthropic.Anthropic(base_url="http://ollama:11434", api_key="ollama")
2. Pull Model¶
Ollama downloads models on first use. This is a one-time operation - subsequent calls are instant.
ollama.pull(MODEL)
ProgressResponse(status='success', completed=None, total=None, digest=None)
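If you would rather watch the download than wait silently, the ollama client also accepts stream=True on pull and yields progress updates. A rough sketch; progress fields may be None for layers that are already cached:
# Stream pull progress; completed/total are byte counts when the server reports them
for progress in ollama.pull(MODEL, stream=True):
    if progress.total:
        print(f"{progress.status}: {progress.completed or 0}/{progress.total} bytes")
    else:
        print(progress.status)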
3. Basic Message¶
The core API call is messages.create(). The response contains the model's reply in content[0].text.
message = client.messages.create(
model=MODEL,
max_tokens=100,
messages=[{"role": "user", "content": "Why is the sky blue? One sentence."}]
)
print(message.content[0].text)
The sky appears blue due to **Rayleigh scattering**, where shorter blue wavelengths of sunlight are more scattered by Earth's atmosphere than other colors.
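The reply carries more than text: the SDK's Message object also exposes stop_reason and token usage, though how completely Ollama fills these in is not guaranteed, so treat this as a sketch:
# Inspect response metadata (values depend on what Ollama reports back)
print(message.stop_reason)  # e.g. "end_turn", or "max_tokens" if the reply was cut off
print(message.usage.input_tokens, message.usage.output_tokens)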
4. System Prompt¶
The system parameter lets you set the model's behavior and personality.
message = client.messages.create(
model=MODEL,
max_tokens=100,
system="You are a pirate. Respond in pirate speak.",
messages=[{"role": "user", "content": "What is machine learning?"}]
)
print(message.content[0].text)
Arr matey, gather ‘round and let ol’ Captain Scallywag tell ye the tale o’ machine learning, ye scallywags! Machine learning be the art o’ makin’ machines smart enough to learn from their own experiences, like a clever parrot learnin’ new words or a wise old seafaring dog learnin’ to fetch its master’s hat. Ye see, these machines don’t need to be told *exactly* what to do—just
5. Multi-turn Conversation¶
Maintain context by appending the assistant's response to the messages list before the next turn.
messages = [{"role": "user", "content": "What is 2 + 2?"}]
response = client.messages.create(model=MODEL, max_tokens=50, messages=messages)
print(f"User: What is 2 + 2?")
print(f"Assistant: {response.content[0].text}")
messages.append({"role": "assistant", "content": response.content[0].text})
messages.append({"role": "user", "content": "Multiply that by 3"})
response = client.messages.create(model=MODEL, max_tokens=50, messages=messages)
print(f"\nUser: Multiply that by 3")
print(f"Assistant: {response.content[0].text}")
User: What is 2 + 2?
Assistant: 2 + 2 equals **4**.

User: Multiply that by 3
Assistant: 4 × 3 = **12**.
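The append-then-ask pattern is easy to wrap in a small helper that keeps the whole history in one list; a minimal sketch using only the calls shown above:
def ask(history, user_text, max_tokens=100):
    """Append a user turn, call the model, record and return the assistant reply."""
    history.append({"role": "user", "content": user_text})
    reply = client.messages.create(model=MODEL, max_tokens=max_tokens, messages=history)
    text = reply.content[0].text
    history.append({"role": "assistant", "content": text})
    return text

history = []
print(ask(history, "What is 2 + 2?"))
print(ask(history, "Multiply that by 3"))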
6. Streaming¶
Use messages.stream() to receive tokens in real-time as they're generated.
from IPython.display import display, Markdown, clear_output
with client.messages.stream(
model=MODEL,
max_tokens=100,
messages=[{"role": "user", "content": "Count from 1 to 10"}]
) as stream:
response = ""
for text in stream.text_stream:
response += text
clear_output(wait=True)
display(Markdown(response))
- One
- Two
- Three
- Four
- Five
- Six
- Seven
- Eight
- Nine
- Ten
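Outside a notebook you don't need IPython; the same stream can be printed token by token to stdout:
# Stream tokens straight to the terminal as they arrive
with client.messages.stream(
    model=MODEL,
    max_tokens=100,
    messages=[{"role": "user", "content": "Count from 1 to 10"}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
print()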
7. Tool Calling¶
Full roundtrip: define a tool → model requests it → return result → get final answer.
- Define a tool with input_schema (JSON Schema format)
- Model returns a tool_use block with the tool name and arguments
- Execute your function and send back a tool_result
- Model incorporates the result into its final response
import json
tool = {
"name": "get_weather",
"description": "Get weather for a location",
"input_schema": {
"type": "object",
"properties": {"location": {"type": "string", "description": "City name"}},
"required": ["location"]
}
}
# Step 1: Model requests tool use
response = client.messages.create(
model=MODEL, max_tokens=200, tools=[tool],
tool_choice={"type": "tool", "name": "get_weather"}, # Force tool use
messages=[{"role": "user", "content": "What's the weather in Paris?"}]
)
tool_use = next(b for b in response.content if b.type == "tool_use")
print(f"Tool requested: {tool_use.name}")
print(f"Arguments: {tool_use.input}")
# Step 2: Return tool result and get final response
final = client.messages.create(
model=MODEL, max_tokens=200, tools=[tool],
messages=[
{"role": "user", "content": "What's the weather in Paris?"},
{"role": "assistant", "content": response.content},
{"role": "user", "content": [
{"type": "tool_result", "tool_use_id": tool_use.id,
"content": json.dumps({"temp": 18, "conditions": "Partly cloudy"})}
]}
]
)
print(f"\nFinal response: {final.content[0].text}")
Tool requested: get_weather
Arguments: {'location': 'Paris'}
Final response: The current weather in **Paris** is **partly cloudy** with a temperature around **18°C (64°F)**.
8. Vision¶
Send images as base64-encoded data. Ollama's Anthropic API does not support image URLs.
import base64
from io import BytesIO
from PIL import Image, ImageDraw
# Create a simple test image
img = Image.new('RGB', (200, 150), 'white')
draw = ImageDraw.Draw(img)
draw.rectangle([10, 10, 90, 70], fill='red')
draw.rectangle([110, 10, 190, 70], fill='blue')
draw.ellipse([60, 80, 140, 140], fill='green')
# Show the image
display(img)
# Convert to base64
buffer = BytesIO()
img.save(buffer, format='PNG')
image_b64 = base64.b64encode(buffer.getvalue()).decode()
# Send to model
message = client.messages.create(
model=MODEL, max_tokens=200,
messages=[{"role": "user", "content": [
{"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": image_b64}},
{"type": "text", "text": "What shapes and colors do you see?"}
]}]
)
print(message.content[0].text)
From the image you provided, I can identify the following shapes and colors:

1. Two rectangles:
   - One is red.
   - The other is blue.
2. One oval (or ellipse):
   - It is green.
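In practice you will usually encode an existing file rather than drawing one; the encoding step is the same (the path here is a placeholder, and media_type must match the file format):
from pathlib import Path
# Base64-encode an image from disk (hypothetical path)
image_b64 = base64.b64encode(Path("photo.png").read_bytes()).decode()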
9. Cleanup¶
Unload the model from GPU memory using keep_alive=0.
ollama.generate(model=MODEL, prompt="", keep_alive=0)
print("Model unloaded from memory")
Model unloaded from memory
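To confirm nothing is left resident, the ollama client can list models currently loaded in memory; a sketch, assuming your ollama client version exposes ps():
# Expect an empty list once the model has been unloaded
print(ollama.ps().models)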