# Ollama

Local LLM inference server for running AI models like Llama, Mistral, Gemma, and Qwen with GPU acceleration.

## Overview

| Attribute | Value |
|-----------|-------|
| Image | ghcr.io/atrawog/bazzite-ai-pod-ollama:stable |
| Size | ~11GB |
| GPU | NVIDIA, AMD, Intel (auto-detected) |
| Port | 11434 (default) |

## Quick Start

| Step | Command | Description |
|------|---------|-------------|
| 1 | ujust ollama config | Configure server |
| 2 | ujust ollama start | Start server |
| 3 | ujust ollama status | Check status |
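
After step 3, a quick way to confirm the server is reachable is to query its version endpoint. This is a sketch assuming the default port 11434 from the overview table; it is not part of the ujust workflow itself.

```bash
# Confirm the server answers on the default port (11434).
curl http://localhost:11434/api/version
# A healthy server returns a small JSON object such as {"version":"..."}
```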

## Using Models

| Command | Description |
|---------|-------------|
| ujust ollama models | List installed models |
| ujust ollama pull -m llama3.2 | Download a model |
| ujust ollama run -m llama3.2 | Run inference |
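
Models can also be pulled over the HTTP API rather than the ujust wrapper, which is handy for scripting. A minimal sketch, assuming the default port and Ollama's standard /api/pull endpoint:

```bash
# Download a model via the API (equivalent in effect to `ujust ollama pull`).
curl http://localhost:11434/api/pull -d '{
  "model": "llama3.2"
}'

# Verify it now shows up in the local model list.
curl http://localhost:11434/api/tags
```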

## Lifecycle Commands

| Command | Description |
|---------|-------------|
| ujust ollama config | Configure settings |
| ujust ollama start | Start server |
| ujust ollama status | Check status |
| ujust ollama logs | View logs |
| ujust ollama shell | Open shell |
| ujust ollama restart | Restart server |
| ujust ollama stop | Stop server |
| ujust ollama delete | Remove config |
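
These commands wrap a containerized service, so ordinary container tooling can be used for lower-level inspection when needed. A minimal sketch, assuming the pod runs under Podman and its container name contains "ollama"; adjust the filter to match your setup:

```bash
# List the running container behind the ujust commands (assumes Podman
# and a container name containing "ollama").
podman ps --filter name=ollama

# Follow the raw container logs directly; `ujust ollama logs` is the
# supported way, this is only a lower-level alternative.
podman logs -f "$(podman ps --filter name=ollama --format '{{.Names}}' | head -n 1)"
```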

## Multiple Instances

Run multiple Ollama servers with different models:

```bash
# First instance (default, port 11434)
ujust ollama config
ujust ollama start

# Second instance (port 11435)
ujust ollama config -n 2 --port=11435
ujust ollama start -n 2

# Third instance (port 11436)
ujust ollama config -n 3 --port=11436
ujust ollama start -n 3
```

Manage specific instances:

```bash
ujust ollama status -n 2
ujust ollama logs -n 2
ujust ollama stop -n 2
```
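
Each instance is addressed purely by its port, so clients target a specific server by using the port chosen during ujust ollama config. For example, to list models on the second instance configured above:

```bash
# Query the second instance on port 11435; the default instance on
# port 11434 is unaffected.
curl http://localhost:11435/api/tags
```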

## API Access

Once running, access the Ollama API:

```bash
# List models
curl http://localhost:11434/api/tags

# Generate text
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Hello, how are you?"
}'
```
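
By default, /api/generate streams its response as newline-delimited JSON chunks; adding "stream": false returns a single JSON object instead. For multi-turn conversations, the /api/chat endpoint accepts a message history:

```bash
# Single, non-streaming completion.
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Hello, how are you?",
  "stream": false
}'

# Multi-turn chat via the /api/chat endpoint.
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    {"role": "user", "content": "Explain what an inference server does in one sentence."}
  ],
  "stream": false
}'
```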

## OpenWebUI Integration

Use Ollama with a chat interface:

| Command | Description |
|---------|-------------|
| ujust openwebui config | Configure UI |
| ujust openwebui start | Start UI |

OpenWebUI automatically connects to the Ollama server via the bazzite-ai network.
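
If the UI cannot see the server, a useful first check is that the shared network exists. A minimal sketch, assuming the containers run under Podman; the network name comes from the note above:

```bash
# Inspect the shared container network that OpenWebUI and Ollama join
# (assumes Podman is the container runtime).
podman network inspect bazzite-ai
```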

## See Also