Bazzite-AI Environment Setup¶
One-time setup for the LLMs on Supercomputers Course
This notebook configures your bazzite-ai environment for running the course notebooks. Run this once at the start of each JupyterLab session.
Attribution¶
The course notebooks are adapted from the Foundations of LLM Mastery training series by:
- Simeon Harrison (INiTS and AI Factory Austria AI:AT)
- Thomas Haschka (Campus IT / HPC, TU Wien)
- Martin Pfister (Advanced Computing Austria ACA GmbH)
Original source: gitlab.tuwien.ac.at/vsc-public/training/LLMs-on-supercomputers
Adapted for Bazzite.AI by Andreas Trawöger
License: CC BY-SA 4.0
Bazzite-AI vs Supercomputer Environment¶
The original course was designed for the Vienna Scientific Cluster (VSC) supercomputer. In bazzite-ai, we run everything locally with:
| Aspect | VSC Supercomputer | Bazzite-AI |
|---|---|---|
| GPU Access | SLURM job scheduler | Direct GPU access via container |
| LLM Inference | vLLM server | Ollama pod (containerized) |
| Model Loading | Shared NFS storage | HuggingFace Hub / Ollama pull |
| API Compatibility | OpenAI-compatible vLLM | OpenAI-compatible Ollama |
Key Difference: Ollama as OpenAI Drop-in¶
Instead of OpenAI's paid API or a vLLM server, we use Ollama, which provides an OpenAI-compatible endpoint locally:
from openai import OpenAI

# OpenAI (paid cloud API)
client = OpenAI(api_key="sk-...")

# Ollama (free local inference - same code works!)
client = OpenAI(base_url="http://ollama:11434/v1", api_key="ollama")
This means most code samples work unchanged - just point to Ollama instead of OpenAI.
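For example, the same client setup can serve a complete chat request locally. A minimal sketch, assuming the llama3.2:latest model has already been pulled (step 3 of this notebook pulls it):

from openai import OpenAI

# Point the standard OpenAI client at the local Ollama endpoint
client = OpenAI(base_url="http://ollama:11434/v1", api_key="ollama")

# Any model already pulled into Ollama can be used as the model name
response = client.chat.completions.create(
    model="llama3.2:latest",
    messages=[{"role": "user", "content": "Explain in one sentence what an LLM is."}],
)
print(response.choices[0].message.content)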
1. GPU Access & Environment Testing¶
First, let's verify GPU access and check available memory.
import torch
import gc
print("=" * 50)
print("GPU & Environment Status")
print("=" * 50)
# Check PyTorch CUDA availability
print(f"\nPyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
print(f"CUDA version: {torch.version.cuda}")
print(f"GPU count: {torch.cuda.device_count()}")
print(f"Current device: {torch.cuda.current_device()}")
print(f"GPU name: {torch.cuda.get_device_name(0)}")
# System-wide GPU memory check using pynvml
try:
import pynvml
pynvml.nvmlInit()
device_count = pynvml.nvmlDeviceGetCount()
print(f"\n--- System-Wide GPU Memory ---")
for i in range(device_count):
handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        name = pynvml.nvmlDeviceGetName(handle)
        if isinstance(name, bytes):
            name = name.decode()  # older pynvml versions return bytes
info = pynvml.nvmlDeviceGetMemoryInfo(handle)
total_gb = info.total / 1024**3
used_gb = info.used / 1024**3
free_gb = info.free / 1024**3
usage_pct = (info.used / info.total) * 100
print(f"\nGPU {i}: {name}")
print(f" Total: {total_gb:.2f} GB")
print(f" Used: {used_gb:.2f} GB ({usage_pct:.1f}%)")
print(f" Free: {free_gb:.2f} GB")
# Warning thresholds
if free_gb < 4.0:
print(f" \u26a0\ufe0f CRITICAL: Very low GPU memory!")
print(f" Shutdown other notebook kernels before proceeding.")
elif free_gb < 6.0:
print(f" \u26a0\ufe0f WARNING: Low GPU memory.")
print(f" 7B models need ~5GB with 4-bit quantization.")
pynvml.nvmlShutdown()
except ImportError:
print("\n\u26a0\ufe0f pynvml not installed - using PyTorch memory info (per-process only)")
if torch.cuda.is_available():
for i in range(torch.cuda.device_count()):
total = torch.cuda.get_device_properties(i).total_memory / 1024**3
allocated = torch.cuda.memory_allocated(i) / 1024**3
print(f" GPU {i}: {allocated:.2f} / {total:.2f} GB (this process only)")
# Quick GPU test
if torch.cuda.is_available():
print("\n--- GPU Computation Test ---")
try:
x = torch.randn(1000, 1000, device="cuda")
y = torch.matmul(x, x)
del x, y
torch.cuda.empty_cache()
print("\u2705 GPU computation test passed!")
except Exception as e:
print(f"\u274c GPU test failed: {e}")
else:
print("\n\u26a0\ufe0f No GPU available - running on CPU only")
print("\n" + "=" * 50)
==================================================
GPU & Environment Status
==================================================
PyTorch version: 2.9.1+cu130
CUDA available: True
CUDA version: 13.0
GPU count: 1
Current device: 0
GPU name: NVIDIA GeForce RTX 4080 SUPER
--- System-Wide GPU Memory ---
GPU 0: NVIDIA GeForce RTX 4080 SUPER
Total: 15.99 GB
Used: 10.37 GB (64.8%)
Free: 5.62 GB
⚠️ WARNING: Low GPU memory.
7B models need ~5GB with 4-bit quantization.
--- GPU Computation Test ---
✅ GPU computation test passed!
==================================================
2. Ollama Pod Management¶
Ollama runs as a containerized pod in bazzite-ai. Let's check if it's running.
import os
import requests
# Ollama configuration
OLLAMA_HOST = os.getenv("OLLAMA_HOST", "http://ollama:11434")
print(f"Ollama host: {OLLAMA_HOST}")
print("\n--- Checking Ollama Connection ---")
def check_ollama_health():
"""Check if Ollama server is running and healthy."""
try:
response = requests.get(f"{OLLAMA_HOST}/api/tags", timeout=5)
if response.status_code == 200:
return True, response.json()
return False, f"Unexpected status: {response.status_code}"
except requests.exceptions.ConnectionError:
return False, "Connection refused - Ollama pod not running"
except requests.exceptions.Timeout:
return False, "Connection timed out"
except Exception as e:
return False, str(e)
is_running, result = check_ollama_health()
if is_running:
print("\u2705 Ollama server is running!")
models = result.get("models", [])
if models:
print(f"\nAvailable models ({len(models)}):")
for m in models:
name = m.get("name", "Unknown")
size_gb = m.get("size", 0) / 1024**3
print(f" - {name} ({size_gb:.1f} GB)")
else:
print("\nNo models pulled yet (we'll pull them in the next step).")
else:
print(f"\u274c Ollama is not running: {result}")
print("\n--- How to Start Ollama ---")
print("Run this command in a terminal:")
print("")
print(" ujust ollama start")
print("")
print("Then re-run this cell to verify the connection.")
Ollama host: http://ollama:11434

--- Checking Ollama Connection ---
✅ Ollama server is running!

Available models (2):
  - hf.co/NousResearch/Nous-Hermes-2-Mistral-7B-DPO-GGUF:Q4_K_M (4.1 GB)
  - llama3.2:latest (1.9 GB)
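The notebooks read the endpoint from the OLLAMA_HOST environment variable, so you can repoint them at a different Ollama instance without editing any code. A minimal sketch; http://localhost:11434 is only an example address, not the course default, and you should re-run the configuration cell above afterwards so it picks up the new value:

import os

# Example override: point the notebooks at a different Ollama endpoint.
# The address below is purely illustrative - substitute your own.
os.environ["OLLAMA_HOST"] = "http://localhost:11434"

# Re-read the variable the same way the configuration cell does
OLLAMA_HOST = os.getenv("OLLAMA_HOST", "http://ollama:11434")
print(f"Ollama host is now: {OLLAMA_HOST}")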
3. Model Management (Auto-Pull)¶
The course notebooks require specific models. Let's check if they're available and pull any missing ones.
import json
# Required models for the course
REQUIRED_MODELS = [
{
"name": "hf.co/NousResearch/Nous-Hermes-2-Mistral-7B-DPO-GGUF:Q4_K_M",
"used_by": "D1 notebooks (Prompt Engineering)",
"size_hint": "~4.4 GB"
},
{
"name": "llama3.2:latest",
"used_by": "D2 notebooks (RAG)",
"size_hint": "~2.0 GB"
}
]
def get_available_models():
"""Get list of models available in Ollama."""
try:
response = requests.get(f"{OLLAMA_HOST}/api/tags", timeout=5)
if response.status_code == 200:
return [m.get("name", "") for m in response.json().get("models", [])]
    except requests.exceptions.RequestException:
        pass
return []
def pull_model(model_name):
"""Pull a model from Ollama, showing progress."""
print(f"\nPulling '{model_name}'...")
print("(This may take several minutes for large models)")
try:
response = requests.post(
f"{OLLAMA_HOST}/api/pull",
json={"name": model_name},
stream=True,
timeout=1800 # 30 minute timeout for large models
)
last_status = ""
for line in response.iter_lines():
if line:
data = json.loads(line)
status = data.get("status", "")
# Show download progress
if "pulling" in status or "downloading" in status:
completed = data.get("completed", 0)
total = data.get("total", 0)
if total > 0:
pct = (completed / total) * 100
print(f"\r Progress: {pct:.1f}%", end="", flush=True)
elif status != last_status:
if last_status:
print() # newline after progress
print(f" {status}")
last_status = status
if status == "success":
print(f"\n\u2705 Model '{model_name}' pulled successfully!")
return True
return True
except Exception as e:
print(f"\n\u274c Failed to pull model: {e}")
return False
# Check connection first
is_running, _ = check_ollama_health()
if not is_running:
print("\u274c Ollama is not running. Start it first with: ujust ollama start")
else:
print("Checking required models...\n")
available = get_available_models()
all_ready = True
for model_info in REQUIRED_MODELS:
model_name = model_info["name"]
        # Check if the model is available (substring match in either direction, so tag suffixes like ':latest' still match)
is_available = any(model_name in m or m in model_name for m in available)
if is_available:
print(f"\u2705 {model_name}")
print(f" Used by: {model_info['used_by']}")
else:
print(f"\u274c {model_name} - NOT FOUND")
print(f" Used by: {model_info['used_by']}")
print(f" Size: {model_info['size_hint']}")
# Auto-pull missing model
success = pull_model(model_name)
if not success:
all_ready = False
print("\n" + "=" * 50)
if all_ready:
print("\u2705 All required models are available!")
else:
print("\u26a0\ufe0f Some models failed to download. Try manually:")
print(" ujust ollama pull <model-name>")
Checking required models...

✅ hf.co/NousResearch/Nous-Hermes-2-Mistral-7B-DPO-GGUF:Q4_K_M
   Used by: D1 notebooks (Prompt Engineering)
✅ llama3.2:latest
   Used by: D2 notebooks (RAG)

==================================================
✅ All required models are available!
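If you want to confirm what was pulled (model family, parameter count, quantization level), Ollama exposes this metadata via its /api/show endpoint. A minimal sketch, reusing OLLAMA_HOST from above and assuming llama3.2:latest is present; the exact field names reflect current Ollama releases and may differ slightly between versions:

import requests

# Ask Ollama for metadata about an already-pulled model
resp = requests.post(
    f"{OLLAMA_HOST}/api/show",
    json={"name": "llama3.2:latest"},
    timeout=10,
)
resp.raise_for_status()
details = resp.json().get("details", {})
print(f"Family: {details.get('family')}")
print(f"Parameters: {details.get('parameter_size')}")
print(f"Quantization: {details.get('quantization_level')}")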
4. Ollama as OpenAI Drop-in Replacement¶
Ollama provides an OpenAI-compatible API, which means you can use the same code for both:
| Aspect | OpenAI API | Ollama (bazzite-ai) |
|---|---|---|
| Cost | Pay per token | Free (runs locally) |
| API Key | Required | Not needed |
| Privacy | Data sent to cloud | Data stays local |
| Models | OpenAI models only | Any GGUF model |
| base_url | https://api.openai.com/v1 | http://ollama:11434/v1 |
Configuration Pattern¶
In the course notebooks, you'll see this minimal configuration:
import os
from openai import OpenAI
OLLAMA_HOST = os.getenv("OLLAMA_HOST", "http://ollama:11434")
MODEL = "hf.co/NousResearch/Nous-Hermes-2-Mistral-7B-DPO-GGUF:Q4_K_M"
client = OpenAI(
base_url=f"{OLLAMA_HOST}/v1",
api_key="ollama" # Required by library but ignored by Ollama
)
The same pattern works with LangChain:
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(
base_url=f"{OLLAMA_HOST}/v1",
api_key="ollama",
model=MODEL
)
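A quick smoke test of the LangChain client (a sketch that reuses the llm object defined above; invoke returns an AIMessage whose content field holds the reply):

# Send a single prompt through the LangChain wrapper and print the reply
reply = llm.invoke("Name one advantage of running inference locally.")
print(reply.content)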
# Quick API test
from openai import OpenAI
# === Model Configuration ===
HF_LLM_MODEL = "NousResearch/Nous-Hermes-2-Mistral-7B-DPO-GGUF"
OLLAMA_LLM_MODEL = f"hf.co/{HF_LLM_MODEL}:Q4_K_M"
print("Testing OpenAI-compatible API...")
print(f"Model: {OLLAMA_LLM_MODEL}")
try:
client = OpenAI(
base_url=f"{OLLAMA_HOST}/v1",
api_key="ollama"
)
response = client.chat.completions.create(
model=OLLAMA_LLM_MODEL,
messages=[{"role": "user", "content": "Say 'Hello from Ollama!' in exactly 5 words."}],
max_tokens=20
)
print(f"\u2705 API test passed!")
print(f"\nResponse: {response.choices[0].message.content}")
except Exception as e:
print(f"\u274c API test failed: {e}")
print("\nMake sure Ollama is running and the model is pulled.")
Testing OpenAI-compatible API...
Model: hf.co/NousResearch/Nous-Hermes-2-Mistral-7B-DPO-GGUF:Q4_K_M
✅ API test passed!

Response: Hello, Ollama! Welcome!
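The OpenAI-compatible endpoint also supports streaming responses, which is handy for long generations. A minimal sketch, reusing the client and OLLAMA_LLM_MODEL defined in the cell above:

# Stream tokens as they are generated instead of waiting for the full reply
stream = client.chat.completions.create(
    model=OLLAMA_LLM_MODEL,
    messages=[{"role": "user", "content": "Count from 1 to 5."}],
    max_tokens=30,
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()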
5. Datasets Directory¶
Notebooks can use relative paths to access datasets. The Jupyter kernel runs in the notebook's directory.
from pathlib import Path
# With kernel cwd fix, notebooks run in their own directory
# Datasets are in ./datasets/ relative to this notebook
DATASETS_DIR = Path("./datasets")
print(f"Datasets directory: {DATASETS_DIR.resolve()}")
datasets = list(DATASETS_DIR.glob('*.csv'))
if datasets:
print(f"Available datasets: {[d.name for d in datasets]}")
Datasets directory: /workspace/Sync/AI/bazzite/bazzite-ai-testing/notebooks/llms_on_supercomputers/datasets
Available datasets: ['booking_queries_dataset.csv', 'code_review_dataset.csv', 'health_and_fitness_qna.csv']
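To load one of these files in a later notebook, pandas works directly with the same relative path. A minimal sketch, assuming pandas is installed in the environment and using the health_and_fitness_qna.csv file listed above:

import pandas as pd

# Load a course dataset via the relative datasets/ path
df = pd.read_csv(DATASETS_DIR / "health_and_fitness_qna.csv")
print(df.shape)
print(df.head())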
6. Environment Verification & Readiness Check¶
Final verification that everything is working.
print("=" * 60)
print("ENVIRONMENT READINESS CHECK")
print("=" * 60)
checks = []
# 1. GPU Check
gpu_ok = torch.cuda.is_available()
checks.append(("GPU Access", gpu_ok, "CUDA available" if gpu_ok else "No GPU - CPU only"))
# 2. Ollama Check
ollama_ok, _ = check_ollama_health()
checks.append(("Ollama Server", ollama_ok, "Running" if ollama_ok else "Not running"))
# 3. Models Check
if ollama_ok:
available = get_available_models()
model_count = len(available)
models_ok = model_count > 0
checks.append(("Ollama Models", models_ok, f"{model_count} models available" if models_ok else "No models"))
else:
checks.append(("Ollama Models", False, "Ollama not running"))
# 4. API Test
if ollama_ok and models_ok:
try:
client = OpenAI(base_url=f"{OLLAMA_HOST}/v1", api_key="ollama")
# Quick test with small output
response = client.chat.completions.create(
model=available[0],
messages=[{"role": "user", "content": "Hi"}],
max_tokens=5
)
api_ok = True
except:
api_ok = False
checks.append(("API Inference", api_ok, "Working" if api_ok else "Failed"))
else:
checks.append(("API Inference", False, "Prerequisites not met"))
# Print results
print("\n")
all_ok = True
for name, ok, detail in checks:
status = "\u2705" if ok else "\u274c"
print(f"{status} {name}: {detail}")
if not ok and name not in ["GPU Access"]: # GPU is optional
all_ok = False
print("\n" + "=" * 60)
if all_ok:
print("\u2705 ENVIRONMENT READY!")
print("\nYou can now proceed to D1_01_Prompting_with_LangChain.ipynb")
else:
print("\u26a0\ufe0f SOME ISSUES DETECTED")
print("\nPlease resolve the issues above before continuing.")
if not ollama_ok:
print("\nTo start Ollama, run in a terminal:")
print(" ujust ollama start")
print("=" * 60)
============================================================
ENVIRONMENT READINESS CHECK
============================================================

✅ GPU Access: CUDA available
✅ Ollama Server: Running
✅ Ollama Models: 2 models available
✅ API Inference: Working

============================================================
✅ ENVIRONMENT READY!

You can now proceed to D1_01_Prompting_with_LangChain.ipynb
============================================================
Next Steps¶
Your bazzite-ai environment is configured! You can now proceed with the course:
D1 - Prompt Engineering Essentials¶
- D1_01_Prompting_with_LangChain.ipynb - Start here!
- D1_02_Prompt_templates_and_parsing.ipynb
- D1_05_Chaining.ipynb
- D1_08_LLM_Evaluation.ipynb
- D1_09_LLM_as_a_Judge.ipynb
- D1_10_Prompt_Optimization.ipynb
D2 - Retrieval Augmented Generation¶
- D2_01_rag_with_basic_tools.ipynb
- D2_02_rag_with_langchain_and_chromadb.ipynb
D3 - Fine-tuning on One GPU¶
- D3_01_Transformer_Architecture.ipynb
- D3_02_Finetuning_LLM_with_PyTorch.ipynb
- D3_03_Finetuning_LLM_with_Huggingface.ipynb
- D3_04_Quantization.ipynb
- D3_05_PEFT.ipynb
- D3_06_Unsloth.ipynb
Note: You only need to run this setup notebook once per JupyterLab session. The Ollama pod persists between notebook runs.