bazzite-ai-jupyter

ML/AI development workflows for JupyterLab - Ollama API, LangChain, RAG, fine-tuning, and model optimization.

Overview

This plugin provides skills for ML/AI workflows in JupyterLab: Ollama API operations for LLM inference, LangChain and RAG pipelines, and model fine-tuning and optimization.

MCP Server

This plugin includes a Jupyter MCP server that connects to a running JupyterLab instance.

Configuration:

  • URL: http://127.0.0.1:8888/mcp
  • Type: HTTP-based MCP server

Prerequisite: JupyterLab must be running with MCP support enabled (via ujust jupyter start).
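
If your client does not register the server automatically (see the note on automatic startup below), an HTTP MCP server entry can be added by hand; a minimal sketch, assuming a Claude Code style .mcp.json:

{
  "mcpServers": {
    "jupyter": {
      "type": "http",
      "url": "http://127.0.0.1:8888/mcp"
    }
  }
}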

Note: This plugin is designed to work with the bazzite-ai-pod-jupyter container or any JupyterLab environment with the required packages.

Skills

Ollama API Operations

  • chat: Direct REST API operations using the requests library (see the sketch after this list)
  • ollama: Official ollama Python library usage
  • openai: OpenAI compatibility layer for migration
  • gpu: GPU monitoring, VRAM usage, and inference metrics
  • huggingface: Import GGUF models from HuggingFace
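
The chat skill wraps Ollama's REST endpoints directly. A minimal sketch of a non-streaming call to /api/chat with requests (host and model are assumptions; adjust to your setup):

import os
import requests

OLLAMA_HOST = os.getenv("OLLAMA_HOST", "http://ollama:11434")

response = requests.post(
    f"{OLLAMA_HOST}/api/chat",
    json={
        "model": "llama3.2:latest",
        "messages": [{"role": "user", "content": "What is Python?"}],
        "stream": False,  # return a single JSON object instead of a stream
    },
    timeout=120,
)
response.raise_for_status()
print(response.json()["message"]["content"])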

ML/AI Development

  • langchain: LangChain framework (prompts, chains, and model wrappers)
  • rag: Retrieval-Augmented Generation with vector stores
  • evaluation: LLM evaluation and prompt optimization with Evidently.ai
  • transformers: Transformer architecture concepts (attention, FFN)
  • finetuning: Model fine-tuning with PyTorch and the HuggingFace Trainer
  • quantization: Model quantization for efficient inference (see the sketch after this list)
  • peft: Parameter-efficient fine-tuning (LoRA, Unsloth)
  • sft: Supervised Fine-Tuning with SFTTrainer and Unsloth
  • grpo: Group Relative Policy Optimization for RLHF
  • dpo: Direct Preference Optimization from preference pairs
  • reward: Reward model training for RLHF pipelines
  • rloo: Reinforcement Learning with a Leave-One-Out baseline
  • inference: Fast inference with vLLM and thinking model parsing
  • vision: Vision model fine-tuning with FastVisionModel
  • qlora: Advanced QLoRA experiments (alpha, rank, modules)
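
For the quantization skill, a minimal 4-bit loading sketch with bitsandbytes via transformers (the checkpoint is illustrative; any causal LM works):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",               # NormalFloat4 weights
    bnb_4bit_compute_dtype=torch.bfloat16,   # compute in bf16
    bnb_4bit_use_double_quant=True,          # also quantize the quant constants
)

model = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # example checkpoint
    quantization_config=bnb_config,
    device_map="auto",
)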

MCP Server Tools

Connection: http://127.0.0.1:8888/mcp

  • mcp__jupyter__list_files: List files in Jupyter server filesystem
  • mcp__jupyter__list_kernels: List available kernels
  • mcp__jupyter__use_notebook: Activate a notebook for operations
  • mcp__jupyter__read_notebook: Read notebook cells and structure
  • mcp__jupyter__insert_cell: Insert new cells
  • mcp__jupyter__execute_cell: Execute notebook cells
  • mcp__jupyter__execute_code: Execute code directly in kernel

The MCP server starts automatically when this plugin is enabled.

Prerequisites

JupyterLab Environment:

  • JupyterLab server running at http://localhost:8888 with MCP enabled
  • GPU access configured if using GPU-accelerated training

Ollama (for inference):

  • Ollama server reachable (default http://ollama:11434, overridable via the OLLAMA_HOST environment variable)
  • Model available (pull via the REST API or the Python library; see the sketch below)
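
For the last point, a model can be pulled ahead of time with the Python library (the model name is an example):

import ollama

ollama.pull("llama3.2:latest")  # blocks until the model has been downloaded
print(ollama.list())            # verify the model now appears in the list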

Note: All required Python packages are pre-installed in the bazzite-ai-pod-jupyter container.

Quick Start

Ollama Python Library

import ollama

# Generate text
result = ollama.generate(model="llama3.2:latest", prompt="Hello!")
print(result["response"])

# Chat completion
response = ollama.chat(
    model="llama3.2:latest",
    messages=[{"role": "user", "content": "What is Python?"}]
)
print(response["message"]["content"])
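
OpenAI Compatibility Layer

The same server can also be driven through Ollama's OpenAI-compatible endpoint; a minimal sketch with the official openai client (host and model as above; Ollama ignores the API key, but the client requires one):

import os
from openai import OpenAI

OLLAMA_HOST = os.getenv("OLLAMA_HOST", "http://ollama:11434")

client = OpenAI(base_url=f"{OLLAMA_HOST}/v1", api_key="ollama")
completion = client.chat.completions.create(
    model="llama3.2:latest",
    messages=[{"role": "user", "content": "What is Python?"}],
)
print(completion.choices[0].message.content)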

Critical Import Order (for Fine-tuning)

# CRITICAL: Import unsloth FIRST for proper TRL patching
import unsloth
from unsloth import FastLanguageModel, is_bf16_supported

# Then other imports
from trl import SFTTrainer, SFTConfig
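
With the imports in place, a typical next step is loading and adapting a model; a minimal sketch, with an illustrative checkpoint and hyperparameters:

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-1B-Instruct",  # example checkpoint
    max_seq_length=2048,
    load_in_4bit=True,  # QLoRA-style 4-bit base weights
)

model = FastLanguageModel.get_peft_model(
    model,
    r=16,           # LoRA rank
    lora_alpha=16,  # scaling factor
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)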

LangChain with Ollama

import os
from langchain_openai import ChatOpenAI

OLLAMA_HOST = os.getenv("OLLAMA_HOST", "http://ollama:11434")

llm = ChatOpenAI(
    base_url=f"{OLLAMA_HOST}/v1",
    api_key="ollama",
    model="hf.co/NousResearch/Nous-Hermes-2-Mistral-7B-DPO-GGUF:Q4_K_M"
)

response = llm.invoke("What is machine learning?")
print(response.content)
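
Prompts and chains compose with the same llm object; a minimal LCEL sketch:

from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_template("Explain {topic} in one sentence.")
chain = prompt | llm  # pipe the rendered prompt into the model
print(chain.invoke({"topic": "machine learning"}).content)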

RAG Pipeline

import os
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

OLLAMA_HOST = os.getenv("OLLAMA_HOST", "http://ollama:11434")

embeddings = OpenAIEmbeddings(
    base_url=f"{OLLAMA_HOST}/v1",
    api_key="ollama",
    model="nomic-embed-text",          # any embedding model pulled into Ollama
    check_embedding_ctx_length=False,  # send raw strings to non-OpenAI backends
)

documents = ["Ollama serves local LLMs.", "LangChain builds LLM pipelines."]

vectorstore = Chroma.from_texts(documents, embeddings)
retriever = vectorstore.as_retriever()
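
The retriever then feeds context into the chat model; a short usage sketch, reusing the llm from the LangChain example above:

question = "What does Ollama do?"
docs = retriever.invoke(question)  # top-k most similar documents
context = "\n".join(d.page_content for d in docs)
print(llm.invoke(f"Answer using only this context:\n{context}\n\nQ: {question}").content)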

Fine-tuning with LoRA

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Any causal LM with q_proj/v_proj modules works; this checkpoint is an example
base_model = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

lora_config = LoraConfig(
    r=8,                                  # rank of the low-rank update matrices
    lora_alpha=16,                        # scaling factor (alpha / r)
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only a small fraction should be trainable
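
The finetuning skill pairs the adapted model with the HuggingFace Trainer; a minimal sketch, assuming a tokenized train_dataset with input_ids and labels is already prepared:

from transformers import Trainer, TrainingArguments

args = TrainingArguments(
    output_dir="lora-out",
    per_device_train_batch_size=2,
    num_train_epochs=1,
    learning_rate=2e-4,
    logging_steps=10,
)

Trainer(model=model, args=args, train_dataset=train_dataset).train()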
