nvidia-python Pod

Standard OCI container - works with Docker, Podman, Kubernetes, Apptainer.

The nvidia-python pod provides a complete ML/AI development environment with PyTorch and CUDA support, managed by pixi for deterministic builds.

Overview

Attribute        Value
Image            ghcr.io/atrawog/bazzite-ai-pod-nvidia-python:stable
Size             ~6GB
GPU              NVIDIA (CUDA 12.4)
Inherits         pod-nvidia
Foundation for   jupyter pod

Quick Start

# With NVIDIA GPU
docker run -it --rm --gpus all -v $(pwd):/workspace \
  ghcr.io/atrawog/bazzite-ai-pod-nvidia-python:stable

# CPU-only
docker run -it --rm -v $(pwd):/workspace \
  ghcr.io/atrawog/bazzite-ai-pod-nvidia-python:stable

# AMD/Intel GPU
docker run -it --rm --device=/dev/dri -v $(pwd):/workspace \
  ghcr.io/atrawog/bazzite-ai-pod-nvidia-python:stable
# Kubernetes (GPU Job)
apiVersion: batch/v1
kind: Job
metadata:
  name: pytorch-training
spec:
  template:
    spec:
      containers:
      - name: pytorch
        image: ghcr.io/atrawog/bazzite-ai-pod-nvidia-python:stable
        resources:
          limits:
            nvidia.com/gpu: 1
      restartPolicy: OnFailure

# Apptainer
apptainer pull docker://ghcr.io/atrawog/bazzite-ai-pod-nvidia-python:stable

# Run a command in the image
apptainer exec --nv bazzite-ai-pod-nvidia-python_stable.sif bash

# Or open an interactive shell
apptainer shell --nv bazzite-ai-pod-nvidia-python_stable.sif

What's Included

ML/AI Stack

  • PyTorch with CUDA 12.4 support
  • torchvision - Computer vision models and transforms
  • torchaudio - Audio processing

From nvidia Pod

  • CUDA Toolkit 13.0
  • cuDNN (Deep Neural Network library)
  • TensorRT (inference optimization)

From base Pod

  • Python 3.13, Node.js 23+, Go, Rust
  • VS Code, Docker CLI, Podman
  • kubectl, Helm, Claude Code
  • Build tools (gcc, make, cmake, ninja)

Usage

Activate the ML Environment

The pod uses pixi for environment management:

# Activate the pixi environment
pixi shell --manifest-path /opt/pixi/pixi.toml

# Or run commands directly
pixi run --manifest-path /opt/pixi/pixi.toml python train.py
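The `train.py` above is a placeholder for your own script. As a starting point, a minimal (hypothetical) script that confirms the environment resolves before you wire in real training code might look like:

```python
# train.py (hypothetical placeholder): confirm PyTorch and the device resolve.
import torch

def main() -> str:
    device = "cuda" if torch.cuda.is_available() else "cpu"
    print(f"Training on {device} with PyTorch {torch.__version__}")
    return device

if __name__ == "__main__":
    main()
```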

Verify GPU Access

import torch

# Check CUDA availability (the remaining checks require a GPU)
print(f"CUDA available: {torch.cuda.is_available()}")

if torch.cuda.is_available():
    print(f"Device count: {torch.cuda.device_count()}")
    print(f"Device name: {torch.cuda.get_device_name(0)}")

    # Quick benchmark
    x = torch.randn(1000, 1000, device='cuda')
    y = torch.matmul(x, x)
    print(f"Matrix multiplication on GPU: {y.shape}")

Training Example

import torch
import torch.nn as nn
import torch.optim as optim

device = "cuda" if torch.cuda.is_available() else "cpu"

# Define a simple model (e.g. for flattened 28x28 images)
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 10)
).to(device)

optimizer = optim.Adam(model.parameters())
criterion = nn.CrossEntropyLoss()

# Minimal loop on random stand-in data; swap in a real DataLoader
inputs = torch.randn(64, 784, device=device)
labels = torch.randint(0, 10, (64,), device=device)

for epoch in range(10):
    optimizer.zero_grad()
    loss = criterion(model(inputs), labels)
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.4f}")

Environment Details

Pixi Project Location

/opt/pixi/
├── pixi.toml      # Environment configuration
├── pixi.lock      # Locked dependencies (deterministic)
└── .pixi/         # Installed packages

Environment Variables

Variable                Value
NVIDIA_PYTHON_PROJECT   /opt/pixi
PATH                    Includes /opt/pixi/bin and /usr/local/cuda/bin
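These can be inspected from Python inside the pod; outside the pod the values will differ, so this is just an illustrative check:

```python
import os

# Inside the pod these reflect the table above; elsewhere they may be unset.
project = os.environ.get("NVIDIA_PYTHON_PROJECT", "not set")
on_path = "/opt/pixi/bin" in os.environ.get("PATH", "")
print(f"NVIDIA_PYTHON_PROJECT: {project}")
print(f"/opt/pixi/bin on PATH: {on_path}")
```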

Common Tasks

Install Additional Packages

# Inside the pod, activate environment first
pixi shell --manifest-path /opt/pixi/pixi.toml

# Install with pip (inside pixi environment)
pip install transformers datasets accelerate
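To confirm the newly installed packages are visible to the interpreter without importing them fully, a quick stdlib-only check works:

```python
import importlib.util

# Verify packages added with `pip install ...` are importable in this environment.
for pkg in ("transformers", "datasets", "accelerate"):
    found = importlib.util.find_spec(pkg) is not None
    print(f"{pkg}: {'installed' if found else 'missing'}")
```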

Run Jupyter Notebook

For interactive notebook development, use the jupyter pod instead, which includes JupyterLab pre-configured.

Export Trained Models

# Save PyTorch model weights
torch.save(model.state_dict(), '/workspace/model.pth')

# Export to ONNX for TensorRT optimization
# (dummy_input must match the model's expected input shape and device)
dummy_input = torch.randn(1, 784, device=next(model.parameters()).device)
torch.onnx.export(model, dummy_input, '/workspace/model.onnx')
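Saving a `state_dict` stores only the weights, so the architecture must be redefined before reloading. A self-contained round-trip sketch (using a temporary file rather than /workspace):

```python
import os
import tempfile
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

# Round-trip the weights through a checkpoint file.
path = os.path.join(tempfile.mkdtemp(), "model.pth")
torch.save(model.state_dict(), path)

# Rebuild the same architecture, then load the saved weights into it.
restored = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
restored.load_state_dict(torch.load(path, map_location="cpu"))
restored.eval()
```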

Workspace

Your current directory is mounted at /workspace:

# On host
cd ~/projects/my-ml-project

# Docker/Podman
docker run -it --rm --gpus all -v $(pwd):/workspace \
  ghcr.io/atrawog/bazzite-ai-pod-nvidia-python:stable

# Inside pod - your files are here
ls /workspace/

Troubleshooting

CUDA Not Available

  1. Ensure NVIDIA GPU is present: nvidia-smi (on host)
  2. For Docker: Install NVIDIA Container Toolkit
  3. For Bazzite AI OS: Run ujust setup-gpu-pods (one-time)
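When the checks above pass on the host but PyTorch still reports no GPU, a small diagnostic run inside the pod helps narrow down where the chain breaks:

```python
import torch

# Diagnose the CUDA chain from inside the pod.
print("torch:", torch.__version__)
print("built with CUDA:", torch.version.cuda)  # None means a CPU-only build
print("cuda available:", torch.cuda.is_available())
if not torch.cuda.is_available():
    print("Check: host driver (nvidia-smi), container toolkit, --gpus all flag")
```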

Out of Memory

# Release cached GPU memory back to the driver (does not free live tensors)
torch.cuda.empty_cache()

# Use gradient checkpointing for large models
from torch.utils.checkpoint import checkpoint
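The gradient-checkpointing import above can be applied to any module: activations inside the checkpointed block are recomputed during backward instead of being stored, trading compute for memory. A minimal CPU-safe sketch:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

# Checkpoint a block: its activations are recomputed on backward, saving memory.
block = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 512))
x = torch.randn(8, 512, requires_grad=True)

y = checkpoint(block, x, use_reentrant=False)
y.sum().backward()
print(x.grad.shape)
```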

Pixi Environment Issues

# Rebuild pixi environment
pixi install --manifest-path /opt/pixi/pixi.toml --frozen

See Also