nvidia-python Pod

Standard OCI container - works with Docker, Podman, Kubernetes, Apptainer.

The nvidia-python pod provides a complete ML/AI development environment with PyTorch and CUDA support, managed by pixi for deterministic builds.

Overview

Attribute        Value
Image            ghcr.io/atrawog/bazzite-ai-pod-nvidia-python:stable
Size             ~6GB
GPU              NVIDIA (CUDA 12.4)
Inherits         pod-nvidia
Foundation for   jupyter pod

Quick Start

# With NVIDIA GPU
docker run -it --rm --gpus all -v $(pwd):/workspace \
  ghcr.io/atrawog/bazzite-ai-pod-nvidia-python:stable

# CPU-only
docker run -it --rm -v $(pwd):/workspace \
  ghcr.io/atrawog/bazzite-ai-pod-nvidia-python:stable

# AMD/Intel GPU
docker run -it --rm --device=/dev/dri -v $(pwd):/workspace \
  ghcr.io/atrawog/bazzite-ai-pod-nvidia-python:stable
# Kubernetes (GPU Job)
apiVersion: batch/v1
kind: Job
metadata:
  name: pytorch-training
spec:
  template:
    spec:
      containers:
      - name: pytorch
        image: ghcr.io/atrawog/bazzite-ai-pod-nvidia-python:stable
        resources:
          limits:
            nvidia.com/gpu: 1
      restartPolicy: OnFailure

# Apptainer
apptainer pull docker://ghcr.io/atrawog/bazzite-ai-pod-nvidia-python:stable

# Run a command in the image
apptainer exec --nv bazzite-ai-pod-nvidia-python_stable.sif bash

# Or open an interactive shell
apptainer shell --nv bazzite-ai-pod-nvidia-python_stable.sif

What's Included

ML/AI Stack

  • PyTorch with CUDA 12.4 support
  • torchvision - Computer vision models and transforms
  • torchaudio - Audio processing

From nvidia Pod

  • CUDA Toolkit 13.0
  • cuDNN (Deep Neural Network library)
  • TensorRT (inference optimization)

From base Pod

  • Python 3.13, Node.js 23+, Go, Rust
  • VS Code, Docker CLI, Podman
  • kubectl, Helm, Claude Code
  • Build tools (gcc, make, cmake, ninja)

Usage

Activate the ML Environment

The pod uses pixi for environment management:

# Activate the pixi environment
pixi shell --manifest-path /opt/pixi/pixi.toml

# Or run commands directly
pixi run --manifest-path /opt/pixi/pixi.toml python train.py
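The `train.py` above is a placeholder for your own script. As a starting point, a minimal (hypothetical) script that confirms the environment resolves before you wire in real training code might look like:

```python
# train.py (hypothetical placeholder): confirm PyTorch and the device resolve.
import torch

def main() -> str:
    device = "cuda" if torch.cuda.is_available() else "cpu"
    print(f"Training on {device} with PyTorch {torch.__version__}")
    return device

if __name__ == "__main__":
    main()
```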

Verify GPU Access

import torch

# Check CUDA availability (the remaining checks require a GPU)
print(f"CUDA available: {torch.cuda.is_available()}")

if torch.cuda.is_available():
    print(f"Device count: {torch.cuda.device_count()}")
    print(f"Device name: {torch.cuda.get_device_name(0)}")

    # Quick benchmark
    x = torch.randn(1000, 1000, device='cuda')
    y = torch.matmul(x, x)
    print(f"Matrix multiplication on GPU: {y.shape}")

Training Example

import torch
import torch.nn as nn
import torch.optim as optim

device = "cuda" if torch.cuda.is_available() else "cpu"

# Define a simple model (e.g. for flattened 28x28 images)
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 10)
).to(device)

optimizer = optim.Adam(model.parameters())
criterion = nn.CrossEntropyLoss()

# Minimal loop on random stand-in data; swap in a real DataLoader
inputs = torch.randn(64, 784, device=device)
labels = torch.randint(0, 10, (64,), device=device)

for epoch in range(10):
    optimizer.zero_grad()
    loss = criterion(model(inputs), labels)
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.4f}")

Environment Details

Pixi Project Location

/opt/pixi/
├── pixi.toml      # Environment configuration
├── pixi.lock      # Locked dependencies (deterministic)
└── .pixi/         # Installed packages

Environment Variables

Variable                Value
NVIDIA_PYTHON_PROJECT   /opt/pixi
PATH                    Includes /opt/pixi/bin and /usr/local/cuda/bin
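These can be inspected from Python inside the pod; outside the pod the values will differ, so this is just an illustrative check:

```python
import os

# Inside the pod these reflect the table above; elsewhere they may be unset.
project = os.environ.get("NVIDIA_PYTHON_PROJECT", "not set")
on_path = "/opt/pixi/bin" in os.environ.get("PATH", "")
print(f"NVIDIA_PYTHON_PROJECT: {project}")
print(f"/opt/pixi/bin on PATH: {on_path}")
```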

Common Tasks

Install Additional Packages

# Inside the pod, activate environment first
pixi shell --manifest-path /opt/pixi/pixi.toml

# Install with pip (inside pixi environment)
pip install transformers datasets accelerate
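To confirm the newly installed packages are visible to the interpreter without importing them fully, a quick stdlib-only check works:

```python
import importlib.util

# Verify packages added with `pip install ...` are importable in this environment.
for pkg in ("transformers", "datasets", "accelerate"):
    found = importlib.util.find_spec(pkg) is not None
    print(f"{pkg}: {'installed' if found else 'missing'}")
```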

Run Jupyter Notebook

For interactive notebook development, use the jupyter pod instead, which includes JupyterLab pre-configured.

Export Trained Models

# Save PyTorch model weights
torch.save(model.state_dict(), '/workspace/model.pth')

# Export to ONNX for TensorRT optimization
# (dummy_input must match the model's expected input shape and device)
dummy_input = torch.randn(1, 784, device=next(model.parameters()).device)
torch.onnx.export(model, dummy_input, '/workspace/model.onnx')
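Saving a `state_dict` stores only the weights, so the architecture must be redefined before reloading. A self-contained round-trip sketch (using a temporary file rather than /workspace):

```python
import os
import tempfile
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

# Round-trip the weights through a checkpoint file.
path = os.path.join(tempfile.mkdtemp(), "model.pth")
torch.save(model.state_dict(), path)

# Rebuild the same architecture, then load the saved weights into it.
restored = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
restored.load_state_dict(torch.load(path, map_location="cpu"))
restored.eval()
```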

Workspace

Your current directory is mounted at /workspace:

# On host
cd ~/projects/my-ml-project

# Docker/Podman
docker run -it --rm --gpus all -v $(pwd):/workspace \
  ghcr.io/atrawog/bazzite-ai-pod-nvidia-python:stable

# Inside pod - your files are here
ls /workspace/

Troubleshooting

CUDA Not Available

  1. Ensure NVIDIA GPU is present: nvidia-smi (on host)
  2. For Docker: Install NVIDIA Container Toolkit
  3. For Bazzite AI OS: Run ujust setup-gpu-pods (one-time)
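When the checks above pass on the host but PyTorch still reports no GPU, a small diagnostic run inside the pod helps narrow down where the chain breaks:

```python
import torch

# Diagnose the CUDA chain from inside the pod.
print("torch:", torch.__version__)
print("built with CUDA:", torch.version.cuda)  # None means a CPU-only build
print("cuda available:", torch.cuda.is_available())
if not torch.cuda.is_available():
    print("Check: host driver (nvidia-smi), container toolkit, --gpus all flag")
```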

Out of Memory

# Release cached GPU memory back to the driver (does not free live tensors)
torch.cuda.empty_cache()

# Use gradient checkpointing for large models
from torch.utils.checkpoint import checkpoint
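The gradient-checkpointing import above can be applied to any module: activations inside the checkpointed block are recomputed during backward instead of being stored, trading compute for memory. A minimal CPU-safe sketch:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

# Checkpoint a block: its activations are recomputed on backward, saving memory.
block = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 512))
x = torch.randn(8, 512, requires_grad=True)

y = checkpoint(block, x, use_reentrant=False)
y.sum().backward()
print(x.grad.shape)
```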

Pixi Environment Issues

# Rebuild pixi environment
pixi install --manifest-path /opt/pixi/pixi.toml --frozen

See Also