Vultr GPU Instances for AI Development 2026: Complete Setup Guide
Running AI workloads on CPU-only VPS instances is fine for prototypes—but when you need to train models, run inference at scale, or serve real-time predictions, GPU compute is non-negotiable. Vultr Cloud GPU instances deliver NVIDIA A100 and H100 GPUs at competitive prices, with per-hour billing that makes GPU experimentation economically viable.
This guide walks through setting up a production-ready AI development environment on Vultr: Ubuntu 22.04 configuration, CUDA drivers, Python environment, PyTorch, and deploying your first model.
Why Choose Vultr for AI Development?
Three factors make Vultr a strong choice for AI/ML workloads in 2026:
- Competitive GPU pricing: A100 instances start at ~$2.20/hr and are billed by the hour, so you only pay for the time an instance is deployed.
- Global GPU availability: GPU instances available across 12+ regions including US, EU, and Asia-Pacific.
- Flexible bare metal and cloud GPU: Choose dedicated bare metal for maximum performance or cloud GPU for elastic scaling.
Compared to AWS SageMaker or GCP Vertex AI, Vultr GPU instances give you raw compute at a fraction of the managed service premium. The tradeoff: you're responsible for driver installation, environment setup, and infrastructure management.
Vultr GPU Instance Plans 2026
| Instance Type | GPU | VRAM | vCPUs | RAM | Storage | Price/hr |
|---|---|---|---|---|---|---|
| GPU-4 Plus | NVIDIA A100 | 40GB | 8 | 32GB | 200GB NVMe | $2.20 |
| GPU-8 Plus | NVIDIA H100 | 80GB | 16 | 64GB | 400GB NVMe | $4.50 |
| GPU Metal | NVIDIA A100 (dedicated) | 40GB | 16 | 64GB | 1TB NVMe | $3.20 |
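To get a feel for what these hourly rates mean for a real project, a quick calculation helps. The sketch below is only an illustration: it hard-codes the prices from the table, and the function name `estimate_cost` and the example durations are assumptions, not part of any Vultr tooling.

```python
# Hourly prices copied from the plan table above (USD per hour).
HOURLY_PRICE_USD = {
    "GPU-4 Plus": 2.20,
    "GPU-8 Plus": 4.50,
    "GPU Metal": 3.20,
}


def estimate_cost(plan: str, hours: float) -> float:
    """Return the estimated cost in USD of running `plan` for `hours`."""
    return round(HOURLY_PRICE_USD[plan] * hours, 2)


# An 8-hour fine-tuning run versus leaving the instance up for 30 days
print(estimate_cost("GPU-4 Plus", 8))        # 17.6
print(estimate_cost("GPU-4 Plus", 24 * 30))  # 1584.0
```

The gap between the two numbers is the main argument for tearing instances down when you are not using them.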
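For most LLM fine-tuning and inference workloads, a single A100 (40GB VRAM) is sufficient. Step up to the H100 (80GB VRAM) for larger models or heavier training jobs. Keep in mind that a 70B-parameter model needs roughly 140GB for its weights alone at 16-bit precision, so it will not fit on a single 80GB card without quantization or multiple GPUs.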
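A rough way to check whether a model fits on a given card is to multiply the parameter count by the bytes used per parameter. The sketch below covers model weights only; activations, optimizer state, and the KV cache add more on top. The 80% headroom factor and the function names are assumptions chosen for illustration.

```python
def weights_vram_gb(num_params: float, bytes_per_param: float = 2) -> float:
    """Approximate memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def fits(num_params: float, vram_gb: float,
         bytes_per_param: float = 2, headroom: float = 0.8) -> bool:
    """True if the weights fit within `headroom` (a fraction) of the card's VRAM.

    The headroom value is an assumed safety margin, not a measured figure.
    """
    return weights_vram_gb(num_params, bytes_per_param) <= vram_gb * headroom


print(weights_vram_gb(7e9))                    # 7B model at 16-bit precision
print(fits(7e9, vram_gb=40))                   # on a 40GB A100
print(fits(70e9, vram_gb=80))                  # 70B at 16-bit on an 80GB H100
print(fits(70e9, vram_gb=80, bytes_per_param=0.5))  # 70B with 4-bit quantization
```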
Step 1: Deploy Ubuntu GPU Instance
Deploy via Vultr dashboard or CLI:
# Using Vultr CLI
# The plan ID is an example; list valid IDs with `vultr-cli plans list`.
# --os takes a numeric OS ID (1743 is Ubuntu 22.04 x64; confirm with `vultr-cli os list`).
vultr-cli instance create \
  --region ewr \
  --plan gpu-v100-30gb \
  --os 1743 \
  --label ai-dev-gpu
# Or via API
curl -X POST "https://api.vultr.com/v2/instances" \
  -H "Authorization: Bearer ${VULTR_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "region": "ewr",
    "plan": "gpu-v100-30gb",
    "os_id": 1743,
    "label": "ai-dev-gpu"
  }'
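If you would rather script deployments from Python, the same API request can be built with the standard library. This is a minimal sketch, not an official client: the function names are hypothetical, and `PLAN_ID` and `OS_ID` in the usage comment are placeholders you would fill in with values valid for your account.

```python
import json
import urllib.request

API_URL = "https://api.vultr.com/v2/instances"


def build_create_request(api_key, region, plan, os_id, label):
    """Build (but do not send) the POST request that creates an instance."""
    body = json.dumps(
        {"region": region, "plan": plan, "os_id": os_id, "label": label}
    ).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def create_instance(api_key, **fields):
    """Send the request and return the decoded JSON response."""
    request = build_create_request(api_key, **fields)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


# Usage (placeholders, requires a real API key):
# create_instance(os.environ["VULTR_API_KEY"], region="ewr",
#                 plan=PLAN_ID, os_id=OS_ID, label="ai-dev-gpu")
```

Splitting request construction from sending keeps the payload easy to inspect before anything billable is created.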
After deployment, note the instance's IP address from the Vultr dashboard, then SSH in:
ssh root@YOUR_INSTANCE_IP
Step 2: Configure Ubuntu 22.04 for GPU Compute
Start with a full system update and essential packages:
# Update system
apt update && apt upgrade -y
# Install essential tools
apt install -y build-essential curl wget git unzip vim \
software-properties-common gnupg apt-transport-https ca-certificates
Configure Network Repositories
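Optionally, switch to a closer Ubuntu mirror for faster package downloads. Check which host /etc/apt/sources.list currently references first, because the substitution only takes effect when the pattern matches:
# See which mirror is configured
grep -m1 '^deb ' /etc/apt/sources.list
# Swap the default archive for another mirror (example), keeping a .bak backup
sed -i.bak 's|http://archive.ubuntu.com|http://mirrors.digitalocean.com|g' \
  /etc/apt/sources.list
apt update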
Step 3: Install NVIDIA Drivers
Vultr GPU instances come with pre-configured NVIDIA GPU hardware, but you'll need to install drivers. The official NVIDIA CUDA repository is the most reliable method:
# Add NVIDIA CUDA repository (the URL must stay on a single line)
curl -fsSL -o cuda-keyring.deb \
  https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
dpkg -i cuda-keyring.deb
apt update
# Install NVIDIA driver + CUDA toolkit
apt install -y nvidia-driver-545 cuda-toolkit-12-3
# Verify driver installation (reboot first if nvidia-smi cannot communicate with the driver)
nvidia-smi
Expected output from nvidia-smi:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.08    Driver Version: 545.23.08    CUDA Version: 12.3     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  NVIDIA A100-40GB     On  | 00000000:00:1E.0 Off |                    0 |
|  0%   37C    P0    37W / 250W |      0MiB / 40536MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
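For scripted checks, nvidia-smi can also emit machine-readable CSV, which is easier to parse than the table above. The following is a minimal sketch that assumes the `--query-gpu` fields `name`, `driver_version`, and `memory.total`; the helper names are hypothetical.

```python
import subprocess

QUERY_FIELDS = "name,driver_version,memory.total"


def parse_gpu_csv(text):
    """Parse `nvidia-smi --format=csv,noheader,nounits` output into dicts."""
    gpus = []
    for line in text.strip().splitlines():
        name, driver, memory_mib = [field.strip() for field in line.split(",")]
        gpus.append({"name": name, "driver": driver, "memory_mib": int(memory_mib)})
    return gpus


def query_gpus():
    """Run nvidia-smi and return one dict per detected GPU."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_csv(result.stdout)


# Parsing a sample line shaped like the output above (no GPU needed for this part)
sample = "NVIDIA A100-40GB, 545.23.08, 40536\n"
print(parse_gpu_csv(sample))
```

An empty list from `query_gpus()` means the driver loaded but no GPU was reported, which is worth failing a provisioning script on.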
Step 4: Set Up Python Environment with CUDA Support
Use Miniconda for isolated, reproducible Python environments:
# Download and install Miniconda
cd /tmp
curl -LO https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b
# Initialize conda
source /root/miniconda3/etc/profile.d/conda.sh
# Create GPU-accelerated Python environment
conda create -n ai-env python=3.11 -y
conda activate ai-env
# Install PyTorch with CUDA 12.1 support
pip install torch torchvision torchaudio \
--index-url https://download.pytorch.org/whl/cu121
# Verify CUDA availability in Python
python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}'); print(f'GPU: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else None}')"
Expected output:
CUDA available: True
GPU: NVIDIA A100-40GB
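The `cu121` suffix in the index URL encodes the CUDA version the wheels were built against (12.1). PyTorch wheels bundle their own CUDA runtime, so they can run alongside a newer system toolkit as long as the installed driver supports that runtime. As a small illustration of the naming pattern, here is a hypothetical helper; note that PyTorch only publishes wheels for specific CUDA versions, so not every URL this produces will exist.

```python
def pytorch_index_url(cuda_version: str) -> str:
    """Map a CUDA version such as "12.1" to the matching wheel index URL.

    This only reproduces the naming pattern; it does not check that
    wheels for that CUDA version are actually published.
    """
    major, minor = cuda_version.split(".")[:2]
    return f"https://download.pytorch.org/whl/cu{major}{minor}"


print(pytorch_index_url("12.1"))
```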
Step 5: Deploy a Production ML Model
Let's deploy a text classification model using FastAPI for inference serving. This pattern works for any trained model—LLMs, image classifiers, recommendation engines.
Install FastAPI and dependencies
# Install web serving stack
pip install fastapi uvicorn transformers \
huggingface-hub accelerate sentencepiece
# Create project structure
mkdir -p /opt/ml-api && cd /opt/ml-api
touch main.py requirements.txt
Create the inference API
# /opt/ml-api/main.py
from fastapi import FastAPI
from pydantic import BaseModel
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

app = FastAPI(title="Text Classifier API", version="1.0.0")

# Load pre-trained model at startup and move it to the GPU when one is available
MODEL_NAME = "distilbert-base-uncased-finetuned-sst-2-english"
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME).to(DEVICE)
model.eval()


class TextInput(BaseModel):
    text: str


class PredictionOutput(BaseModel):
    label: str
    confidence: float
    model: str


@app.post("/predict", response_model=PredictionOutput)
def predict(payload: TextInput):
    # A plain `def` lets FastAPI run the blocking forward pass in its thread pool
    inputs = tokenizer(
        payload.text, return_tensors="pt", truncation=True, max_length=512
    ).to(DEVICE)  # inputs must live on the same device as the model
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=-1)[0]
    pred_id = int(torch.argmax(probs))
    return PredictionOutput(
        label="POSITIVE" if pred_id == 1 else "NEGATIVE",
        confidence=round(probs[pred_id].item(), 4),
        model=MODEL_NAME,
    )


@app.get("/health")
def health():
    return {"status": "healthy", "gpu": torch.cuda.is_available()}


if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
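The post-processing in the endpoint (softmax over the logits, then picking the most likely class) is easy to lose in the tensor code. Here is the same idea in plain Python, as a sketch for intuition rather than something to use in the service. The example logits are made up, and the label order assumes index 1 means positive, as in the API code.

```python
import math

LABELS = ["NEGATIVE", "POSITIVE"]  # index 1 is the positive class


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits):
    """Return (label, confidence) for one example's logits."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], round(probs[best], 4)


print(classify([-3.0, 3.5]))  # made-up logits that favor the positive class
```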
Test the API
# Start the server
cd /opt/ml-api
python main.py &
# Test prediction
curl -X POST http://localhost:8000/predict \
-H "Content-Type: application/json" \
-d '{"text": "Vultr GPU instances are fantastic for AI development!"}'
Expected response (the exact confidence value may differ slightly):
{"label":"POSITIVE","confidence":0.9986,"model":"distilbert-base-uncased-finetuned-sst-2-english"}
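To call the endpoint from application code instead of curl, a small standard-library client is enough. This is a sketch under the assumption that the API is reachable at `http://localhost:8000`; the function names are hypothetical.

```python
import json
import urllib.request


def build_predict_request(text, base_url="http://localhost:8000"):
    """Build the POST request for the /predict endpoint."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/predict",
        data=json.dumps({"text": text}).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def parse_prediction(raw: bytes) -> tuple[str, float]:
    """Extract (label, confidence) from the JSON response body."""
    data = json.loads(raw)
    return data["label"], float(data["confidence"])


def predict(text, base_url="http://localhost:8000"):
    """Send one text to the API and return (label, confidence)."""
    request = build_predict_request(text, base_url)
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_prediction(response.read())


# Usage, with the server from the previous step running:
# label, confidence = predict("Vultr GPU instances are fantastic for AI development!")
```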
Step 6: Set Up Nginx Reverse Proxy and SSL
For production deployment, put Nginx in front of FastAPI as a reverse proxy. It handles HTTPS termination and can later take on load balancing and static file serving:
# Install Nginx
apt install -y nginx
# Create reverse proxy config
cat > /etc/nginx/sites-available/ml-api << 'EOF'
server {
    listen 80;
    server_name YOUR_DOMAIN_OR_IP;

    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        # For streaming responses (LLM inference)
        proxy_buffering off;
        proxy_read_timeout 300s;
    }
}
EOF
# Enable site (-f makes the command safe to re-run)
ln -sf /etc/nginx/sites-available/ml-api /etc/nginx/sites-enabled/
nginx -t && systemctl reload nginx
For HTTPS, use Let's Encrypt:
apt install -y certbot python3-certbot-nginx
certbot --nginx -d YOUR_DOMAIN
Step 7: Production Considerations and Cost Optimization
Auto-shutdown Script
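GPU instances are expensive, so detect idle time automatically. Note that Vultr keeps billing an instance while it is powered off; to stop charges you have to destroy it, so snapshot it or keep your data on block storage first. The script below powers the instance off once the GPU has stayed idle for 30 minutes, which makes idle machines easy to spot and clean up:
#!/bin/bash
# /opt/shutdown-idle-gpu.sh
IDLE_MINUTES=30
THRESHOLD=5  # GPU utilization %
STATE_FILE=/var/run/gpu-idle-since  # holds the time the current idle period began

# Use the busiest GPU so multi-GPU instances are handled correctly
UTIL=$(nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits | sort -nr | head -n1)
NOW=$(date +%s)

# Busy (or nvidia-smi failed and UTIL is empty): reset the idle timer
if [ "${UTIL:-100}" -ge "$THRESHOLD" ]; then
    rm -f "$STATE_FILE"
    exit 0
fi

# Record when the idle period started, then measure how long it has lasted
[ -f "$STATE_FILE" ] || echo "$NOW" > "$STATE_FILE"
IDLE_TIME=$(( (NOW - $(cat "$STATE_FILE")) / 60 ))
if [ "$IDLE_TIME" -ge "$IDLE_MINUTES" ]; then
    echo "GPU idle for $IDLE_TIME minutes. Shutting down."
    rm -f "$STATE_FILE"
    /usr/sbin/shutdown -h now
fi
# Add to crontab: */5 * * * * /opt/shutdown-idle-gpu.sh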
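The part of idle detection that is easiest to get wrong is the timing: what matters is how long utilization has stayed below the threshold, not how long the machine has been up. Expressing that rule in Python makes it simple to unit-test. The sketch below assumes you collect `(minute, utilization)` samples yourself, for example from the cron job; the function names and sample values are hypothetical.

```python
def idle_minutes(samples, threshold=5):
    """Minutes the GPU has been continuously below `threshold`.

    `samples` is a list of (minute_timestamp, utilization_percent) pairs,
    oldest first. Any busy sample resets the idle period.
    """
    idle_start = None
    for minute, utilization in samples:
        if utilization < threshold:
            if idle_start is None:
                idle_start = minute  # a new idle period begins here
        else:
            idle_start = None  # GPU was busy, so the idle period is over
    if idle_start is None:
        return 0
    return samples[-1][0] - idle_start


def should_shutdown(samples, threshold=5, limit=30):
    """True once the GPU has been idle for at least `limit` minutes."""
    return idle_minutes(samples, threshold) >= limit


# Busy at minute 0, idle from minute 5 through minute 35
print(should_shutdown([(0, 90), (5, 2), (10, 1), (35, 0)]))
```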
Monitoring with Prometheus + Grafana
# Install node_exporter for system metrics
apt install -y prometheus-node-exporter
# Install NVIDIA GPU exporter
pip install nvidia-ml-py
# -f makes curl fail on an HTTP error instead of saving an error page as the binary.
# Confirm the release URL on the project's releases page before running.
curl -fLO https://github.com/utkuonur/nvidia-gpu-exporter/releases/download/v1.0.0/gpu_exporter_linux_amd64
chmod +x gpu_exporter_linux_amd64
mv gpu_exporter_linux_amd64 /usr/local/bin/
# Add to systemd service for auto-start
cat > /etc/systemd/system/gpu-exporter.service << 'EOF'
[Unit]
Description=NVIDIA GPU Metrics Exporter
After=network.target

[Service]
ExecStart=/usr/local/bin/gpu_exporter_linux_amd64
Restart=always

[Install]
WantedBy=multi-user.target
EOF
# Pick up the new unit file, then enable and start the service
systemctl daemon-reload
systemctl enable --now gpu-exporter
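Exporters like these publish metrics in Prometheus's plain-text exposition format, one sample per line. To show what Prometheus actually scrapes, here is a minimal sketch of rendering a gauge line. The metric and label names are hypothetical (they are not the names the exporter above emits), and label-value escaping is left out for brevity.

```python
def format_gauge(name, value, labels=None):
    """Render one sample in the Prometheus text exposition format."""
    label_part = ""
    if labels:
        pairs = ",".join(f'{key}="{val}"' for key, val in sorted(labels.items()))
        label_part = "{" + pairs + "}"
    return f"{name}{label_part} {value}"


# Hypothetical metric names, for illustration only
print(format_gauge("gpu_utilization_percent", 37, {"gpu": "0"}))
print(format_gauge("gpu_memory_used_mib", 1024, {"gpu": "0", "host": "ai-dev-gpu"}))
```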
Backup and Persistence
Store trained models and datasets on Vultr Block Storage for durability:
# Create and attach block storage (requires Vultr dashboard or API)
# The commands below assume a 100GB volume attached as /dev/vdb; confirm with lsblk
lsblk
# Format the volume. First-time setup only: this erases any existing data on /dev/vdb
mkfs.ext4 -F /dev/vdb
mkdir -p /mnt/models
mount /dev/vdb /mnt/models
# Add to /etc/fstab for auto-mount on reboot
echo '/dev/vdb /mnt/models ext4 defaults,nofail 0 2' >> /etc/fstab
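Once the volume is mounted, it is worth writing checkpoints in a way that never leaves a half-written file behind if the instance is shut down mid-save. One common approach is to write to a temporary file in the same directory and then rename it into place. The sketch below shows that pattern; the function name and the example path under /mnt/models are assumptions.

```python
import os
import tempfile


def save_atomically(data: bytes, dest_path: str) -> None:
    """Write `data` to `dest_path` so readers never see a partial file."""
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    os.makedirs(dest_dir, exist_ok=True)
    # The temp file must be on the same filesystem for the rename to be atomic
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_path, dest_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


# Usage (hypothetical path on the mounted volume):
# save_atomically(checkpoint_bytes, "/mnt/models/run-01/checkpoint.bin")
```

Before a long training run, checking `os.path.ismount("/mnt/models")` guards against silently writing to the instance's local disk when the volume failed to mount.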
Troubleshooting Common GPU Setup Issues
Problem: nvidia-smi reports "No devices were found"
Cause: The NVIDIA driver is not loaded, or the GPU is not passed through to the VM.
Solution: Check that the instance type supports GPU. Reboot the instance after driver installation. If the issue persists, destroy the instance and deploy a new GPU instance.
Problem: PyTorch reports "CUDA out of memory" on small batches
Cause: Insufficient VRAM for model + batch size combination.
Solution: Reduce the batch size (for example, batch_size=4), enable gradient checkpointing, or upgrade to a GPU instance with more VRAM.
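Reducing the batch size by hand gets tedious, so a common trick is to halve it automatically whenever a step runs out of memory. The following is a framework-agnostic sketch: `step` is a hypothetical callable that runs one training or inference step at the given batch size. It relies on the fact that PyTorch's out-of-memory error is a `RuntimeError` whose message contains "out of memory". In real PyTorch code you would likely also call `torch.cuda.empty_cache()` before retrying.

```python
def is_oom(error: Exception) -> bool:
    """Heuristic: does this error look like an out-of-memory failure?"""
    return "out of memory" in str(error).lower()


def run_with_backoff(step, batch_size, min_batch_size=1):
    """Call `step(batch_size)`, halving the batch size after each OOM error.

    Returns (batch_size_that_worked, result). Errors that are not
    out-of-memory failures are re-raised unchanged.
    """
    while batch_size >= min_batch_size:
        try:
            return batch_size, step(batch_size)
        except RuntimeError as error:
            if not is_oom(error):
                raise
            batch_size //= 2  # try again with half the batch
    raise RuntimeError("out of memory even at the minimum batch size")


# Stand-in for a real step: pretend anything above 4 samples exhausts VRAM
def fake_step(batch_size):
    if batch_size > 4:
        raise RuntimeError("CUDA out of memory (simulated)")
    return f"ok with {batch_size}"


print(run_with_backoff(fake_step, 32))
```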
Problem: Model downloads timing out from HuggingFace
Cause: Slow or restricted network connectivity on the GPU instance.
Solution: If huggingface.co is slow or unreachable from your region (for example, mainland China), point the Hugging Face client at a mirror: export HF_ENDPOINT=https://hf-mirror.com
Conclusion
Vultr GPU instances provide enterprise-grade NVIDIA compute at accessible prices. With Ubuntu 22.04, proper driver installation, and a well-configured Python environment, you can run anything from lightweight inference APIs to full model training pipelines.
The key to cost-effective GPU computing on Vultr: take advantage of hourly billing for short experiments, detect idle instances automatically and destroy the ones you no longer need, and leverage block storage for model persistence rather than burning GPU instance hours on data I/O.
Start Your AI Project on Vultr
Deploy a GPU instance today and get $250 in free credits.