Machine learning deployment is the bridge between model training and real-world applications. In this comprehensive guide, we'll walk you through deploying ML models on Vultr using GPU instances, Flask APIs, and production-ready configurations.
Vultr offers dedicated GPU instances powered by NVIDIA hardware, making it a strong fit for ML workloads. Here's how the instance tiers compare:
| Instance Type | GPU | Price | Best For |
|---|---|---|---|
| Standard GPU | NVIDIA T4 | $70/mo | Inference, small models |
| Premium GPU | NVIDIA V100 | $150/mo | Training, large models |
| Enterprise GPU | NVIDIA A100 | $300/mo | Production, high throughput |
For most inference workloads, the Standard GPU instance provides excellent performance at an affordable price. Learn more about Vultr AI development in our comprehensive guide.
Navigate to the Vultr Dashboard and create a new GPU instance sized for your workload.
Once your instance is ready, SSH in and install the necessary dependencies:
# Update system
sudo apt update && sudo apt upgrade -y
# Install Python and pip
sudo apt install -y python3 python3-pip python3-venv
# Install the CUDA toolkit (the NVIDIA driver must already be present for nvidia-smi to work)
sudo apt install -y nvidia-cuda-toolkit
# Verify GPU detection
nvidia-smi
# Create project directory
mkdir ml-deployment && cd ml-deployment
python3 -m venv venv
source venv/bin/activate
# Install ML frameworks
pip install torch tensorflow flask gunicorn
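Before moving on, it's worth confirming that the environment can actually import what you just installed. A quick, framework-agnostic check (a sketch; the package list simply mirrors the pip command above):

```python
import importlib.util

def check_packages(names):
    """Return a dict mapping each package name to whether it is importable."""
    return {n: importlib.util.find_spec(n) is not None for n in names}

print(check_packages(["torch", "tensorflow", "flask", "gunicorn"]))
```

Any `False` entry means the corresponding install step needs to be re-run inside the activated virtualenv.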
For this tutorial, we'll create a simple sentiment analysis model. In production, you'd upload your trained model files:
# Save your model (example with PyTorch)
import torch
import torch.nn as nn
class SentimentClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim, num_classes):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.fc1 = nn.Linear(embed_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):
        embedded = self.embedding(x)      # (batch, seq_len, embed_dim)
        pooled = embedded.mean(dim=1)     # mean-pool over the sequence
        return self.fc2(torch.relu(self.fc1(pooled)))
# Save model
model = SentimentClassifier(vocab_size=10000, embed_dim=128, hidden_dim=64, num_classes=2)
torch.save(model.state_dict(), 'model.pth')
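The model's forward pass mean-pools the token embeddings into a single sentence vector before classifying. The pooling step is just an element-wise average; in plain Python, with no torch dependency and purely to illustrate the idea:

```python
def mean_pool(embedded):
    """Average a list of token embedding vectors into one sentence vector."""
    dim = len(embedded[0])
    return [sum(vec[i] for vec in embedded) / len(embedded) for i in range(dim)]

# Three 2-d token embeddings collapse into one 2-d sentence embedding
print(mean_pool([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]))  # [3.0, 4.0]
```

Mean pooling keeps the classifier input a fixed size regardless of how many tokens the sentence contains, which is what lets one `Linear` layer handle variable-length text.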
Now let's create a production-ready Flask API to serve predictions:
# app.py
from flask import Flask, request, jsonify
import torch
import zlib

from model import SentimentClassifier

app = Flask(__name__)

VOCAB_SIZE = 10000

# Load model
model = SentimentClassifier(vocab_size=VOCAB_SIZE, embed_dim=128, hidden_dim=64, num_classes=2)
model.load_state_dict(torch.load('model.pth', map_location='cpu'))
model.eval()

@app.route('/predict', methods=['POST'])
def predict():
    data = request.get_json(silent=True) or {}
    text = data.get('text', '')

    # Tokenize with a stable hash (simplified for demonstration).
    # Python's built-in hash() is randomized per process, so its bucket
    # ids would not survive a server restart.
    tokens = [zlib.crc32(word.encode('utf-8')) % VOCAB_SIZE for word in text.split()]
    if not tokens:
        return jsonify({'error': 'text is required'}), 400

    tensor = torch.tensor([tokens])
    with torch.no_grad():
        probs = torch.softmax(model(tensor), dim=1)
    prediction = probs.argmax(dim=1).item()

    return jsonify({
        'prediction': 'positive' if prediction == 1 else 'negative',
        'confidence': probs[0][prediction].item()  # a probability, not a raw logit
    })

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
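The tokenizer deserves a note: Python's built-in `hash()` is randomized per process (via `PYTHONHASHSEED`), so hash-based token ids would change on every restart while the saved model weights would not. A stable hash such as `zlib.crc32` keeps ids consistent, and it's easy to unit-test in isolation (a sketch; `VOCAB_SIZE` mirrors the model's `vocab_size`):

```python
import zlib

VOCAB_SIZE = 10000  # must match the vocab_size the model was built with

def tokenize(text, vocab_size=VOCAB_SIZE):
    """Map each word to a stable bucket id in [0, vocab_size)."""
    return [zlib.crc32(word.encode("utf-8")) % vocab_size for word in text.split()]

print(tokenize("This product is amazing!"))
```

Because `crc32` is deterministic, the same sentence produces the same tensor on every server restart, which is what the persisted `model.pth` weights implicitly depend on.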
For production deployment, use Gunicorn with multiple workers:
# gunicorn_config.py
workers = 4
worker_class = 'sync'
bind = '0.0.0.0:5000'
timeout = 120
max_requests = 1000
max_requests_jitter = 50
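The `workers = 4` above is a sensible fixed default. Gunicorn's documentation suggests roughly `2 * CPU cores + 1` sync workers as a starting point; a small sketch of that heuristic (for GPU inference you would also cap workers by GPU memory, since each worker loads its own copy of the model):

```python
import multiprocessing

def suggested_workers(cpu_count=None):
    """Gunicorn's rule-of-thumb starting point: 2 * CPUs + 1."""
    if cpu_count is None:
        cpu_count = multiprocessing.cpu_count()
    return 2 * cpu_count + 1

print(suggested_workers())
```

Treat the result as an upper bound to benchmark against, not a target: a single T4 can easily be saturated by fewer workers than the CPU count suggests.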
Create a systemd service for automatic startup:
# /etc/systemd/system/ml-api.service
[Unit]
Description=ML API Service
After=network.target
[Service]
User=ubuntu
WorkingDirectory=/home/ubuntu/ml-deployment
ExecStart=/home/ubuntu/ml-deployment/venv/bin/gunicorn -c gunicorn_config.py app:app
Restart=always
[Install]
WantedBy=multi-user.target
sudo systemctl daemon-reload
sudo systemctl enable ml-api
sudo systemctl start ml-api
sudo systemctl status ml-api
For better performance and security, put Nginx in front of your Flask app:
sudo apt install -y nginx
# Create Nginx config
sudo nano /etc/nginx/sites-available/ml-api
server {
    listen 80;
    server_name your-domain.com;

    location / {
        proxy_pass http://127.0.0.1:5000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }

    location /static {
        alias /home/ubuntu/ml-deployment/static;
    }
}
sudo ln -s /etc/nginx/sites-available/ml-api /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl restart nginx
Verify your ML API is working correctly:
# Test with curl
curl -X POST http://localhost/predict \
-H "Content-Type: application/json" \
-d '{"text": "This product is amazing!"}'
Expected response:
{"prediction": "positive", "confidence": 0.95}
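On the client side, the response is plain JSON; a minimal consumer might threshold the confidence before acting on the label (a sketch using only the standard library; the 0.7 cutoff is an arbitrary example value):

```python
import json

def interpret(response_body, threshold=0.7):
    """Parse the API response, falling back to 'uncertain' below the threshold."""
    result = json.loads(response_body)
    if result["confidence"] >= threshold:
        return result["prediction"]
    return "uncertain"

print(interpret('{"prediction": "positive", "confidence": 0.95}'))  # positive
```

Thresholding like this is a cheap way to keep borderline predictions from driving downstream decisions, since a two-class softmax never reports below 0.5 for its winning label.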
To minimize ML deployment costs on Vultr, match the instance tier to your workload (a Standard GPU covers most inference traffic), and shut down or snapshot instances you aren't actively using.
Deploying ML models on Vultr is straightforward with GPU instances. Follow this guide to set up your production ML API in under an hour. For more tutorials on Vultr AI development, explore our other resources.
Get started with Vultr GPU instances today and receive $100 in credits!