A100 Paperspace: Gradient Notebooks, Pricing, and Availability

Deploybase · February 12, 2025 · GPU Pricing

A100 Paperspace: Gradient Notebooks for Interactive ML Development

Paperspace (now a DigitalOcean subsidiary) provides A100 GPU access primarily through Gradient notebooks and managed containers. Pricing starts at $3.09/hr for the A100 40GB and $3.18/hr for the A100 80GB, with noticeably better availability than its H100 offerings. A100 capacity regularly sits at 30-50 globally available instances, making Paperspace viable for interactive development and short-term experiments.

This guide covers Paperspace's A100 offerings, Gradient environment, availability management, and when Paperspace suits ML workflows.

Paperspace A100 Pricing

Paperspace's pricing emphasizes straightforward hourly rates with optional monthly discounts.

A100 Pricing Tiers and Analysis

| Plan | Hourly | Monthly (730 hrs) | Storage | Effective Cost | Best Use |
|---|---|---|---|---|---|
| A100 40GB On-Demand | $3.09 | $2,256 | 20GB | $3.09/hr | Temporary experiments |
| A100 80GB On-Demand | $3.18 | $2,321 | 20GB | $3.18/hr | Large model workloads |
| A100 Monthly | Committed | Lower effective rate | 100GB | ~$2.60-2.80/hr | Month-long projects |
| Gradient Notebooks | $0.51/hr | $372/month | 10GB | $0.51/hr (limited GPU) | Interactive development |

Gradient notebook pricing (~$0.51/hr for A100 access) looks cheaper than on-demand instances, but it carries significant overhead: limited GPU utilization (kernel execution is shared with other notebooks) and slower I/O. Full-instance pricing reflects dedicated single-GPU access.

A100 Performance on Paperspace

| Workload | Throughput |
|---|---|
| 7B Model Inference | 50-60 tokens/sec |
| 13B Model Inference | 30-40 tokens/sec |
| Training (LoRA 7B) | 380-420 tokens/sec |

Performance matches other providers (RunPod, Lambda) since the underlying hardware is identical.
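To put the training throughput in concrete terms, a small helper can convert the quoted tokens/sec into wall-clock hours (the 100M-token dataset size below is purely illustrative, not a figure from this guide):

```python
def training_hours(total_tokens: int, tokens_per_sec: float) -> float:
    """Estimated wall-clock hours to stream total_tokens at a given throughput."""
    return total_tokens / tokens_per_sec / 3600

# Illustrative: 100M training tokens at the quoted 380-420 tok/s LoRA range
optimistic = training_hours(100_000_000, 420)
conservative = training_hours(100_000_000, 380)
print(f"~{optimistic:.0f}-{conservative:.0f} hours")
```

At the quoted range this works out to roughly 66-73 hours for the assumed dataset, which is useful when deciding between hourly and monthly billing.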

Detailed Setup and Availability

Launching Paperspace A100: Step-by-Step

  1. Access Paperspace console at https://www.paperspace.com/console
  2. Click "Create Notebook" or "Create Machine"
  3. Filter by GPU: Select "A100" (40GB or 80GB variant)
  4. Choose machine type: A100 (Full instance) or Gradient Notebook (shared)
  5. Select template: PyTorch, TensorFlow, or Custom
  6. Configure:
    • Notebook name / Machine name
    • Billing: Hourly or Monthly
    • Machine type: confirm "A100 (40GB)" (or the 80GB variant) is selected
  7. Click "Start" and wait 2-5 minutes for provisioning
  8. Access Jupyter or SSH once running
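The 2-5 minute provisioning wait in step 7 can be scripted rather than watched. A minimal sketch that polls until the machine's SSH port answers; the host name and timeout here are assumptions, not Paperspace-specific values:

```python
import socket
import time

def wait_for_ssh(host: str, port: int = 22, timeout_s: int = 300) -> bool:
    """Poll until the machine's SSH port accepts connections, or time out."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        try:
            # A successful TCP connect means sshd is up and reachable
            with socket.create_connection((host, port), timeout=5):
                return True
        except OSError:
            time.sleep(10)  # machine still provisioning; retry shortly
    return False

# Hypothetical host name for illustration
# wait_for_ssh("my-machine.paperspace.com")
```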

A100 Availability Patterns

Paperspace maintains approximately 30-60 A100 instances globally (versus 5-10 H100s), so A100 availability is significantly better:

Availability by Time

| Window | Status | Availability | Quality |
|---|---|---|---|
| Peak (9-17 UTC) | Constrained | 50-70% | Standard |
| Off-peak (18-8 UTC) | Available | 90%+ | Variable (including shared instances) |
| Weekends | Mixed | 80%+ | Improved availability |

Off-peak, waits for an A100 rarely exceed 2-3 hours. Peak hours sometimes show full capacity, in which case hourly spot checks are the practical approach.
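Hourly spot checking is easy to automate. In this sketch, `check_availability` is a stand-in for whatever probe you use (the Paperspace API, a console scrape, etc.), not a real Paperspace function:

```python
import time
from typing import Callable

def wait_for_a100(check_availability: Callable[[], bool],
                  poll_seconds: int = 3600,
                  max_polls: int = 24) -> bool:
    """Poll hourly (by default) until an A100 is reported available.

    Returns True as soon as the probe succeeds, False once max_polls
    attempts have been exhausted.
    """
    for attempt in range(max_polls):
        if check_availability():
            return True
        if attempt < max_polls - 1:
            time.sleep(poll_seconds)
    return False
```

Injecting the probe as a callable keeps the polling logic testable and independent of how availability is actually checked.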

Cost Optimization for Variable Availability

When no A100 is available, Paperspace offers the A6000 as a fallback (a previous-generation GPU at a substantially lower rate). Workflow:

# Pseudocode: check_paperspace_availability() stands in for your availability probe
availability_check = check_paperspace_availability()
if availability_check["a100_available"]:
    select_gpu("A100")   # $3.09-3.18/hr
else:
    select_gpu("A6000")  # ~$1.89/hr, ~80% of A100 performance

Using the A6000 as a fallback lowers Paperspace's effective cost whenever the A100 is unavailable: the A6000 at ~$1.89/hr blended with the A100 80GB at $3.18/hr yields a rate that depends on the availability mix.
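The blended rate follows directly from the availability mix; a quick arithmetic sketch (the 70% availability figure is an assumed example):

```python
def blended_rate(a100_rate: float, fallback_rate: float,
                 a100_availability: float) -> float:
    """Average $/hr when A100 hours are mixed with fallback-GPU hours.

    a100_availability is the fraction of hours the A100 was actually used.
    """
    return a100_availability * a100_rate + (1 - a100_availability) * fallback_rate

# e.g. A100 80GB at $3.18/hr, A6000 at $1.89/hr, A100 available 70% of the time
print(f"${blended_rate(3.18, 1.89, 0.70):.2f}/hr")
```

At a 70/30 split this lands around $2.79/hr, noticeably below the pure A100 rate.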

Gradient Notebook Cost-Benefit Analysis

Gradient Notebooks at $0.51/hr appear attractive but carry hidden costs:

| Factor | Notebook Impact | Cost |
|---|---|---|
| Limited GPU time-slicing | 30-40% lower effective throughput | +$0.15-0.20/hr |
| Shared kernel overhead | Slower batch processing | +$0.10/hr |
| Storage constraints (10GB) | Frequent cache clearing | +$0.05/hr |
| Effective actual cost | $0.51 + overhead | $0.81-0.86/hr |

When shared overhead is accounted for, the Notebook's advertised $0.51/hr rises to the effective range shown in the table above, narrowing the gap with full-instance pricing. For production training, a full instance remains preferable despite the higher hourly rate: the better throughput justifies the cost.
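The effective-cost figure is just the advertised rate plus the overhead line items from the table above; a quick check:

```python
def effective_notebook_cost(base_rate: float, overheads: list[float]) -> float:
    """Advertised hourly rate plus estimated per-hour overhead costs."""
    return base_rate + sum(overheads)

# Overhead line items from the table: time-slicing, shared kernel, storage churn
low = effective_notebook_cost(0.51, [0.15, 0.10, 0.05])
high = effective_notebook_cost(0.51, [0.20, 0.10, 0.05])
print(f"${low:.2f}-{high:.2f}/hr")
```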

Optimal Workflow: Hybrid Development and Production

Recommended workflow for cost optimization:

  1. Experiment phase (2-4 hours): Use Paperspace Notebook ($0.51/hr) for quick prototyping
  2. Validation phase (4-12 hours): Use A100 monthly plan when available
  3. Fallback: If A100 unavailable, switch to RunPod A100 Spot ($0.60/hr average)
  4. Production deployment: Use Lambda A100 reserved for 99%+ uptime

Blended across the full workflow, costs can average roughly $1.00/hr, competitive with other providers.

Gradient Notebook Environment

Interactive Development Interface

Paperspace's Gradient provides a Jupyter-like IDE accessible via browser with pre-installed ML libraries.

Key Features:

  • Pre-installed PyTorch, TensorFlow, JAX, scikit-learn
  • Terminal for custom package installation
  • File browser for dataset management
  • Git integration for version control
  • Collaborative notebook sharing with team members

import torch
from transformers import pipeline

# Run on GPU when available; device=0 selects the first CUDA device
device = 0 if torch.cuda.is_available() else -1
classifier = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
    device=device,
)

results = classifier("This is a great product!")
print(results)  # GPU acceleration is transparent to the caller

Storage and Persistence

Gradient provides:

  • 20GB persistent storage (hourly instances)
  • 100GB persistent storage (monthly instances)
  • Access to upload/download external files
  • S3 integration for larger datasets

Workload Optimization for Paperspace A100

Notebook-Based Workflows

Paperspace excels for:

  • Interactive model prototyping and hyperparameter tuning
  • Dataset exploration and visualization
  • Quick inference testing on new models
  • ML research and experimentation

Avoid production deployment directly on Paperspace. Instead, develop notebooks locally, test on Paperspace, then migrate to dedicated infrastructure.

Session Management

Paperspace notebook sessions terminate after 6 hours of inactivity. For longer work sessions, periodically save outputs and checkpoint state:

import time
import torch

# num_epochs and model are assumed to be defined by the surrounding script
save_interval = 900  # seconds (15 minutes)
last_save = time.time()

for epoch in range(num_epochs):
    # ... run the training steps for this epoch ...

    if time.time() - last_save > save_interval:
        # /storage persists across Paperspace session restarts
        torch.save(model.state_dict(), '/storage/checkpoint.pt')
        last_save = time.time()
        print(f"Checkpoint saved at epoch {epoch}")

Comparing Paperspace A100 to Alternatives

Paperspace vs RunPod

| Criteria | Paperspace | RunPod |
|---|---|---|
| Hourly Rate | $3.09-3.18 | $1.19 |
| Availability | Good | Excellent |
| Multi-GPU Clusters | No | Limited |
| Notebooks | Excellent | Community |
| Support | Chat support | Community |

RunPod costs significantly less at $1.19/hr vs Paperspace's $3.09-3.18/hr, and offers better spot pricing. Paperspace excels for interactive notebook-based development.

Paperspace vs Lambda

Lambda's A100 at $1.48/hr on-demand costs less than half of Paperspace's rate while offering dedicated capacity and multi-GPU clusters. Choose Paperspace for interactive development; choose Lambda for production clusters.

For cost-sensitive batch workloads, see RunPod spot pricing and Vast.ai peer-to-peer marketplace. For Kubernetes-native production, check CoreWeave clusters.

Dataset Management and Data Transfer

Uploading Training Data

For datasets under 50GB, upload through Paperspace's web interface. Larger datasets require alternative approaches:

rsync -avz --progress /local/dataset/ \
  username@machine.paperspace.com:/storage/dataset/

S3 Integration for Large Datasets

For datasets exceeding 100GB, store on S3 and access within notebooks:

import boto3
import pandas as pd
import smart_open

s3 = boto3.client('s3')

# Option 1: download an archive to local disk first
s3.download_file('my-bucket', 'large_dataset.tar.gz', '/tmp/data.tar.gz')

# Option 2: stream a file directly from S3 without a local copy
with smart_open.open('s3://my-bucket/data.csv') as f:
    data = pd.read_csv(f)

A100 Inference on Paperspace

Throughput Expectations

A100 Paperspace achieves competitive inference performance:

  • 7B-parameter model: 100-120 tokens/second (batch size 1)
  • 13B-parameter model: 60-75 tokens/second (batch size 1)
  • Batch inference (size 8): 3-4x throughput improvement

Cost per token at $3.18/hr works out to roughly $0.0000177 per token, or about $0.018 per 1,000 tokens (assuming 50 tokens/second).
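That figure is straightforward to verify from the hourly rate and throughput quoted above:

```python
def cost_per_token(hourly_rate: float, tokens_per_sec: float) -> float:
    """Dollars per generated token at a given hourly rate and throughput."""
    return hourly_rate / 3600 / tokens_per_sec

# A100 80GB at $3.18/hr, sustained 50 tokens/sec
per_token = cost_per_token(3.18, 50)
print(f"${per_token:.7f}/token, ${per_token * 1000:.4f} per 1K tokens")
```

Batch inference improves this roughly in proportion to the throughput gain, since the hourly rate is fixed.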

Model Serving Best Practices

For production inference, deploy models from Paperspace notebooks to dedicated providers:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b")

# Quick smoke test before exporting the weights
inputs = tokenizer("Hello, world", return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(**inputs, max_length=100)

# Persist final weights to /storage before migrating to the serving provider
torch.save(model.state_dict(), '/storage/llama-7b-final.pt')

FAQ

Should I use Paperspace A100 for production ML services?

No. Paperspace's strength is interactive development and experimentation. For production inference, migrate to RunPod ($1.19/hr) or Lambda ($1.48/hr) for guaranteed uptime SLAs.

How does Paperspace A100 monthly pricing compare to on-demand?

A100 monthly plans provide a discount over hourly on-demand rates of $3.09/hr (40GB) and $3.18/hr (80GB). Monthly plans suit projects lasting 3-4 weeks; shorter work favors on-demand. RunPod remains significantly cheaper at $1.19/hr for cost-sensitive workloads.

Can I export Paperspace notebooks and run them elsewhere?

Yes. Download notebooks as .ipynb files from Paperspace. The code is typically portable: Python/PyTorch notebooks run identically on RunPod or Lambda with minimal modification (removing Gradient-specific packages). This portability makes Paperspace excellent for development before production deployment.

When should I choose Paperspace A100 versus RunPod or Lambda?

Choose Paperspace for: (1) Interactive notebook development with immediate feedback, (2) Team collaboration (Gradient notebook sharing), (3) Quick experiments when availability good. Choose RunPod/Lambda for: (1) Production inference requiring 99%+ uptime, (2) Sustained training (12+ hours), (3) Cost-optimized batch processing. Paperspace excels for development; dedicated providers excel for production.

What cost optimization strategies apply to Paperspace A100?

(1) Use monthly plans for savings vs hourly on-demand rates ($3.09-3.18/hr), (2) Choose on-demand for <40-hour projects, (3) Use Gradient Notebooks ($0.51/hr) for development but upgrade to full instance for production training, (4) Schedule off-peak usage (0-4 UTC) for better availability and less queue time.
