The Compute Stack Behind AI Workloads
CPUs, GPUs, and TPUs—What Powers Preprocessing, Training, and Inference

Hi Inner Circle,
Let’s take a look at the compute muscle powering today’s AI models.
But before we dive in, it’s important to understand the workloads these compute types are built to handle.
The three crucial use cases driving both traditional machine learning and the generative AI revolution are:
Preprocessing
Training
Inference

Let’s walk through each one, then look at the compute options:
Preprocessing
Think of this as data cleanup and preparation.
Input: Raw data (text, images, audio, etc.)
Output: Formatted, clean, model-ready data
Examples:
Text: Convert raw sentences into tokens or embeddings (e.g., BERT tokenizer)
Images: Resize and normalize images for vision models like CLIP
Audio: Convert .wav files into spectrograms for models like Whisper
This step ensures that data is in the right shape and scale before training or inference.
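As a rough sketch of what this looks like in practice, here are a few lines of Python. It assumes the Hugging Face transformers, Pillow, and NumPy packages are installed, and the image path is just a placeholder:

```python
# Minimal preprocessing sketch (illustrative only).
from transformers import AutoTokenizer
from PIL import Image
import numpy as np

# Text: turn a raw sentence into token IDs a model can consume
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
tokens = tokenizer("The compute stack behind AI workloads")
print(tokens["input_ids"])  # list of integer token IDs

# Image: resize and scale pixel values to [0, 1] before feeding a vision model
img = Image.open("example.jpg").resize((224, 224))  # "example.jpg" is a placeholder path
pixels = np.asarray(img, dtype=np.float32) / 255.0
print(pixels.shape)  # (224, 224, 3) for an RGB image
```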
Training
This is where the model learns from data.
Input: Preprocessed data + correct labels (for supervised learning)
Output: A trained model (with tuned weights and parameters)
Examples:
GPT was trained on internet-scale text data
Stable Diffusion was trained on text-image pairs
ResNet was trained on labeled image datasets like ImageNet
Training is compute-heavy and often takes days or weeks depending on model size.
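To make the loop concrete, here is a minimal supervised training sketch in PyTorch. The linear model and synthetic data are stand-ins, not a real workload:

```python
# Tiny supervised training loop on synthetic data (illustrative only).
import torch
from torch import nn

model = nn.Linear(10, 2)                      # stand-in for a real network
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

X = torch.randn(256, 10)                      # preprocessed features
y = torch.randint(0, 2, (256,))               # correct labels (supervised learning)

for epoch in range(5):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)               # forward pass: compare predictions to labels
    loss.backward()                           # backward pass: compute gradients
    optimizer.step()                          # update weights
    print(f"epoch {epoch}: loss={loss.item():.4f}")
```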
Inference
Using the trained model to make predictions.
Input: New, preprocessed data + trained model
Output: Prediction or generated output
Examples:
GPT gets a prompt and generates text
Midjourney takes a text prompt and outputs an image
A sentiment classifier takes a review and returns "positive" or "negative"
Inference should be fast, especially in real-time apps like chatbots or recommendation systems.
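The sentiment-classifier case above can be sketched in a few lines with the Hugging Face pipeline API (it downloads a small default model on first use):

```python
# Inference sketch: a sentiment classifier returning "POSITIVE" or "NEGATIVE".
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
print(classifier("This newsletter makes compute easy to follow."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```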
Compute Machine Types for GenAI Workloads
Understanding the hardware that powers AI models helps you optimize cost and performance:
CPUs (Central Processing Units)
Ideal for preprocessing and lightweight inference tasks
Commonly used alongside GPUs/TPUs for orchestration, data loading, and distributed training
Offer flexible memory handling and broad software compatibility
Example: Running a Flask API server that serves model predictions
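A minimal sketch of that CPU-side serving pattern, with a placeholder prediction standing in for a real model call:

```python
# Flask server that wraps a model prediction; the "prediction" here is a placeholder.
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json(silent=True) or {}
    # A real handler would load a trained model at startup and call its predict method here.
    return jsonify({"prediction": len(payload.get("text", ""))})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```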
GPUs (Graphics Processing Units)
Designed for parallel processing — ideal for deep learning
Power both training and inference in models like GPT, DALL·E, and Llama
Efficiently handle large matrix operations (e.g., transformer attention mechanisms)
Supported across all major ML frameworks (PyTorch, TensorFlow, JAX, etc.)
Enable custom CUDA kernels and offer fine-grained control over operations
Strong ecosystem support and flexibility for both research and production
Example: NVIDIA A100, H100, H200, B200, etc., used in most modern AI data centers
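A small PyTorch sketch of the kind of matrix math GPUs accelerate, falling back to CPU if no GPU is present:

```python
# Run a large matrix multiplication on a GPU when one is available.
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Matrix multiplications like this dominate transformer attention and MLP layers
a = torch.randn(4096, 4096, device=device)
b = torch.randn(4096, 4096, device=device)
c = a @ b
print(c.device)  # cuda:0 on a GPU machine, cpu otherwise
```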
TPUs (Tensor Processing Units)
Google’s custom ASICs (Application-Specific Integrated Circuits) built specifically for AI workloads
Optimized for large matrix multiplications via a systolic array architecture (a design distinct from GPUs)
Tightly integrated with TensorFlow and JAX, using the XLA compiler for graph optimization
Use slices (interconnected groups of TPU chips carved out of a pod) to parallelize large-scale model training
Deliver high energy efficiency and excellent cost-performance at scale
Extremely efficient for training large LLMs across multiple nodes
Example: Gemini models are trained on TPU v4/v5 pods
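A minimal JAX sketch: because the XLA compiler handles the hardware-specific lowering, the same jitted function runs unchanged on CPU, GPU, or TPU backends (on a Cloud TPU VM, jax.devices() would list the TPU cores):

```python
# XLA-compiled matrix multiply; the backend (CPU/GPU/TPU) is picked up automatically.
import jax
import jax.numpy as jnp

@jax.jit
def matmul(a, b):
    return a @ b

k1, k2 = jax.random.split(jax.random.PRNGKey(0))
a = jax.random.normal(k1, (1024, 1024))
b = jax.random.normal(k2, (1024, 1024))

print(matmul(a, b).shape)  # (1024, 1024)
print(jax.devices())       # TPU devices on a TPU VM, CPU/GPU devices elsewhere
```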
Cloud Compute Modes:
On-Demand: Flexible, pay-as-you-go model
Spot/Preemptible Instances: Deep discounts with the risk of interruption
Reserved/Committed Use: Lower rates in exchange for a long-term commitment (1–3 years)
Dedicated or Custom Instances: High-performance configurations tailored for specific hardware needs
How to Stay Updated
Follow hardware releases from NVIDIA, AMD, Intel, and Google Cloud
Watch benchmarks from MLPerf, Hugging Face, and independent researchers
Join communities like Papers with Code and ML Collective, and follow AI-focused circles on platforms like Instagram and Facebook
That’s the core of how models come to life — from raw data to predictions, powered by serious compute muscle.
Stay tuned—data pipelines and storage options coming tomorrow!