The Compute Stack Behind AI Workloads

CPUs, GPUs, and TPUs—What Powers Preprocessing, Training, and Inference

Hi Inner Circle,

Let’s take a look at the compute muscle powering today’s AI models.

But before we dive in, it’s important to understand the workloads these compute types are built to handle.

The three core workloads driving both traditional machine learning and the generative AI revolution are:

  • Preprocessing

  • Training

  • Inference

Let’s walk through each one, then look at the compute options:

Preprocessing

Think of this as data cleanup and preparation.

  • Input: Raw data (text, images, audio, etc.)

  • Output: Formatted, clean, model-ready data

Examples:

  • Text: Convert raw sentences into tokens or embeddings (e.g., BERT tokenizer)

  • Images: Resize and normalize images for vision models like CLIP

  • Audio: Convert .wav files into spectrograms for models like Whisper

This step ensures that data is in the right shape and scale before training or inference.
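
To make this concrete, here’s a minimal sketch of the text and image cases, assuming the transformers and torchvision packages are installed (the checkpoint name and the CLIP normalization statistics are the standard public ones):

```python
# Preprocessing sketch: raw text and images -> model-ready tensors.
# Assumes the `transformers` and `torchvision` packages are installed.
from transformers import AutoTokenizer
from torchvision import transforms

# Text: raw sentences -> padded/truncated token IDs (BERT tokenizer)
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
batch = tokenizer(
    ["Compute powers AI.", "Preprocessing shapes the data."],
    padding=True,         # pad to the longest sequence in the batch
    truncation=True,      # clip anything past the model's max length
    return_tensors="pt",  # return PyTorch tensors
)
print(batch["input_ids"].shape)  # (batch_size, sequence_length)

# Images: resize + normalize with CLIP's published channel statistics
clip_preprocess = transforms.Compose([
    transforms.Resize(224),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.48145466, 0.4578275, 0.40821073),
                         std=(0.26862954, 0.26130258, 0.27577711)),
])
```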

Training

This is where the model learns from data.

  • Input: Preprocessed data + correct labels (for supervised learning)

  • Output: A trained model (with tuned weights and parameters)

Examples:

  • GPT was trained on internet-scale text data

  • Stable Diffusion was trained on text-image pairs

  • ResNet was trained on labeled image datasets like ImageNet

Training is compute-heavy and can take days or weeks, depending on model size and the hardware available.
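
Here’s what that learning loop looks like at its smallest: a toy PyTorch sketch on synthetic data, not a real training recipe:

```python
# Toy supervised training loop in PyTorch on synthetic data.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

X = torch.randn(256, 10)         # stand-in for preprocessed features
y = torch.randint(0, 2, (256,))  # stand-in for correct labels

for epoch in range(5):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)  # forward pass + compare to labels
    loss.backward()              # backpropagate gradients
    optimizer.step()             # nudge the weights
    print(f"epoch {epoch}: loss = {loss.item():.4f}")
```

Real training runs do exactly this, just with billions of parameters, massive datasets, and many accelerators working in parallel.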

Inference

This is where the trained model is put to work making predictions.

  • Input: New, preprocessed data + trained model

  • Output: Prediction or generated output

Examples:

  • GPT gets a prompt and generates text

  • Midjourney takes a text prompt and outputs an image

  • A sentiment classifier takes a review and returns "positive" or "negative"

Inference should be fast, especially in real-time apps like chatbots or recommendation systems.
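
A minimal sketch of the sentiment-classifier case, using the transformers pipeline API (it downloads a small default fine-tuned model on first run):

```python
# Inference sketch: a trained sentiment classifier scoring new text.
# pipeline() downloads a small default fine-tuned model on first use.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
result = classifier("This newsletter made compute finally click for me!")
print(result)  # e.g., [{'label': 'POSITIVE', 'score': 0.99...}]
```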

Compute Machine Types for GenAI Workloads

Understanding the hardware that powers AI models helps you optimize cost and performance:

CPUs (Central Processing Units)

  • Ideal for preprocessing and lightweight inference tasks

  • Commonly used alongside GPUs/TPUs for orchestration, data loading, and coordinating distributed training

  • Offer flexible memory handling and broad software compatibility

  • Example: Running a Flask API server that serves model predictions
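
A bare-bones sketch of that Flask pattern; the /predict route and predict_fn stand-in are illustrative placeholders, not a real model:

```python
# Bare-bones Flask server exposing a model behind an HTTP endpoint.
# The /predict route and predict_fn are illustrative placeholders.
from flask import Flask, request, jsonify

app = Flask(__name__)

def predict_fn(text: str) -> str:
    # Stand-in for a real model call (scikit-learn, ONNX Runtime, etc.)
    return "positive" if "good" in text.lower() else "negative"

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json()
    return jsonify({"prediction": predict_fn(payload["text"])})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)
```

This whole request/response path runs comfortably on CPU; the accelerators below earn their keep on the heavy math inside the model itself.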

GPUs (Graphics Processing Units)

  • Designed for parallel processing — ideal for deep learning

  • Power both training and inference in models like GPT, DALL·E, and Llama

  • Efficiently handle large matrix operations (e.g., transformer attention mechanisms)

  • Supported across all major ML frameworks (PyTorch, TensorFlow, JAX, etc.)

  • Enable custom CUDA kernels and offer fine-grained control over operations

  • Strong ecosystem support and flexibility for both research and production

  • Example: NVIDIA A100, H100, H200, B200, etc., used in most modern AI data centers
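
A quick PyTorch sketch of the core pattern: detect a GPU if one is present and run the kind of dense matrix multiply these chips are built for:

```python
# Detect a GPU and run a large dense matrix multiply on it (PyTorch).
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"running on: {device}")

a = torch.randn(4096, 4096, device=device)
b = torch.randn(4096, 4096, device=device)
c = a @ b  # the kind of matmul at the heart of transformer attention
print(c.shape)
```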

TPUs (Tensor Processing Units)

  • Google’s custom ASICs (Application-Specific Integrated Circuits) built specifically for AI workloads

  • Optimized for large matrix multiplications using a systolic array architecture (a dataflow design distinct from the GPU’s grid of general-purpose cores)

  • Tightly integrated with TensorFlow and JAX, using the XLA compiler for graph optimization

  • Use pod slices (subsets of interconnected TPU chips) to parallelize large-scale model training

  • Deliver high energy efficiency and excellent cost-performance at scale

  • Extremely efficient for training large LLMs across multiple nodes

  • Example: Gemini models are trained on TPU v4/v5 pods
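
A small JAX sketch of that XLA integration: the same function compiles for TPU, GPU, or CPU, whichever backend jax.devices() reports, with no per-device code changes:

```python
# JAX sketch: XLA compiles the same function for TPU, GPU, or CPU.
import jax
import jax.numpy as jnp

print(jax.devices())  # lists TpuDevice entries on a TPU VM

@jax.jit  # traced once, then compiled by XLA for the local backend
def matmul(a, b):
    return a @ b

a = jnp.ones((2048, 2048))
b = jnp.ones((2048, 2048))
print(matmul(a, b).shape)  # (2048, 2048)
```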

Cloud Compute Modes

  • On-Demand: Flexible, pay-as-you-go model

  • Spot/Preemptible Instances: Deep discounts with the risk of interruption

  • Reserved/Committed Use: Lower rates in exchange for a long-term commitment (1–3 years)

  • Dedicated or Custom Instances: High-performance configurations tailored for specific hardware needs
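
To make the on-demand vs. spot tradeoff concrete, here’s a back-of-envelope sketch; every number in it is a hypothetical placeholder, not a real cloud price:

```python
# Back-of-envelope cost comparison for a 100-hour training job.
# Every number here is a hypothetical placeholder, not a cloud quote.
HOURS = 100
ON_DEMAND_RATE = 4.00  # $/hr, pay-as-you-go
SPOT_RATE = 1.20       # $/hr, discounted but interruptible
SPOT_OVERHEAD = 1.15   # assume ~15% extra runtime lost to restarts

on_demand_cost = HOURS * ON_DEMAND_RATE
spot_cost = HOURS * SPOT_OVERHEAD * SPOT_RATE

print(f"on-demand: ${on_demand_cost:,.2f}")  # $400.00
print(f"spot:      ${spot_cost:,.2f}")       # $138.00
```

The takeaway: spot instances often win on price even after paying an interruption tax, as long as your training jobs checkpoint and resume gracefully.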

How to Stay Updated

  • Follow hardware releases from NVIDIA, AMD, Intel, and Google Cloud

  • Watch benchmarks from MLPerf, Hugging Face, and independent researchers

  • Join communities like Papers with Code, ML Collective, and AI circles on social platforms like Instagram and Facebook

That’s the core of how models come to life — from raw data to predictions, powered by serious compute muscle.

Stay tuned—data pipelines and storage options coming tomorrow!