Common AI GPUs: A100, H100, B200, MI300X, Ascend, and More

2026-05-04
AI Infrastructure · GPU · Hardware

I have been running into more and more GPU names lately, and I realized I did not have a stable mental model for how they relate to each other.

At first, A100 and H100 were enough to give me a rough sense of the landscape. Then H200, B200, B300, GB200, H20, A800, H800, MI300X, and Ascend 910B/910C started showing up in the same conversations. After a while, the names stopped being useful by themselves. Some are training GPUs, some are inference-oriented, some are mainly about memory, some are export-limited variants, and some are not single cards at all.

So this post is not a procurement guide or an attempt to build a complete spec sheet. It is a hardware map for myself. The goal is that when I hear these model names again, I can roughly place them by generation, strength, weakness, and ecosystem.

This is based on public information available around May 2026. Memory and bandwidth numbers are mostly vendor-published figures. Prices are rough ranges only, because data center GPUs rarely have a stable retail price. OEM systems, cloud rentals, secondary markets, regional supply, and export controls can move the number a lot.

Why these names are hard to compare

The confusion comes from three places.

First, NVIDIA's main line and restricted variants get mixed together. A100, H100, H200, and B200 are part of NVIDIA's data-center line. A800, H800, and H20 are shaped by export controls or regional market constraints. The names look close, but the limits can be very different.

Second, memory, bandwidth, and compute are different things. H200 is attractive mainly because of 141GB HBM3e and 4.8TB/s bandwidth, not because it is a new architecture after H100. H20 has good-looking memory, but much lower Tensor compute than H100. Sorting by model number is not enough.

Third, single GPU, server, and rack names are often used in the same sentence. B200 is a GPU. GB200 usually means a Grace CPU plus Blackwell GPU Superchip or platform. GB200 NVL72 is a 72-GPU rack-scale system. Comparing GB200 NVL72 with one H100 card is already crossing levels.

Building intuition for memory, bandwidth, and compute

The easiest concepts for me to mix up are memory capacity, memory bandwidth, and compute. They are all related to performance, but they answer different questions.

Memory capacity answers whether the workload fits. Model weights, KV cache, intermediate states in a batch, and in training the gradients and optimizer states all live in GPU memory. For inference, a rough first estimate is: BF16/FP16 weights use about 2 bytes per parameter, FP8/INT8 about 1 byte, and INT4/FP4 about 0.5 byte. A 70B model needs about 140GB just for BF16 weights. With 4-bit quantization, the weights are around 35GB, but KV cache, runtime buffers, and framework overhead still need room. Training is much more memory-hungry because weights are only one part of the footprint.
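
A minimal sketch of that first estimate, assuming the rough bytes-per-parameter rule above and ignoring quantization overhead such as scales and zero points. The 70B example is the one from this paragraph.

```python
# Rough GPU-memory estimate for model weights alone, following the
# bytes-per-parameter rule of thumb above. Real deployments also need
# room for KV cache, activations, runtime buffers, and framework overhead.
BYTES_PER_PARAM = {
    "fp32": 4.0,
    "bf16": 2.0, "fp16": 2.0,
    "fp8": 1.0, "int8": 1.0,
    "int4": 0.5, "fp4": 0.5,
}

def weight_memory_gb(n_params: float, dtype: str) -> float:
    """Approximate weight footprint in GB (weights only)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9

if __name__ == "__main__":
    for dtype in ("bf16", "fp8", "int4"):
        print(f"70B weights in {dtype}: ~{weight_memory_gb(70e9, dtype):.0f} GB")
    # bf16 -> ~140 GB, fp8 -> ~70 GB, int4 -> ~35 GB
```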

Memory bandwidth answers how much data can be moved from memory per second. This matters a lot for LLM inference, especially during decode, when the model generates one token at a time. Many inference workloads are not waiting because the GPU lacks math units; they are waiting because every token requires reading weights and KV cache from HBM. H200 is often faster than H100 in inference because it has 141GB HBM3e and 4.8TB/s bandwidth, not because it is a brand-new architecture. MI300X, MI325X, and MI350X follow the same basic logic: large HBM capacity and high bandwidth are central to their LLM-serving story.
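
A back-of-the-envelope way to see why decode is often bandwidth-bound: at batch size 1, each generated token has to stream roughly all of the weights plus the KV cache out of HBM once, so bandwidth divided by bytes-read-per-token gives an upper bound on tokens per second. The sketch below assumes a hypothetical 70B model with FP8 weights and uses the H100/H200 bandwidth figures quoted in this post; real serving stacks batch and overlap work, so treat it only as a ceiling.

```python
# Upper bound on single-stream decode speed, assuming each token must read
# all weights plus the KV cache from HBM once. This is a ceiling, not a
# prediction: batching, caching, and overlap change the real number a lot.
def decode_tokens_per_s_ceiling(weight_gb: float, kv_cache_gb: float,
                                bandwidth_tb_s: float) -> float:
    bytes_per_token = (weight_gb + kv_cache_gb) * 1e9
    return bandwidth_tb_s * 1e12 / bytes_per_token

# Hypothetical 70B model with FP8 weights (~70 GB) and ~10 GB of KV cache.
for name, bw in [("H100 (~3.35 TB/s)", 3.35), ("H200 (4.8 TB/s)", 4.8)]:
    print(f"{name}: <= {decode_tokens_per_s_ceiling(70, 10, bw):.0f} tokens/s")
# The ratio of the two ceilings is just the bandwidth ratio, about 1.4x,
# which is why H200 helps most when decode is the bottleneck.
```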

Compute answers how fast the card can do matrix multiplication. Training, long-prompt prefill, and large multimodal blocks tend to lean harder on Tensor Core throughput. Here the datatype matters a lot. V100 and A100 are mostly FP16/BF16-era cards. H100 and H200 make FP8 important. B200 and B300 bring FP4 into the foreground. Vendor tables also often quote sparse numbers, while dense numbers are usually about half. Directly comparing A100 BF16, H100 FP8, and B200 FP4 numbers is a good way to confuse myself.
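
To keep myself from mixing sparse and dense figures, or FP8 with BF16, I normalize vendor numbers roughly like this. The halve-the-sparse-figure rule is the convention mentioned above; the example TFLOPS values are the approximate ones from the tables in this post.

```python
# Normalize a vendor TFLOPS figure so comparisons stay apples-to-apples:
# sparse figures get halved to a rough dense equivalent, and the datatype
# stays attached so FP8 numbers never get compared directly with BF16.
def dense_tflops(tflops: float, is_sparse: bool) -> float:
    return tflops / 2 if is_sparse else tflops

specs = [
    # (card, dtype, vendor TFLOPS, quoted with sparsity?)
    ("A100 80GB", "bf16", 624,  True),   # sparse figure -> ~312 dense
    ("H100 SXM",  "fp8",  4000, True),   # sparse figure -> ~2000 dense
    ("B200",      "fp4",  9000, False),  # already a dense figure
]

for card, dtype, tflops, sparse in specs:
    print(f"{card}: ~{dense_tflops(tflops, sparse):.0f} dense TFLOPS @ {dtype}")
# The dtype tag is the point: ~312 BF16, ~2000 FP8, and ~9000 FP4 TFLOPS
# are answers to different questions, not a clean speedup ladder.
```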

The rough checklist I now use is:

| Question | Main thing to check | Intuition |
|---|---|---|
| Can the model run at all? | Memory capacity | Weights, KV cache, batch state, and training state need to fit |
| Is single-token generation fast? | Memory bandwidth | Decode is often waiting for data to move from HBM into compute units |
| Are training and prefill fast? | Tensor compute | Large matrix multiplications need enough Tensor Core throughput |
| Does it scale across GPUs? | Interconnect and software stack | NVLink, NVSwitch, InfiniBand, NCCL/RCCL/HCCL, and framework support matter |

So the practical order is: check memory capacity first, to see whether the model and context fit; then bandwidth, especially for decode-heavy inference; then compute, for training, prefill, and large batches; then interconnect and ecosystem, to see whether the workload can scale across GPUs and machines. A good spec sheet is one thing; getting my actual training or inference stack to run is another.
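
As a first pass on the "does it fit, and across how many GPUs" part of that order, I use something like the sketch below. The 90% usable-memory fraction and the rounding to tensor-parallel-friendly counts are my own assumptions, not a rule, and bandwidth, compute, and interconnect questions come only after this check.

```python
import math

# First-pass sizing: how many GPUs does a model need just to hold weights
# plus KV cache, before thinking about bandwidth or compute at all?
# The usable-memory fraction and power-of-two rounding are assumptions,
# not hard rules; frameworks and parallelism schemes differ.
def gpus_needed(weight_gb: float, kv_cache_gb: float,
                gpu_mem_gb: float, usable_fraction: float = 0.9) -> int:
    raw = (weight_gb + kv_cache_gb) / (gpu_mem_gb * usable_fraction)
    n = math.ceil(raw)
    # Round up to a power of two, since tensor parallelism usually wants 1/2/4/8.
    return 1 << (n - 1).bit_length()

# Hypothetical 70B model, BF16 weights (~140 GB) plus ~20 GB of KV cache.
for card, mem in [("A100/H100 80GB", 80), ("H200 141GB", 141), ("MI300X 192GB", 192)]:
    print(f"{card}: at least {gpus_needed(140, 20, mem)} GPU(s)")
```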

NVIDIA's main line: V100 to B300

NVIDIA becomes much easier to understand if I pull out the main line first. V100 starts the Tensor Core era. A100 is the classic training card. H100 is the modern large-model training and inference baseline. H200 is more like a large-memory, high-bandwidth Hopper. B200 and B300 move into Blackwell and Blackwell Ultra.

| Model | Time / architecture | Memory and bandwidth | Compute shorthand | Rough price / access | Intuition |
|---|---|---|---|---|---|
| V100 | 2017 Volta | 16/32GB HBM2, about 900GB/s | FP16 Tensor about 125 TFLOPS | Used, about $1k-$5k | First Tensor Core training generation, now mostly historical context or cheap compute |
| A100 40GB | 2020 Ampere | 40GB HBM2, about 1.6TB/s | FP16/BF16 dense about 312 TFLOPS | About $6k-$15k | Classic training card, but 40GB is tight for modern LLMs |
| A100 80GB | 2020/2021 Ampere | 80GB HBM2e, about 1.9TB/s PCIe or 2.0TB/s SXM | FP16/BF16 dense about 312 TFLOPS, sparse about 624 TFLOPS | About $10k-$25k | Still common and dependable for training, fine-tuning, and inference |
| H100 PCIe/SXM | 2022 Hopper | 80GB HBM2e/HBM3, SXM about 3.35TB/s | FP8 dense about 2 PFLOPS, sparse about 4 PFLOPS | About $25k-$45k | Baseline modern large-model training and inference GPU |
| H100 NVL | 2023 Hopper | 94GB HBM3, about 3.9TB/s, often deployed in pairs | Compute close to H100 | About $35k-$50k per-card class | PCIe-oriented LLM inference variant |
| H200 | 2024 Hopper | 141GB HBM3e, 4.8TB/s | Compute close to H100/H100 NVL | About $35k-$55k+ | Large-memory, high-bandwidth Hopper; strong for long-context inference |
| B200 | 2025 Blackwell | DGX/HGX commonly 180GB HBM3e, about 8TB/s | FP4 dense about 9 PFLOPS, FP8 dense about 4.5 PFLOPS | About $45k-$70k+, 8-GPU systems often above $500k | Main Blackwell flagship with a large low-precision jump |
| B300 / Blackwell Ultra | 2025/2026 Blackwell Ultra | Up to about 288GB HBM3e, about 8TB/s | FP4 dense about 15 PFLOPS class | Quote-based, usually sold as systems or racks | Built for reasoning, long context, and large MoE workloads |

H200 is the easy one to misunderstand. It is not a completely new architecture after H100. It is a Hopper-family card with much more memory and bandwidth. For long-context inference, large batches, and memory-bound workloads, that matters a lot. For compute-bound workloads, the jump may be less dramatic than the name suggests.

B200 and B300 are the newer generation story. Blackwell is not just about memory. It is also about FP4/FP8, NVLink 5, rack-scale system design, and throughput for reasoning-oriented inference. Those advantages show up most clearly in system-level deployments.

A800, H800, and H20 should not be sorted by name

China-market and export-control SKUs make the naming harder. A800, H800, and H20 are common, but they are not a normal upgrade path.

| Model | Relationship | Memory and bandwidth | Main limit | Intuition |
|---|---|---|---|---|
| A800 | A100 export-limited variant | 40/80GB, bandwidth close to A100 | Limited NVLink bandwidth | Similar to A100 in compute, weaker in scaling |
| H800 | H100 export-limited variant | 80GB HBM3, close to H100 | Limited inter-GPU bandwidth | Fine for single-node or medium-scale use, worse than full H100 for very large training |
| H20 | Restricted Hopper SKU | Commonly 96GB HBM3, about 4.0TB/s | Tensor compute far below H100 | Good memory, inference-leaning, not a normal midpoint between H100 and H200 |

I keep this group in a separate bucket. Their common trait is not that they occupy a natural performance position, but that policy, channel availability, and regional constraints reshaped them. Names like L20, RTX 6000D, and B40 also appear in discussions, but their specs and channel naming are messier, so I am not expanding them here.

GB200 and GB300 are systems, not just GPUs

This is another place where the names blur. B200 and B300 are GPUs. GB200 and GB300 usually mean Grace CPU plus Blackwell GPU Superchip configurations or platforms. NVL72 is a 72-GPU rack-scale system.

| Name | What it is | How to think about it |
|---|---|---|
| HGX H100/H200/B200/B300 | 4-GPU or 8-GPU baseboard platform for OEM servers | Common shape from cloud and server vendors |
| DGX H100/H200/B200/B300 | NVIDIA first-party system | Usually an 8-GPU node plus validated system and software stack |
| GH200 | Grace CPU + Hopper GPU Superchip | CPU and GPU connected tightly through NVLink-C2C, useful for HPC and large-memory workloads |
| GB200 | Grace CPU + Blackwell GPU Superchip | GB200 NVL72 is a 72-GPU rack-scale platform, not a card |
| GB300 | Grace + Blackwell Ultra | GB300 NVL72 targets reasoning and test-time scaling; power, cooling, and networking are system-level concerns |
| NVL72 | A 72-GPU NVLink rack | Closer to a rack-scale AI supercomputer than a GPU card |

When I see GB200, GB300, or NVL72, I first need to check whether the discussion is about a chip, a node, or a rack. Their value is the rack-scale memory pool, NVLink domain, networking, power delivery, and liquid-cooling design. They should not be compared one-to-one with a single H100.

Only the most common non-NVIDIA names

There are many non-NVIDIA accelerators, but for a first mental map I only need the names that show up most often in large-model infrastructure discussions.

| Vendor / model | Memory and bandwidth | Rough peer | Main point |
|---|---|---|---|
| AMD MI300X | 192GB HBM3, about 5.3TB/s | H100/H200 | Large memory and bandwidth are the core selling points; often compared for LLM inference |
| AMD MI325X | 256GB HBM3e, about 6TB/s | Between H200 and B200 | More memory for long context and larger batches |
| AMD MI350X / MI355X | 288GB HBM3e, about 8TB/s | B200/B300 | FP4/FP6 support, high power, next-generation liquid-cooled data center focus |
| Huawei Ascend 910B | Public sources often cite 64GB HBM2e, about 1.2-1.6TB/s | Below A100/H100, depending on the workload | Common in China, but public specs are not fully transparent |
| Huawei Ascend 910C | Public estimates suggest 96GB HBM2e, about 1.8-3.2TB/s | Partial H100 substitute | Sources disagree; adaptation and cluster engineering matter more than headline numbers |
| Intel Gaudi 3 | 128GB HBM2e, about 3.7TB/s | H100 alternative route | Ethernet-based interconnect and price-performance positioning; smaller ecosystem than CUDA |
| Google TPU v5p/v6e | Cloud Pod form factor | H100/H200/B200 cluster alternative | Strong JAX/XLA stack, clear platform lock-in |
| AWS Trainium2 | Commonly 96GiB HBM, about 2.9TB/s | H100 cloud training alternative | Good price-performance potential, but depends on Neuron SDK |

AMD's line is relatively easy to place: it challenges NVIDIA with larger memory, high bandwidth, and more room on price. ROCm is much better than it used to be, but CUDA remains smoother across kernels, frameworks, tools, and debugging. Whether MI300X or MI325X saves money depends on whether the exact model stack already runs well on ROCm.

Ascend needs caution. Public 910B/910C specs are less transparent than NVIDIA, AMD, or Intel specs, and different sources report different memory, bandwidth, and FP16 numbers. In real deployments, CANN, HCCL, MindSpore/PyTorch adaptation, kernel coverage, and cluster stability often matter more than single-card peak numbers.

Gaudi, TPU, and Trainium are more platform-shaped AI accelerators. They can be strong, but the comparison includes cloud platform, compiler stack, and ecosystem constraints, not just hardware.

My rough rankings

For large-model training:

B300/GB300 > B200/GB200 > H200 ≈ H100 > H800 > A100/A800 > V100.

For memory friendliness:

MI350X/MI355X and B300 > MI325X > MI300X and B200 > H200 > H20 and H100 NVL > H100/A100 80GB.

For China-market availability:

A800 and H800 are important historical export variants. H20 is the restricted Hopper inference card. Ascend 910B/910C is the domestic replacement line. Names like L20, RTX 6000D, and B40 have to be read together with the policy and channel context of the moment, not sorted by model number.

The three lines I will remember

One line is NVIDIA's data-center main line: V100 to A100, then H100/H200, then B200/B300. A100 is the classic era. H100 is the modern baseline. H200 is large-memory Hopper. B200 and B300 are Blackwell and Blackwell Ultra.

Another line is export-limited or restricted SKUs: A800, H800, H20. Do not sort them naturally by name. Check whether the limit is interconnect, compute, memory, or supply.

The third line is alternative ecosystems. AMD Instinct challenges NVIDIA mainly with memory capacity and bandwidth. Ascend matters in domestic Chinese replacement scenarios. Gaudi, TPU, and Trainium are more platform-shaped AI accelerators. Before staring at TFLOPS, ask whether the model fits, whether inference is memory-bound, whether training needs multi-GPU scaling, and whether the software stack actually runs.

Sources and caveats

The specification baseline comes mainly from NVIDIA's V100, A100, H100, H200, HGX, and GB300 pages and datasheets; AMD Instinct MI300 and MI350 pages; Intel's Gaudi 3 white paper; and public research or policy sources such as CSET for Huawei Ascend. Price ranges come from public reseller, secondary-market, cloud-provider, and GPU price-guide data. They are useful for intuition, not for procurement budgeting.
