Common AI GPUs: A100, H100, B200, MI300X, Ascend, and More

2026-05-04
AI Infrastructure · GPU · Hardware

I have been running into more and more GPU names lately, and I realized I did not have a stable mental model for how they relate to each other.

At first, A100 and H100 were enough to give me a rough sense of the landscape. Then H200, B200, B300, GB200, H20, A800, H800, MI300X, and Ascend 910B/910C started showing up in the same conversations. After a while, the names stopped being useful by themselves. Some are training GPUs, some are inference-oriented, some are mainly about memory, some are export-limited variants, and some are not single cards at all.

So this post is not a procurement guide or an attempt to build a complete spec sheet. It is a hardware map for myself. The goal is that when I hear these model names again, I can roughly place them by generation, strength, weakness, and ecosystem.

This is based on public information available around May 2026. Memory and bandwidth numbers are mostly vendor-published figures. Prices are rough ranges only, because data center GPUs rarely have a stable retail price. OEM systems, cloud rentals, secondary markets, regional supply, and export controls can move the number a lot.

Why these names are hard to compare

The confusion comes from three places.

First, NVIDIA's main line and restricted variants get mixed together. A100, H100, H200, and B200 are part of NVIDIA's data-center line. A800, H800, and H20 are shaped by export controls or regional market constraints. The names look close, but the limits can be very different.

Second, memory, bandwidth, and compute are different things. H200 is attractive mainly because of 141GB HBM3e and 4.8TB/s bandwidth, not because it is a new architecture after H100. H20 has good-looking memory, but much lower Tensor compute than H100. Sorting by model number is not enough.

Third, single GPU, server, and rack names are often used in the same sentence. B200 is a GPU. GB200 usually means a Grace CPU plus Blackwell GPU Superchip or platform. GB200 NVL72 is a 72-GPU rack-scale system. Comparing GB200 NVL72 with one H100 card is already crossing levels.

Building intuition for memory, bandwidth, and compute

The easiest concepts for me to mix up are memory capacity, memory bandwidth, and compute. They are all related to performance, but they answer different questions.

Memory capacity answers whether the workload fits. Model weights, KV cache, intermediate states in a batch, and in training the gradients and optimizer states all live in GPU memory. For inference, a rough first estimate is: BF16/FP16 weights use about 2 bytes per parameter, FP8/INT8 about 1 byte, and INT4/FP4 about 0.5 byte. A 70B model needs about 140GB just for BF16 weights. With 4-bit quantization, the weights are around 35GB, but KV cache, runtime buffers, and framework overhead still need room. Training is much more memory-hungry because weights are only one part of the footprint.
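
A minimal sketch of that first estimate, assuming the rough bytes-per-parameter rule above and ignoring quantization overhead such as scales and zero points. The 70B example is the one from this paragraph.

```python
# Rough GPU-memory estimate for model weights alone, following the
# bytes-per-parameter rule of thumb above. Real deployments also need
# room for KV cache, activations, runtime buffers, and framework overhead.
BYTES_PER_PARAM = {
    "fp32": 4.0,
    "bf16": 2.0, "fp16": 2.0,
    "fp8": 1.0, "int8": 1.0,
    "int4": 0.5, "fp4": 0.5,
}

def weight_memory_gb(n_params: float, dtype: str) -> float:
    """Approximate weight footprint in GB (weights only)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9

if __name__ == "__main__":
    for dtype in ("bf16", "fp8", "int4"):
        print(f"70B weights in {dtype}: ~{weight_memory_gb(70e9, dtype):.0f} GB")
    # bf16 -> ~140 GB, fp8 -> ~70 GB, int4 -> ~35 GB
```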

Memory bandwidth answers how much data can be moved from memory per second. This matters a lot for LLM inference, especially during decode, when the model generates one token at a time. Many inference workloads are not waiting because the GPU lacks math units; they are waiting because every token requires reading weights and KV cache from HBM. H200 is often faster than H100 in inference because it has 141GB HBM3e and 4.8TB/s bandwidth, not because it is a brand-new architecture. MI300X, MI325X, and MI350X follow the same basic logic: large HBM capacity and high bandwidth are central to their LLM-serving story.
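
A back-of-the-envelope way to see why decode is often bandwidth-bound: at batch size 1, each generated token has to stream roughly all of the weights plus the KV cache out of HBM once, so bandwidth divided by bytes-read-per-token gives an upper bound on tokens per second. The sketch below assumes a hypothetical 70B model with FP8 weights and uses the H100/H200 bandwidth figures quoted in this post; real serving stacks batch and overlap work, so treat it only as a ceiling.

```python
# Upper bound on single-stream decode speed, assuming each token must read
# all weights plus the KV cache from HBM once. This is a ceiling, not a
# prediction: batching, caching, and overlap change the real number a lot.
def decode_tokens_per_s_ceiling(weight_gb: float, kv_cache_gb: float,
                                bandwidth_tb_s: float) -> float:
    bytes_per_token = (weight_gb + kv_cache_gb) * 1e9
    return bandwidth_tb_s * 1e12 / bytes_per_token

# Hypothetical 70B model with FP8 weights (~70 GB) and ~10 GB of KV cache.
for name, bw in [("H100 (~3.35 TB/s)", 3.35), ("H200 (4.8 TB/s)", 4.8)]:
    print(f"{name}: <= {decode_tokens_per_s_ceiling(70, 10, bw):.0f} tokens/s")
# The ratio of the two ceilings is just the bandwidth ratio, about 1.4x,
# which is why H200 helps most when decode is the bottleneck.
```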

Compute answers how fast the card can do matrix multiplication. Training, long-prompt prefill, and large multimodal blocks tend to lean harder on Tensor Core throughput. Here the datatype matters a lot. V100 and A100 are mostly FP16/BF16-era cards. H100 and H200 make FP8 important. B200 and B300 bring FP4 into the foreground. Vendor tables also often quote sparse numbers, while dense numbers are usually about half. Directly comparing A100 BF16, H100 FP8, and B200 FP4 numbers is a good way to confuse myself.
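
To keep myself from mixing sparse and dense figures, or FP8 with BF16, I normalize vendor numbers roughly like this. The halve-the-sparse-figure rule is the convention mentioned above; the example TFLOPS values are the approximate ones from the tables in this post.

```python
# Normalize a vendor TFLOPS figure so comparisons stay apples-to-apples:
# sparse figures get halved to a rough dense equivalent, and the datatype
# stays attached so FP8 numbers never get compared directly with BF16.
def dense_tflops(tflops: float, is_sparse: bool) -> float:
    return tflops / 2 if is_sparse else tflops

specs = [
    # (card, dtype, vendor TFLOPS, quoted with sparsity?)
    ("A100 80GB", "bf16", 624,  True),   # sparse figure -> ~312 dense
    ("H100 SXM",  "fp8",  4000, True),   # sparse figure -> ~2000 dense
    ("B200",      "fp4",  9000, False),  # already a dense figure
]

for card, dtype, tflops, sparse in specs:
    print(f"{card}: ~{dense_tflops(tflops, sparse):.0f} dense TFLOPS @ {dtype}")
# The dtype tag is the point: ~312 BF16, ~2000 FP8, and ~9000 FP4 TFLOPS
# are answers to different questions, not a clean speedup ladder.
```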

The rough checklist I now use is:

| Question | Main thing to check | Intuition |
|---|---|---|
| Can the model run at all? | Memory capacity | Weights, KV cache, batch state, and training state need to fit |
| Is single-token generation fast? | Memory bandwidth | Decode is often waiting for data to move from HBM into compute units |
| Are training and prefill fast? | Tensor compute | Large matrix multiplications need enough Tensor Core throughput |
| Does it scale across GPUs? | Interconnect and software stack | NVLink, NVSwitch, InfiniBand, NCCL/RCCL/HCCL, and framework support matter |

So the practical order is: check memory capacity first, to see whether the model and context fit; then bandwidth, especially for decode-heavy inference; then compute, for training, prefill, and large batches; then interconnect and ecosystem, to see whether the workload can scale across GPUs and machines. A good spec sheet is one thing; getting my actual training or inference stack to run is another.
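
As a first pass on the "does it fit, and across how many GPUs" part of that order, I use something like the sketch below. The 90% usable-memory fraction and the rounding to tensor-parallel-friendly counts are my own assumptions, not a rule, and bandwidth, compute, and interconnect questions come only after this check.

```python
import math

# First-pass sizing: how many GPUs does a model need just to hold weights
# plus KV cache, before thinking about bandwidth or compute at all?
# The usable-memory fraction and power-of-two rounding are assumptions,
# not hard rules; frameworks and parallelism schemes differ.
def gpus_needed(weight_gb: float, kv_cache_gb: float,
                gpu_mem_gb: float, usable_fraction: float = 0.9) -> int:
    raw = (weight_gb + kv_cache_gb) / (gpu_mem_gb * usable_fraction)
    n = math.ceil(raw)
    # Round up to a power of two, since tensor parallelism usually wants 1/2/4/8.
    return 1 << (n - 1).bit_length()

# Hypothetical 70B model, BF16 weights (~140 GB) plus ~20 GB of KV cache.
for card, mem in [("A100/H100 80GB", 80), ("H200 141GB", 141), ("MI300X 192GB", 192)]:
    print(f"{card}: at least {gpus_needed(140, 20, mem)} GPU(s)")
```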

NVIDIA's main line: V100 to B300

NVIDIA becomes much easier to understand if I pull out the main line first. V100 starts the Tensor Core era. A100 is the classic training card. H100 is the modern large-model training and inference baseline. H200 is more like a large-memory, high-bandwidth Hopper. B200 and B300 move into Blackwell and Blackwell Ultra.

| Model | Time / architecture | Memory and bandwidth | Compute shorthand | Rough price / access | Intuition |
|---|---|---|---|---|---|
| V100 | 2017 Volta | 16/32GB HBM2, about 900GB/s | FP16 Tensor about 125 TFLOPS | Used, about $1k-$5k | First Tensor Core training generation, now mostly historical context or cheap compute |
| A100 40GB | 2020 Ampere | 40GB HBM2, about 1.6TB/s | FP16/BF16 dense about 312 TFLOPS | About $6k-$15k | Classic training card, but 40GB is tight for modern LLMs |
| A100 80GB | 2020/2021 Ampere | 80GB HBM2e, about 1.9TB/s PCIe or 2.0TB/s SXM | FP16/BF16 dense about 312 TFLOPS, sparse about 624 TFLOPS | About $10k-$25k | Still common and dependable for training, fine-tuning, and inference |
| H100 PCIe/SXM | 2022 Hopper | 80GB HBM2e/HBM3, SXM about 3.35TB/s | FP8 dense about 2 PFLOPS, sparse about 4 PFLOPS | About $25k-$45k | Baseline modern large-model training and inference GPU |
| H100 NVL | 2023 Hopper | 94GB HBM3, about 3.9TB/s, often deployed in pairs | Compute close to H100 | About $35k-$50k per-card class | PCIe-oriented LLM inference variant |
| H200 | 2024 Hopper | 141GB HBM3e, 4.8TB/s | Compute close to H100/H100 NVL | About $35k-$55k+ | Large-memory, high-bandwidth Hopper; strong for long-context inference |
| B200 | 2025 Blackwell | DGX/HGX commonly 180GB HBM3e, about 8TB/s | FP4 dense about 9 PFLOPS, FP8 dense about 4.5 PFLOPS | About $45k-$70k+, 8-GPU systems often above $500k | Main Blackwell flagship with a large low-precision jump |
| B300 / Blackwell Ultra | 2025/2026 Blackwell Ultra | Up to about 288GB HBM3e, about 8TB/s | FP4 dense about 15 PFLOPS class | Quote-based, usually sold as systems or racks | Built for reasoning, long context, and large MoE workloads |

H200 is the easy one to misunderstand. It is not a completely new architecture after H100. It is a Hopper-family card with much more memory and bandwidth. For long-context inference, large batches, and memory-bound workloads, that matters a lot. For compute-bound workloads, the jump may be less dramatic than the name suggests.

B200 and B300 are the newer generation story. Blackwell is not just about memory. It is also about FP4/FP8, NVLink 5, rack-scale system design, and throughput for reasoning-oriented inference. Those advantages show up most clearly in system-level deployments.

A800, H800, and H20 should not be sorted by name

China-market and export-control SKUs make the naming harder. A800, H800, and H20 are common, but they are not a normal upgrade path.

| Model | Relationship | Memory and bandwidth | Main limit | Intuition |
|---|---|---|---|---|
| A800 | A100 export-limited variant | 40/80GB, bandwidth close to A100 | Limited NVLink bandwidth | Similar to A100 in compute, weaker in scaling |
| H800 | H100 export-limited variant | 80GB HBM3, close to H100 | Limited inter-GPU bandwidth | Fine for single-node or medium-scale use, worse than full H100 for very large training |
| H20 | Restricted Hopper SKU | Commonly 96GB HBM3, about 4.0TB/s | Tensor compute far below H100 | Good memory, inference-leaning, not a normal midpoint between H100 and H200 |

I keep this group in a separate bucket. Their common trait is not that they occupy a natural performance position, but that policy, channel availability, and regional constraints reshaped them. Names like L20, RTX 6000D, and B40 also appear in discussions, but their specs and channel naming are messier, so I am not expanding them here.

GB200 and GB300 are systems, not just GPUs

This is another place where the names blur. B200 and B300 are GPUs. GB200 and GB300 usually mean Grace CPU plus Blackwell GPU Superchip configurations or platforms. NVL72 is a 72-GPU rack-scale system.

| Name | What it is | How to think about it |
|---|---|---|
| HGX H100/H200/B200/B300 | 4-GPU or 8-GPU baseboard platform for OEM servers | Common shape from cloud and server vendors |
| DGX H100/H200/B200/B300 | NVIDIA first-party system | Usually an 8-GPU node plus validated system and software stack |
| GH200 | Grace CPU + Hopper GPU Superchip | CPU and GPU connected tightly through NVLink-C2C, useful for HPC and large-memory workloads |
| GB200 | Grace CPU + Blackwell GPU Superchip | GB200 NVL72 is a 72-GPU rack-scale platform, not a card |
| GB300 | Grace + Blackwell Ultra | GB300 NVL72 targets reasoning and test-time scaling; power, cooling, and networking are system-level concerns |
| NVL72 | A 72-GPU NVLink rack | Closer to a rack-scale AI supercomputer than a GPU card |

When I see GB200, GB300, or NVL72, I first need to check whether the discussion is about a chip, a node, or a rack. Their value is the rack-scale memory pool, NVLink domain, networking, power delivery, and liquid-cooling design. They should not be compared one-to-one with a single H100.

Only the most common non-NVIDIA names

There are many non-NVIDIA accelerators, but for a first mental map I only need the names that show up most often in large-model infrastructure discussions.

| Vendor / model | Memory and bandwidth | Rough peer | Main point |
|---|---|---|---|
| AMD MI300X | 192GB HBM3, about 5.3TB/s | H100/H200 | Large memory and bandwidth are the core selling points; often compared for LLM inference |
| AMD MI325X | 256GB HBM3e, about 6TB/s | Between H200 and B200 | More memory for long context and larger batches |
| AMD MI350X / MI355X | 288GB HBM3e, about 8TB/s | B200/B300 | FP4/FP6 support, high power, next-generation liquid-cooled data center focus |
| Huawei Ascend 910B | Public sources often cite 64GB HBM2e, about 1.2-1.6TB/s | Below A100/H100, depending on the workload | Common in China, but public specs are not fully transparent |
| Huawei Ascend 910C | Public estimates suggest 96GB HBM2e, about 1.8-3.2TB/s | Partial H100 substitute | Sources disagree; adaptation and cluster engineering matter more than headline numbers |
| Intel Gaudi 3 | 128GB HBM2e, about 3.7TB/s | H100 alternative route | Ethernet-based interconnect and price-performance positioning; smaller ecosystem than CUDA |
| Google TPU v5p/v6e | Cloud Pod form factor | H100/H200/B200 cluster alternative | Strong JAX/XLA stack, clear platform lock-in |
| AWS Trainium2 | Commonly 96GiB HBM, about 2.9TB/s | H100 cloud training alternative | Good price-performance potential, but depends on Neuron SDK |

AMD's line is relatively easy to place: it challenges NVIDIA with larger memory, high bandwidth, and more room on price. ROCm is much better than it used to be, but CUDA remains smoother across kernels, frameworks, tools, and debugging. Whether MI300X or MI325X saves money depends on whether the exact model stack already runs well on ROCm.

Ascend needs caution. Public 910B/910C specs are less transparent than NVIDIA, AMD, or Intel specs, and different sources report different memory, bandwidth, and FP16 numbers. In real deployments, CANN, HCCL, MindSpore/PyTorch adaptation, kernel coverage, and cluster stability often matter more than single-card peak numbers.

Gaudi, TPU, and Trainium are more platform-shaped AI accelerators. They can be strong, but the comparison includes cloud platform, compiler stack, and ecosystem constraints, not just hardware.

My rough rankings

For large-model training:

B300/GB300 > B200/GB200 > H200 ≈ H100 > H800 > A100/A800 > V100.

For memory friendliness:

MI350X/MI355X and B300 > MI325X > MI300X and B200 > H200 > H20 and H100 NVL > H100/A100 80GB.

For China-market availability:

A800 and H800 are important historical export variants. H20 is the restricted Hopper inference card. Ascend 910B/910C is the domestic replacement line. Names like L20, RTX 6000D, and B40 have to be read together with the policy and channel context of the moment, not sorted by model number.

The three lines I will remember

One line is NVIDIA's data-center main line: V100 to A100, then H100/H200, then B200/B300. A100 is the classic era. H100 is the modern baseline. H200 is large-memory Hopper. B200 and B300 are Blackwell and Blackwell Ultra.

Another line is export-limited or restricted SKUs: A800, H800, H20. Do not sort them naturally by name. Check whether the limit is interconnect, compute, memory, or supply.

The third line is alternative ecosystems. AMD Instinct challenges NVIDIA mainly with memory capacity and bandwidth. Ascend matters in domestic Chinese replacement scenarios. Gaudi, TPU, and Trainium are more platform-shaped AI accelerators. Before staring at TFLOPS, ask whether the model fits, whether inference is memory-bound, whether training needs multi-GPU scaling, and whether the software stack actually runs.

Sources and caveats

The specification baseline comes mainly from NVIDIA's V100, A100, H100, H200, HGX, and GB300 pages and datasheets; AMD Instinct MI300 and MI350 pages; Intel's Gaudi 3 white paper; and public research or policy sources such as CSET for Huawei Ascend. Price ranges come from public reseller, secondary-market, cloud-provider, and GPU price-guide data. They are useful for intuition, not for procurement budgeting.
