
AMD at Hot Chips 2025: Deep Dive into CDNA 4 Architecture and MI350 Accelerators


AMD MI350 Accelerator

At Hot Chips 2025, AMD architects presented a deep dive into the CDNA 4 architecture powering the new MI350 accelerator family. Building on the MI300 foundation, MI350 introduces major architectural refinements and performance enhancements.

The AI Boom and Hardware Demands

Large Language Models: Explosive Growth

Large Language Models (LLMs) continue to scale rapidly, requiring longer context lengths and greater memory capacity.

GenAI Needs

To sustain performance, hardware must deliver:

  • Higher memory bandwidth and capacity
  • Better energy efficiency
  • Scalable multi-GPU clustering for massive AI models
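The memory-capacity pressure from longer contexts is easy to see with a back-of-the-envelope KV-cache calculation. The model dimensions below are illustrative round numbers for a 70B-class model with grouped-query attention, not figures from AMD's talk:

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim,
                   context_len, batch, bytes_per_elem=2):
    """Total key/value cache size: 2 tensors (K and V) per layer,
    each of shape [batch, num_kv_heads, context_len, head_dim]."""
    return (2 * num_layers * num_kv_heads * head_dim
            * context_len * batch * bytes_per_elem)

# Illustrative 70B-class model: 80 layers, 8 KV heads, head_dim 128, FP16.
gb = kv_cache_bytes(80, 8, 128, context_len=128_000, batch=1) / 2**30
print(f"{gb:.1f} GiB of KV cache at 128k context")  # → 39.1 GiB
```

Tens of gigabytes for a single sequence's cache, before weights, is why HBM capacity and bandwidth dominate accelerator design.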

MI350 Series Launch

Instinct MI350 Series

The MI350 family is now shipping, with two platform options:

  • MI350X → air-cooled
  • MI355X → liquid-cooled

Architectural Highlights

MI350 Architecture Enhancements

  • 185 billion transistors
  • Chiplet + 3D stacking design
  • 8 compute dies stacked across 2 I/O dies
  • Compute dies built on TSMC N3P 3nm
  • I/O dies remain on 6nm
  • Peak frequency: 2.4 GHz
  • Liquid-cooled TDP: 1.4 kW

MI350 GPU Chiplets

Infinity Fabric upgraded to IF 4:

  • +2 TB/s bandwidth vs IF 3
  • Fewer cross-die links → wider, lower-frequency D2D connections → higher efficiency
  • 7 IF links per socket
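The "wider but slower" trade-off behind the D2D links can be sanity-checked with simple arithmetic. The widths and transfer rates below are made-up round numbers to illustrate the principle, not AMD's actual link parameters:

```python
def link_bandwidth_gbs(width_bits, rate_gtps):
    """Aggregate link bandwidth in GB/s: width (bits) x transfer rate (GT/s) / 8."""
    return width_bits * rate_gtps / 8

# A narrow, fast link and a wide, slow link can deliver the same bandwidth,
# but driving wires at a lower frequency typically costs less energy per bit.
narrow = link_bandwidth_gbs(width_bits=64, rate_gtps=32)   # 256 GB/s
wide = link_bandwidth_gbs(width_bits=256, rate_gtps=8)     # 256 GB/s
assert narrow == wide
```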

MI350 GPU Cache & Hierarchy

Cache improvements:

  • LDS doubled compared to MI300
  • Each XCD has 4 MB L2 cache with coherence across dies

Data Formats and Compute Performance

Supported Data Formats

CDNA 4 introduces:

  • New FP6 and FP4 formats
  • Nearly 2× throughput for key data types

Supported Data Formats Performance Comparison

→ AMD claims AI math throughput more than 2× that of competing accelerators.
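To make the low-precision formats concrete, here is a minimal sketch of round-to-nearest quantization onto an FP4 grid. It assumes FP4 means the OCP MX-style E2M1 encoding (1 sign, 2 exponent, 1 mantissa bit); AMD's exact format semantics are not detailed in this talk:

```python
# Representable magnitudes of an E2M1 (FP4) float: only 8 non-negative values.
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_fp4(x):
    """Round x to the nearest representable FP4 (E2M1) value,
    clipping magnitudes above 6.0 to the largest grid point."""
    mag = min(FP4_GRID, key=lambda g: abs(abs(x) - g))
    return -mag if x < 0 else mag

print([quantize_fp4(v) for v in [0.3, -1.2, 2.6, 7.5]])
# → [0.5, -1.0, 3.0, 6.0]
```

The coarse grid is why FP4 roughly doubles throughput and halves memory traffic versus FP8, at the cost of precision that workloads must be able to tolerate (typically via per-block scaling).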

System and Platform Design

Flexible GPU Partitioning

  • Configurable as single NUMA domain or dual NUMA domains
  • XCDs can be partitioned into multiple logical GPUs
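The idea of carving XCDs into logical GPUs can be sketched as an even split of compute dies across partitions. This is an illustration of the concept only, not ROCm's actual partitioning API:

```python
def partition_xcds(num_xcds=8, num_logical_gpus=2):
    """Split XCD indices evenly into logical GPU partitions (conceptual sketch)."""
    assert num_xcds % num_logical_gpus == 0, "partitions must divide evenly"
    per = num_xcds // num_logical_gpus
    return [list(range(i * per, (i + 1) * per)) for i in range(num_logical_gpus)]

print(partition_xcds(8, 4))  # → [[0, 1], [2, 3], [4, 5], [6, 7]]
```

Smaller partitions let several jobs share one accelerator, while a single partition exposes all XCDs to one large workload.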

Infinity Platform

Connectivity:

  • Up to 8 GPUs in a fully connected topology via Infinity Fabric
  • PCIe connects GPUs to CPUs and NICs
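A fully connected topology is where the "7 IF links per socket" figure mentioned earlier comes from: with 8 GPUs, each needs a direct link to the other 7, for 28 links in total. A quick sketch:

```python
from itertools import combinations

def fully_connected_links(num_gpus=8):
    """Every GPU pair gets a direct link: n*(n-1)/2 links total."""
    return list(combinations(range(num_gpus), 2))

links = fully_connected_links(8)
print(len(links))                            # → 28 links in total
per_gpu = sum(1 for a, b in links if 0 in (a, b))
print(per_gpu)                               # → each GPU terminates 7 links
```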

Air Cooled UBB

OAM modules + universal baseboard (UBB):

  • Supports 8 GPUs per board
  • Air-cooled rack: up to 64 GPUs
  • Liquid-cooled rack: up to 96–128 GPUs

Software and Performance

ROCm 7

The ROCm 7 software stack is maturing alongside the hardware, delivering further performance gains on top of the architectural improvements.

Inference Performance
GPU Training Performance

Inference and training benchmarks show strong gains across workloads.

Roadmap Outlook

Annual Roadmap

AMD reaffirmed its roadmap:

  • MI350 shipping now
  • MI400 arriving next year with up to 10× AI performance uplift

Instinct MI400

Conclusion

  • MI350/CDNA 4 continues the chiplet + 3D stacking strategy
  • Bandwidth, cache, and efficiency are significantly improved
  • AI data formats expanded (FP6, FP4), nearly doubling math throughput
  • Flexible system design: NUMA partitioning and large-scale GPU topologies
  • ROCm software keeps pace with hardware gains
  • Roadmap remains solid with MI400 on the horizon
