Compatible with Generation 4 of the PCIe interface, the NVIDIA A100 is an Ampere-generation GPU that is easy to integrate into existing servers.
The NVIDIA A100 is built on a 7 nm process with 40 GB of Samsung HBM2 memory. It accelerates existing FP32/FP64 workloads while delivering strong acceleration for new AI workloads. The NVIDIA A100 PCIe Gen4 add-in card is rated for up to 250 W operation.
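As a back-of-envelope check, the 64 GB/sec PCIe Gen4 figure quoted in the table below can be derived from the per-lane transfer rate of a Gen4 x16 link. This sketch assumes the standard PCIe 4.0 parameters (16 GT/s per lane, 128b/130b encoding); the 64 GB/sec headline number is the raw bidirectional rate, before encoding overhead.

```python
# PCIe Gen4 x16 bandwidth, back of the envelope.
# Assumptions (standard PCIe 4.0, not stated on this sheet):
#   16 GT/s per lane, 16 lanes, 128b/130b line coding.
transfers_per_lane = 16e9   # PCIe Gen4: 16 GT/s per lane
lanes = 16                  # x16 slot

raw_per_direction = transfers_per_lane * lanes / 8 / 1e9   # GB/s, one way
raw_bidirectional = 2 * raw_per_direction                  # GB/s, both ways
effective_per_direction = raw_per_direction * 128 / 130    # after encoding

print(f"raw bidirectional: {raw_bidirectional:.0f} GB/s")          # 64 GB/s
print(f"effective one-way: {effective_per_direction:.1f} GB/s")    # ~31.5 GB/s
```

The quoted 64 GB/sec is the sum of both directions; usable one-way throughput is closer to 31.5 GB/sec once line coding is accounted for.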
Product Data
Specification: NVIDIA A100 Tensor Core GPU
Technical White Paper: NVIDIA A100 Tensor Core GPU Architecture. Unprecedented acceleration at every scale
| | NVIDIA A100 for NVIDIA HGX™ | NVIDIA A100 for PCIe |
| --- | --- | --- |
| GPU Architecture | NVIDIA Ampere | NVIDIA Ampere |
| Double-Precision Performance | FP64: 9.7 TFLOPS; FP64 Tensor Core: 19.5 TFLOPS | FP64: 9.7 TFLOPS; FP64 Tensor Core: 19.5 TFLOPS |
| Single-Precision Performance | FP32: 19.5 TFLOPS; Tensor Float 32 (TF32): 156 TFLOPS \| 312 TFLOPS* | FP32: 19.5 TFLOPS; Tensor Float 32 (TF32): 156 TFLOPS \| 312 TFLOPS* |
| Half-Precision Performance | 312 TFLOPS \| 624 TFLOPS* | 312 TFLOPS \| 624 TFLOPS* |
| Bfloat16 | 312 TFLOPS \| 624 TFLOPS* | 312 TFLOPS \| 624 TFLOPS* |
| Integer Performance | INT8: 624 TOPS \| 1,248 TOPS*; INT4: 1,248 TOPS \| 2,496 TOPS* | INT8: 624 TOPS \| 1,248 TOPS*; INT4: 1,248 TOPS \| 2,496 TOPS* |
| GPU Memory | 40 GB HBM2 | 40 GB HBM2 |
| Memory Bandwidth | 1.6 TB/sec | 1.6 TB/sec |
| Error-Correcting Code | Yes | Yes |
| Interconnect Interface | PCIe Gen4: 64 GB/sec; third-generation NVIDIA® NVLink®: 600 GB/sec** | PCIe Gen4: 64 GB/sec; third-generation NVIDIA® NVLink®: 600 GB/sec** |
| Form Factor | 4/8 SXM GPUs in NVIDIA HGX™ A100 | PCIe |
| Multi-Instance GPU (MIG) | Up to 7 GPU instances | Up to 7 GPU instances |
| Max Power Consumption | 400 W | 250 W |
| Delivered Performance for Top Apps | 100% | 90% |
| Thermal Solution | Passive | Passive |
| Compute APIs | CUDA®, DirectCompute, OpenCL™, OpenACC® | CUDA®, DirectCompute, OpenCL™, OpenACC® |

\* Structural sparsity enabled
\*\* SXM GPUs via HGX A100 server boards; PCIe GPUs via NVLink Bridge for up to 2 GPUs
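The headline throughput figures in the table are mutually consistent, which makes a useful sanity check. The sketch below derives them from two hardware parameters that are assumptions here (not stated on this sheet): 6,912 FP32 CUDA cores and a 1.41 GHz boost clock.

```python
# Sanity-check the peak TFLOPS figures in the table above.
# Assumed hardware parameters (not on this sheet): 6912 FP32 CUDA cores,
# 1.41 GHz boost clock.
CORES_FP32 = 6912
BOOST_GHZ = 1.41

fp32 = CORES_FP32 * 2 * BOOST_GHZ / 1000   # 2 FLOPs per FMA -> ~19.5 TFLOPS
fp64 = fp32 / 2                            # FP64 at half the FP32 rate -> ~9.7 TFLOPS
tf32 = fp32 * 8                            # TF32 Tensor Core: 8x FP32 -> ~156 TFLOPS
tf32_sparse = tf32 * 2                     # structural sparsity doubles it -> ~312 TFLOPS*

print(f"FP32 {fp32:.1f}, FP64 {fp64:.1f}, TF32 {tf32:.0f}, TF32 sparse {tf32_sparse:.0f}")
```

The same 2x sparsity factor explains every starred entry in the table: 312 -> 624 TFLOPS for FP16/BF16, 624 -> 1,248 TOPS for INT8, and 1,248 -> 2,496 TOPS for INT4.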