Groq LPU

Dedicated AI Inference Chip (Language Processing Unit) · DevOps & Hardware

Basic Information

  • Company/Brand: Groq (acquired by NVIDIA in December 2025)
  • Country/Region: USA (Mountain View, California)
  • Official Website: https://groq.com/
  • Type: Dedicated AI Inference Chip (Language Processing Unit)
  • Founded: 2016 (by former members of Google's TPU team)

Product Description

The Groq LPU (Language Processing Unit) is a chip architecture designed specifically for AI inference. It abandons the HBM used by traditional GPUs entirely, instead integrating hundreds of megabytes of SRAM directly on the chip, which enables ultra-low-latency inference. The LPU reaches roughly 300 tokens/s on Llama 2 70B, about 10 times faster than comparable NVIDIA H100 deployments. NVIDIA acquired Groq for $20 billion in December 2025, and in March 2026 Groq launched the Groq 3 LPU with 150 TB/s of on-chip bandwidth.
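
To see why on-chip bandwidth is the headline number, a back-of-envelope roofline helps: at batch size 1, generating each token requires streaming roughly every weight through the compute units once, so memory bandwidth caps decode throughput. The sketch below is a deliberately simplified estimate, not Groq's published methodology: it assumes FP16 weights and treats the whole deployment as a single memory pool at the quoted bandwidth (a real 70B deployment is partitioned across many LPUs, each holding only hundreds of MB of SRAM).

```python
# Back-of-envelope decode-throughput ceiling from memory bandwidth.
# Assumptions (ours, not Groq's methodology): FP16 weights, batch
# size 1, every weight streamed once per generated token, and the
# deployment treated as one memory pool at the quoted bandwidth.
PARAMS = 70e9                 # Llama 2 70B parameter count
BYTES_PER_PARAM = 2           # FP16
weight_bytes = PARAMS * BYTES_PER_PARAM   # ~140 GB of weights

for label, bandwidth in [("Gen-1 LPU, 80 TB/s", 80e12),
                         ("Groq 3, 150 TB/s", 150e12)]:
    seconds_per_token = weight_bytes / bandwidth
    print(f"{label}: {seconds_per_token * 1e3:.2f} ms/token floor, "
          f"~{1 / seconds_per_token:.0f} tokens/s ceiling")
```

On these assumptions the first-generation ceiling comes out near 570 tokens/s, which makes the quoted 300 tokens/s plausible once real-world overheads are counted.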

Core Features/Characteristics

  • Unique TSP (Tensor Streaming Processor) architecture
  • On-chip integration of hundreds of MB of SRAM as main memory (not cache)
  • 80 TB/s on-chip memory bandwidth (first generation)
  • 150 TB/s on-chip bandwidth (Groq 3, 2026)
  • Deterministic computation model with extremely low latency
  • 10x higher energy efficiency compared to GPUs
  • First generation: 14 nm process, 25 × 29 mm die, 900 MHz
  • Second generation: Samsung 4 nm process
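
The "deterministic computation model" bullet deserves a quick illustration. Because the TSP has no caches or dynamic scheduling, the compiler fixes the cycle on which every operation executes, so latency is a compile-time constant rather than a distribution. The toy Python sketch below contrasts that with a cache-based design; it models only the scheduling idea, not Groq's actual ISA or compiler.

```python
import random

def deterministic_cycles(num_ops, cycles_per_op=1):
    # Statically scheduled: the compiler fixes every op's issue slot,
    # so total latency is a constant known before the program runs.
    return num_ops * cycles_per_op

def dynamic_cycles(num_ops, hit=1, miss=200, miss_rate=0.02):
    # Cache-based design: most accesses hit, a few miss, so latency
    # is a random variable that varies from run to run.
    return sum(hit if random.random() > miss_rate else miss
               for _ in range(num_ops))

OPS = 100_000
print("static schedule:", deterministic_cycles(OPS), "cycles (every run)")
for _ in range(3):
    print("cache-based:   ", dynamic_cycles(OPS), "cycles")
```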

Inference Performance

  • Llama 2 70B: 300 tokens/s (10x faster than NVIDIA H100)
  • Ultra-low latency: millisecond-scale time to first token
  • Approximately 10x higher energy efficiency compared to traditional GPUs

GroqCloud API Pricing

  • Llama 4 Scout: Input $0.11/M tokens, Output $0.34/M tokens
  • Llama 3 70B: Input $0.59/M tokens, Output $0.79/M tokens
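
To make these rates concrete, here is a small cost sketch in Python; the daily request volume and per-request token counts are illustrative assumptions, not measurements.

```python
# Quick cost sketch using the GroqCloud list prices above.
PRICES_PER_M = {  # USD per million tokens: (input, output)
    "llama-4-scout": (0.11, 0.34),
    "llama-3-70b":   (0.59, 0.79),
}

def request_cost_usd(model, input_tokens, output_tokens):
    price_in, price_out = PRICES_PER_M[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

# Example: 1,000 requests/day, 2,000 prompt + 500 completion tokens each.
daily = 1000 * request_cost_usd("llama-3-70b", 2000, 500)
print(f"Llama 3 70B: ${daily:.2f}/day, ~${daily * 30:.2f}/month")
```

Even a fairly chatty workload at these list prices lands in the tens of dollars per month.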

Target Users

  • OpenClaw deployments requiring ultra-low latency AI inference
  • AI applications in real-time interactive scenarios
  • Enterprises with extreme demands for inference speed
  • Developers integrating cloud APIs

Competitive Advantages

  • Inference speed far exceeds that of traditional GPUs (10x faster than H100)
  • Deterministic latency with extremely stable response times
  • Significantly better energy efficiency compared to GPUs
  • Competitive pricing for GroqCloud API
  • Stronger resource support post-NVIDIA acquisition

Relationship with OpenClaw Ecosystem

The Groq LPU provides ultra-fast LLM inference to OpenClaw through the GroqCloud API. By connecting to GroqCloud over the API, OpenClaw can reach inference speeds far beyond typical GPU serving, noticeably improving interactive responsiveness. Groq's ultra-low, deterministic latency suits OpenClaw's real-time conversation and task-execution scenarios particularly well, and GroqCloud's per-token billing keeps costs predictable.
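
For reference, a minimal GroqCloud call with the official Python SDK (`pip install groq`) looks like the sketch below. The model id is a placeholder (check GroqCloud's current model list), and how OpenClaw itself wires in the provider is deployment-specific; this shows only the raw API call. GroqCloud also exposes an OpenAI-compatible endpoint, so OpenAI-style clients can be pointed at it with a base-URL change.

```python
import os
from groq import Groq  # official GroqCloud SDK

# Reads the API key from the GROQ_API_KEY environment variable.
client = Groq(api_key=os.environ["GROQ_API_KEY"])

completion = client.chat.completions.create(
    model="llama3-70b-8192",  # placeholder: pick a currently served model
    messages=[{"role": "user", "content": "Summarize my open tasks."}],
)
print(completion.choices[0].message.content)
```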
