Groq LPU

Dedicated AI Inference Chip (Language Processing Unit) · DevOps & Hardware

Basic Information

  • Company/Brand: Groq (acquired by NVIDIA in December 2025)
  • Country/Region: USA (Mountain View, California)
  • Official Website: https://groq.com/
  • Type: Dedicated AI Inference Chip (Language Processing Unit)
  • Founded: 2016 (by former members of Google's TPU team)

Product Description

The Groq LPU (Language Processing Unit) is a chip architecture designed specifically for AI inference. It abandons the HBM used by traditional GPUs entirely, instead integrating hundreds of megabytes of SRAM directly on the chip, which enables ultra-low-latency inference. The LPU reaches roughly 300 tokens/s on Llama 2 70B, about 10 times faster than comparable NVIDIA H100 deployments. NVIDIA acquired Groq for $20 billion in December 2025, and in March 2026 Groq launched the Groq 3 LPU with 150 TB/s of on-chip bandwidth.
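
To see why on-chip bandwidth is the headline number, a back-of-envelope roofline helps: at batch size 1, generating each token requires streaming roughly every weight through the compute units once, so memory bandwidth caps decode throughput. The sketch below is a deliberately simplified estimate, not Groq's published methodology: it assumes FP16 weights and treats the whole deployment as a single memory pool at the quoted bandwidth (a real 70B deployment is partitioned across many LPUs, each holding only hundreds of MB of SRAM).

```python
# Back-of-envelope decode-throughput ceiling from memory bandwidth.
# Assumptions (ours, not Groq's methodology): FP16 weights, batch
# size 1, every weight streamed once per generated token, and the
# deployment treated as one memory pool at the quoted bandwidth.
PARAMS = 70e9                 # Llama 2 70B parameter count
BYTES_PER_PARAM = 2           # FP16
weight_bytes = PARAMS * BYTES_PER_PARAM   # ~140 GB of weights

for label, bandwidth in [("Gen-1 LPU, 80 TB/s", 80e12),
                         ("Groq 3, 150 TB/s", 150e12)]:
    seconds_per_token = weight_bytes / bandwidth
    print(f"{label}: {seconds_per_token * 1e3:.2f} ms/token floor, "
          f"~{1 / seconds_per_token:.0f} tokens/s ceiling")
```

On these assumptions the first-generation ceiling comes out near 570 tokens/s, which makes the quoted 300 tokens/s plausible once real-world overheads are counted.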

Core Features/Characteristics

  • Unique TSP (Tensor Streaming Processor) architecture
  • On-chip integration of hundreds of MB of SRAM as main memory (not cache)
  • 80 TB/s on-chip memory bandwidth (first generation)
  • 150 TB/s on-chip bandwidth (Groq 3, 2026)
  • Deterministic computation model with extremely low latency
  • 10x higher energy efficiency compared to GPUs
  • First generation: 14 nm process, 25 × 29 mm die, 900 MHz
  • Second generation: Samsung 4 nm process
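
The "deterministic computation model" bullet deserves a quick illustration. Because the TSP has no caches or dynamic scheduling, the compiler fixes the cycle on which every operation executes, so latency is a compile-time constant rather than a distribution. The toy Python sketch below contrasts that with a cache-based design; it models only the scheduling idea, not Groq's actual ISA or compiler.

```python
import random

def deterministic_cycles(num_ops, cycles_per_op=1):
    # Statically scheduled: the compiler fixes every op's issue slot,
    # so total latency is a constant known before the program runs.
    return num_ops * cycles_per_op

def dynamic_cycles(num_ops, hit=1, miss=200, miss_rate=0.02):
    # Cache-based design: most accesses hit, a few miss, so latency
    # is a random variable that varies from run to run.
    return sum(hit if random.random() > miss_rate else miss
               for _ in range(num_ops))

OPS = 100_000
print("static schedule:", deterministic_cycles(OPS), "cycles (every run)")
for _ in range(3):
    print("cache-based:   ", dynamic_cycles(OPS), "cycles")
```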

Inference Performance

  • Llama 2 70B: 300 tokens/s (10x faster than NVIDIA H100)
  • Ultra-low latency: millisecond-scale time to first token
  • Approximately 10x higher energy efficiency compared to traditional GPUs

GroqCloud API Pricing

  • Llama 4 Scout: Input $0.11/M tokens, Output $0.34/M tokens
  • Llama 3 70B: Input $0.59/M tokens, Output $0.79/M tokens
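
To make these rates concrete, here is a small cost sketch in Python; the daily request volume and per-request token counts are illustrative assumptions, not measurements.

```python
# Quick cost sketch using the GroqCloud list prices above.
PRICES_PER_M = {  # USD per million tokens: (input, output)
    "llama-4-scout": (0.11, 0.34),
    "llama-3-70b":   (0.59, 0.79),
}

def request_cost_usd(model, input_tokens, output_tokens):
    price_in, price_out = PRICES_PER_M[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

# Example: 1,000 requests/day, 2,000 prompt + 500 completion tokens each.
daily = 1000 * request_cost_usd("llama-3-70b", 2000, 500)
print(f"Llama 3 70B: ${daily:.2f}/day, ~${daily * 30:.2f}/month")
```

Even a fairly chatty workload at these list prices lands in the tens of dollars per month.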

Target Users

  • OpenClaw deployments requiring ultra-low latency AI inference
  • AI applications in real-time interactive scenarios
  • Enterprises with extreme demands for inference speed
  • Developers integrating cloud APIs

Competitive Advantages

  • Inference speed far exceeds that of traditional GPUs (10x faster than H100)
  • Deterministic latency with extremely stable response times
  • Significantly better energy efficiency compared to GPUs
  • Competitive pricing for GroqCloud API
  • Stronger resource support post-NVIDIA acquisition

Relationship with OpenClaw Ecosystem

The Groq LPU provides ultra-fast LLM inference to OpenClaw through the GroqCloud API. By connecting to GroqCloud over the API, OpenClaw can reach inference speeds far beyond typical GPU serving, noticeably improving interactive responsiveness. Groq's ultra-low, deterministic latency suits OpenClaw's real-time conversation and task-execution scenarios particularly well, and GroqCloud's per-token billing keeps costs predictable.
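
For reference, a minimal GroqCloud call with the official Python SDK (`pip install groq`) looks like the sketch below. The model id is a placeholder (check GroqCloud's current model list), and how OpenClaw itself wires in the provider is deployment-specific; this shows only the raw API call. GroqCloud also exposes an OpenAI-compatible endpoint, so OpenAI-style clients can be pointed at it with a base-URL change.

```python
import os
from groq import Groq  # official GroqCloud SDK

# Reads the API key from the GROQ_API_KEY environment variable.
client = Groq(api_key=os.environ["GROQ_API_KEY"])

completion = client.chat.completions.create(
    model="llama3-70b-8192",  # placeholder: pick a currently served model
    messages=[{"role": "user", "content": "Summarize my open tasks."}],
)
print(completion.choices[0].message.content)
```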
