SambaNova Technology in Europe

Infercom is Europe's first and only public provider of SambaNova's dataflow architecture. Purpose-built AI inference, hosted in Germany, fully GDPR compliant.

Up to 10x Faster Inference

Up to 5x Energy Efficient

100% EU Hosted

Dataflow vs. GPU Architecture

Why purpose-built dataflow beats general-purpose GPUs for AI inference

Dataflow Architecture

Purpose-built for AI

The architecture is designed around AI workloads: it creates custom processing pipelines for entire computation graphs while minimizing data movement.

✓ Entire model resident in memory
✓ Data flows through operations without intermediate writes
✓ Operator fusion: hundreds of operations in a single kernel
✓ Software-defined hardware optimizes for each workload

Traditional GPU

General-purpose design

General-purpose design requiring kernel-by-kernel execution creates bottlenecks for AI inference workloads.

✗ Kernel-by-kernel execution creates overhead
✗ Excessive data movement between processor and memory
✗ Memory bandwidth bottleneck limits performance
✗ Underutilization of compute resources

Deep dive: How dataflow solves the AI inference crisis

The SambaNova Advantage

GPUs weren't built for AI. They were designed for video game graphics, and the architecture shows. SambaNova took a different path, designing chips purpose-made for AI inference from the ground up. Independent benchmarks show up to 10x faster inference, making it the fastest LLM inference technology available. Here's how it works.

GPUs Spend Most of Their Time Waiting

When a GPU generates AI responses, it follows a repetitive cycle: fetch data from memory, compute, write back to memory, repeat. The round trip to memory takes far longer than the actual computation. GPUs spend most of their time waiting for data, not processing it.

Engineers call this "memory-bound." The processor idles while data shuffles back and forth. For AI inference, where every generated token requires this fetch-compute-write cycle over the model's weights, the delays stack up: a 1,000-token response means at least 1,000 passes over those weights, one per token.

GPU manufacturers have tried faster memory and bigger caches. The core problem remains: data keeps bouncing between processor and memory.
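To make "memory-bound" concrete, here is a rough back-of-the-envelope sketch. It assumes a single request whose every generated token must stream all model weights from memory once, and it ignores batching, caches, and compute time. The model size and bandwidth in the example are hypothetical round numbers, not measurements of any particular GPU.

```python
def decode_tokens_per_second_upper_bound(params_billion: float,
                                         bytes_per_param: float,
                                         memory_bandwidth_gb_s: float) -> float:
    """Upper bound on single-stream decode speed when each token
    requires reading every weight from memory exactly once."""
    # 1e9 parameters * bytes per parameter = gigabytes of weights
    weight_gb = params_billion * bytes_per_param
    return memory_bandwidth_gb_s / weight_gb


# Hypothetical example: 70B parameters, 2 bytes each (BF16), 3,000 GB/s bandwidth.
bound = decode_tokens_per_second_upper_bound(70, 2, 3000)
print(f"At most {bound:.1f} tokens/sec per request")
```

Under these assumptions the ceiling is set by memory bandwidth alone, however fast the arithmetic units are.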

Dataflow Streams Data Through the Chip Instead of Shuffling It

SambaNova's Reconfigurable Dataflow Units (RDUs) work differently. Instead of fetching and storing data repeatedly, they lay operations out spatially across the chip and stream data through them continuously. Data moves in one direction through the computation, like parts on an assembly line, rather than being loaded and unloaded at each step.

The entire AI model stays resident in memory. Data flows through operations without intermediate writes. Operations that would require separate steps on a GPU get fused together. SambaNova calls this "execution streaming continuously across the processor." It's the opposite of kernel-by-kernel GPU execution.
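The difference between kernel-by-kernel execution and fused streaming can be illustrated with a toy model in plain Python. This is only a conceptual sketch: the "memory writes" are counted list elements, the three operations are arbitrary, and nothing here reflects how an RDU or a GPU is actually programmed.

```python
from functools import reduce
from typing import Callable, List, Tuple

Op = Callable[[float], float]


def run_kernel_by_kernel(ops: List[Op], data: List[float]) -> Tuple[List[float], int]:
    """Apply one op at a time to the whole batch, writing a full
    intermediate buffer after every op."""
    writes = 0
    for op in ops:
        data = [op(x) for x in data]  # intermediate result goes back to "memory"
        writes += len(data)
    return data, writes


def run_fused(ops: List[Op], data: List[float]) -> Tuple[List[float], int]:
    """Stream each element through all ops in sequence; only the final
    result is written."""
    def fused(x: float) -> float:
        return reduce(lambda acc, op: op(acc), ops, x)

    out = [fused(x) for x in data]
    return out, len(out)


ops = [lambda x: x * 2.0, lambda x: x + 1.0, lambda x: max(x, 0.0)]
data = [-1.0, 0.5, 3.0]

unfused_out, unfused_writes = run_kernel_by_kernel(ops, data)
fused_out, fused_writes = run_fused(ops, data)
print(unfused_out == fused_out, unfused_writes, fused_writes)
```

Both paths produce the same values, but the fused path writes one buffer instead of one per operation, which is the effect the fusion described above is after.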

Less waiting. More throughput.

A Three-Tier Memory System Lets One Rack Run 600B+ Parameter Models

SambaNova uses a three-tier memory hierarchy that balances speed and capacity. At the fastest level, 520 megabytes of SRAM sits directly on each chip, handling the hottest data and enabling operations to fuse together without memory trips. Below that, 64 gigabytes of high-bandwidth memory (HBM) holds model weights and active data. Unlike traditional caches, this layer is software-controlled, meaning the system decides exactly what lives here rather than relying on automatic eviction policies.

The third tier provides up to 1.5 terabytes of DDR memory per chip for prompt caching and hosting multiple models simultaneously. Each chip can address memory across all chips in the rack, creating a massive shared memory pool that operates as a flat address space.

A single rack can run models with hundreds of billions of parameters. Competitors using pure SRAM architectures need thousands of chips to achieve the same capability. The three-tier approach trades a small amount of peak speed for better capacity and flexibility.
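The per-chip figures above can be added up to a rack-level total. The sketch below assumes 16 chips per rack (inferred from the "SN40L-16" rack name, not stated in the text), decimal units, and a 600B-parameter model stored at 2 bytes per parameter. It only checks whether the weights fit in the combined pool and says nothing about where they would actually be placed.

```python
CHIPS_PER_RACK = 16  # assumption: inferred from the "SN40L-16" rack name

# Per-chip capacity of each tier in GB (decimal units), from the text above.
PER_CHIP_GB = {"SRAM": 0.52, "HBM": 64.0, "DDR": 1500.0}


def rack_memory_gb(chips: int = CHIPS_PER_RACK) -> dict:
    """Total capacity of each memory tier across the rack, in GB."""
    return {tier: gb * chips for tier, gb in PER_CHIP_GB.items()}


def weights_gb(params_billion: float, bytes_per_param: float) -> float:
    """Size of the model weights in GB."""
    return params_billion * bytes_per_param


rack = rack_memory_gb()
total_tb = sum(rack.values()) / 1000
needed_gb = weights_gb(600, 2)  # hypothetical 600B-parameter model in BF16
fits = needed_gb <= sum(rack.values())

print(f"Rack total: {total_tb:.2f} TB; 600B BF16 weights: {needed_gb:.0f} GB; fits: {fits}")
```

With these inputs the total comes out at roughly 25 TB, in line with the "Up to 25TB" rack figure quoted on this page.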

Up to 10x Faster Than GPU Inference, 5x Better Energy Efficiency

Independent benchmarks from Artificial Analysis confirm the performance difference: SambaNova's dataflow architecture delivers up to 10x faster inference than GPU alternatives, with up to 5x better energy efficiency. A single rack consumes around 10 to 15 kilowatts versus 40 to 50 kilowatts or more for equivalent GPU infrastructure. It runs on standard air cooling, requiring no exotic liquid cooling systems or purpose-built datacenters.
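The rack power ranges quoted above can be turned into a ratio and an annual energy figure. This sketch compares power draw only and assumes both racks run continuously on an equivalent workload; energy per token would additionally depend on throughput, which is not modelled here.

```python
HOURS_PER_YEAR = 24 * 365


def power_ratio_range(dataflow_kw: tuple, gpu_kw: tuple) -> tuple:
    """Given (low, high) kW ranges, return the smallest and largest
    GPU-to-dataflow power ratio."""
    smallest = gpu_kw[0] / dataflow_kw[1]
    largest = gpu_kw[1] / dataflow_kw[0]
    return smallest, largest


def annual_energy_kwh(power_kw: float) -> float:
    """Energy used in one year of continuous operation."""
    return power_kw * HOURS_PER_YEAR


smallest, largest = power_ratio_range(dataflow_kw=(10, 15), gpu_kw=(40, 50))
print(f"GPU rack draws {smallest:.1f}x to {largest:.1f}x the power")
print(f"10 kW for a year: {annual_energy_kwh(10):,.0f} kWh")
print(f"40 kW for a year: {annual_energy_kwh(40):,.0f} kWh")
```

The upper end of that ratio, 50 kW against 10 kW, is where a 5x figure would come from on power draw alone.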

For real-time applications, this shows up in the user experience. An AI assistant that responds in 200 milliseconds instead of 2 seconds is a different product. You use it constantly instead of waiting for it. Faster tokens also mean more reasoning steps within the same time budget, which matters for agentic workflows and complex multi-step tasks.
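A small calculation shows how output speed translates into response time and into tokens available within a fixed time budget. The 404 and 772 tokens/sec values are the figures quoted elsewhere on this page; the 50 tokens/sec baseline is a hypothetical comparison point, and time to first token is ignored.

```python
def generation_time_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to generate num_tokens at a steady output speed."""
    return num_tokens / tokens_per_second


def tokens_within_budget(tokens_per_second: float, budget_seconds: float) -> int:
    """Whole tokens that can be generated inside a time budget."""
    return int(tokens_per_second * budget_seconds)


speeds = {
    "baseline (hypothetical)": 50.0,
    "404 tokens/sec": 404.0,
    "772 tokens/sec": 772.0,
}

for label, tps in speeds.items():
    t = generation_time_seconds(500, tps)
    n = tokens_within_budget(tps, 2.0)
    print(f"{label}: 500 tokens in {t:.2f} s, {n} tokens in a 2 s budget")
```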

SambaNova Technology in Europe: Infercom is the First and Only Public Provider

All of this performance is available through SambaNova's cloud. But that cloud runs in the United States. For European businesses with GDPR compliance requirements, data sovereignty mandates, or a preference to keep sensitive data within EU jurisdiction, that creates a barrier.

Infercom brings SambaNova to Europe. We operate the first and only public SambaNova cloud in the EU, hosted in Munich, Germany. Latency from Frankfurt is under 10 milliseconds. With speeds exceeding 400 tokens per second on models like MiniMax M2.7 Ultraspeed, it's the fastest LLM API in Europe — with complete EU data sovereignty. Your prompts and responses never leave European soil. No CLOUD Act exposure. No third-country data transfers. Fully GDPR compliant.

If you're building AI applications that need both speed and sovereignty, this is the infrastructure.
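One way to check throughput claims like these yourself is to time a streamed response. The sketch below is provider-agnostic: it measures any iterable of tokens, and `fake_stream` is a hypothetical stand-in for a real streaming API response. No actual endpoint, client library, or model name is assumed.

```python
import time
from typing import Callable, Iterable, Iterator


def measure_output_speed(token_stream: Iterable[str],
                         clock: Callable[[], float] = time.perf_counter) -> dict:
    """Consume a token stream and report token count, time to first
    token, and overall output tokens per second."""
    start = clock()
    first_token_at = None
    count = 0
    for _ in token_stream:
        now = clock()
        if first_token_at is None:
            first_token_at = now
        count += 1
    elapsed = clock() - start
    return {
        "tokens": count,
        "time_to_first_token_s": None if first_token_at is None else first_token_at - start,
        "tokens_per_second": count / elapsed if elapsed > 0 else 0.0,
    }


def fake_stream(n: int, delay_s: float) -> Iterator[str]:
    """Hypothetical stand-in for a streaming API: yields n tokens with a delay."""
    for i in range(n):
        time.sleep(delay_s)
        yield f"tok{i}"


result = measure_output_speed(fake_stream(50, 0.001))
print(result)
```

To benchmark a real service, replace `fake_stream` with an iterator over the tokens your client receives. Results will vary with prompt length, network latency, and load.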


Want to go deeper?

Further Reading: Technical Papers & Benchmarks

SN40L Reconfigurable Dataflow Unit

Built on TSMC's 5nm process with 1,040 compute cores per chip, delivering 638 BF16 TFLOPS per chip - 10.2 PetaFLOPS per rack.
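The per-chip and per-rack figures are consistent with each other if a rack holds 16 chips. That chip count is an assumption inferred from the "SN40L-16" rack name. A quick check:

```python
TFLOPS_PER_CHIP = 638  # BF16, from the text above
CHIPS_PER_RACK = 16    # assumption: inferred from the "SN40L-16" rack name

rack_pflops = TFLOPS_PER_CHIP * CHIPS_PER_RACK / 1000  # 1 PFLOPS = 1,000 TFLOPS
print(f"{rack_pflops:.1f} PFLOPS per rack")
```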

SambaNova SN40L RDU - dual-die CoWoS package on TSMC 5nm

TFLOPS/Chip (BF16): 638
PCUs/Chip: 1,040
PFLOPS/Rack: 10.2
Process: TSMC 5nm

Three-Tier Memory per Chip

SRAM: 520MB (ultra-fast on-chip cache)
HBM: 64GB (high-bandwidth co-packaged memory)
DDR: up to 1.5TB (off-package DIMM storage)

Rack Configuration (SN40L-16)

RDU Chips

Total Memory: up to 25TB
Typical Power: ~10kW
Cooling: air-cooled

World-Record Performance

Performance measured in output tokens per second. Benchmarks from Infercom EU infrastructure in Munich, Germany.

MiniMax M2.5

NEW - EU Hosted

404 tokens/sec

High-performance multimodal model now hosted on Infercom's EU infrastructure. Independently measured at 400+ tokens/sec by Artificial Analysis.

EU Sovereign

gpt-oss-120b

Fastest EU Model

772 tokens/sec

OpenAI's open-source 120B parameter model. Exceptional throughput for high-volume sovereign workloads.

EU Sovereign

View independent benchmarks on Artificial Analysis

See our own measured results from EU infrastructure

Sustainable AI Infrastructure

Up to 5x better energy efficiency than GPU-based inference

Lower Power Consumption

A rack typically draws about 10kW, versus 40–50kW or more for GPU racks running equivalent workloads. Because the dataflow architecture needs fewer chips, power consumption drops accordingly.

Smaller Footprint

Dramatically reduced physical space, simplified cooling, and lower total infrastructure costs.

Air-Cooled Design

No liquid cooling infrastructure required. Standard air cooling simplifies deployment, reduces maintenance complexity, and lowers operational overhead.

"Not all tokens are created equal. The real value lies not in measuring tokens generated, but in the quality of intelligence delivered per unit of energy consumed."

SambaNova - "Intelligence per Joule"

Advanced Model Capabilities

Massive Model Support

Run large models with hundreds of billions of parameters on a single rack. Support for Composition of Experts (CoE) with 100+ expert models hosted simultaneously.

100s of B params | CoE support | 100+ models

Long Context Windows

Handle up to 192K token context windows (MiniMax M2.7) and 128K on most other models. Massive memory capacity enables document analysis, code generation, and reasoning tasks.

Up to 192K tokens | Single node | No truncation
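Long contexts are memory-hungry mainly because of the key/value cache that a transformer keeps for every token. The standard size estimate is sketched below. The layer count, KV-head count, and head dimension are hypothetical illustration values, not the configuration of MiniMax M2.7 or any other model named on this page, and "192K" is taken as 192,000 tokens.

```python
def kv_cache_bytes(num_layers: int,
                   num_kv_heads: int,
                   head_dim: int,
                   seq_len: int,
                   bytes_per_value: int = 2) -> int:
    """Size of the key/value cache for one sequence:
    2 tensors (K and V) per layer, one vector per KV head per token."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value


# Hypothetical model shape: 60 layers, 8 KV heads of dimension 128, BF16 values.
size = kv_cache_bytes(num_layers=60, num_kv_heads=8, head_dim=128, seq_len=192_000)
print(f"KV cache for one 192,000-token sequence: {size / 1e9:.1f} GB")
```

Under these assumed dimensions a single full-length sequence needs tens of gigabytes on top of the weights, which is why memory capacity matters for long-context work.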

Millisecond Model Switching

Multiple models resident in memory simultaneously with millisecond switching latency - orders of magnitude faster than GPU systems. Perfect for agentic AI and multi-model workflows.

ms switching | Multi-model | Agentic AI

SambaNova Infrastructure in Europe

Infercom operates Europe's first public SambaNova cloud. Hosted in a Tier III+ certified datacenter in Munich, Germany — 100% EU jurisdiction.

Row of SambaNova racks powering Infercom's EU infrastructure

SambaNova rack infrastructure - air-cooled, 10kW per rack

Munich-Based Hosting

Europe's only public SambaNova deployment. All data and processing remain within German borders under EU jurisdiction.

No US Jurisdiction

Full GDPR compliance. Protection from CLOUD Act, PATRIOT Act, and foreign intelligence access. Your data never leaves Europe.

AI Act Ready

Infrastructure prepared for EU AI Act requirements and compliance.

Tier III+ Certified

99.982% uptime guarantee with redundant power and cooling systems.
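An availability percentage is easier to reason about as allowed downtime. The conversion below assumes a 365-day year and that downtime is measured against total calendar time.

```python
def allowed_downtime_minutes_per_year(availability_percent: float) -> float:
    """Maximum downtime per 365-day year implied by an availability percentage."""
    minutes_per_year = 365 * 24 * 60
    return (100.0 - availability_percent) / 100.0 * minutes_per_year


downtime = allowed_downtime_minutes_per_year(99.982)
print(f"{downtime:.1f} minutes per year (about {downtime / 60:.1f} hours)")
```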

Learn More About the Technology

Explore in-depth resources from SambaNova and independent sources

SambaNova Blog