Platform Deep Dive

How Cachee Actually Works

A purpose-built AI caching layer that overlays your existing infrastructure. No migration required. Four steps from request to response, measured in nanoseconds, with AI predicting what your systems need before they ask.

1.21ns L1 cache hit · 827M operations/sec · 95%+ L1 hit rate · <1hr deploy time
Architecture

Request Lifecycle: Before vs After

Watch how a data request travels through your stack. Every hop adds latency you are paying for. Then see what happens when Cachee intercepts the chain.

Without Cachee:
User Request (0ms) → API Gateway (2.5ms) → App Server (5ms) → Redis Cache (12ms) → Cache Miss → DB (25ms) → Response (3ms)
Total request latency: 47.5ms · 6 hops · 2 network round-trips · 1 database query

With Cachee (L1 hit):
User Request (0ms) → API Gateway (0.5ms) → Cachee L1 (0.001ms) → Response (0.3ms) · Redis and database skipped
Total request latency: 0.801ms · 3 hops · 0 database queries · 95% served from L1 · 59x faster

With Cachee (AI pre-fetch):
User Request (0ms) → AI Pre-Fetched (0.001ms) → Instant Response (~0ms) · gateway, Redis, and database bypassed
Total request latency: 0.001ms · data pre-fetched by AI · already in L1 before the request arrives · 47,500x faster
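The totals in the three flows above are simple sums of the per-hop latencies; a quick check in Python:

```python
# Per-hop latencies from the three flows, in milliseconds.
before = [0, 2.5, 5, 12, 25, 3]        # gateway, app, Redis, miss->DB, response
with_cachee = [0, 0.5, 0.001, 0.3]     # gateway, Cachee L1, response
prefetched = [0, 0.001]                # already in L1 before the request

total_before = sum(before)
total_cachee = sum(with_cachee)
total_prefetch = sum(prefetched)

print(total_before)                          # 47.5
print(round(total_cachee, 3))                # 0.801
print(round(total_before / total_cachee))    # 59   (the "59x faster" claim)
print(round(total_before / total_prefetch))  # 47500
```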
The Pipeline

Four Steps. Sub-Millisecond.

Every request that hits Cachee passes through a four-stage pipeline: prediction, placement, consistency, and tuning. The entire pipeline completes before most systems finish a single network hop.

01
AI Prediction
ML models analyze access patterns in real-time, predicting which data your application will request next. Models train continuously on your traffic, reaching 95%+ accuracy within hours.
~50ns prediction
02
Tiered Storage
Hot data lives in L1 (sub-microsecond). Warm data in L2 (single-digit microsecond). Cold data in L3 (sub-millisecond). AI manages promotion and eviction across all tiers automatically.
1.21ns L1 hit
03
Consistency Engine
Write-through invalidation with causal ordering ensures stale data is never served, with 1.5µs propagation across all cache tiers. A distributed consistency protocol coordinates multi-node deployments.
1.5µs propagation
04
Adaptive Tuning
The system continuously optimizes itself. Cache sizes, eviction policies, TTLs, and prefetch aggressiveness are all adjusted in real-time based on workload characteristics. Zero manual tuning required.
Continuous
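Steps 01, 02, and 04 can be illustrated with a toy two-tier cache: a small hot L1 in front of a larger L2, with reads promoting keys upward and the coldest L1 entry demoted on overflow. This is a sketch of the general technique (LRU promotion and eviction), not Cachee's engine; all names are illustrative.

```python
from collections import OrderedDict

class TieredCache:
    """Toy two-tier cache: a tiny hot L1 in front of a larger L2.

    Reads hit L1 first; L2 hits are promoted into L1, and the
    least-recently-used L1 entry is demoted back down. Cachee
    described above replaces this LRU policy with AI predictions
    and adds a third tier.
    """

    def __init__(self, l1_capacity=2):
        self.l1 = OrderedDict()   # hot tier, kept in LRU order
        self.l2 = {}              # warm tier
        self.l1_capacity = l1_capacity

    def set(self, key, value):
        self.l2[key] = value      # writes land in L2; reads promote

    def get(self, key):
        if key in self.l1:
            self.l1.move_to_end(key)          # refresh recency
            return self.l1[key]
        if key in self.l2:
            value = self.l2.pop(key)
            self.l1[key] = value              # promote to L1
            if len(self.l1) > self.l1_capacity:
                old_key, old_val = self.l1.popitem(last=False)
                self.l2[old_key] = old_val    # demote coldest entry
            return value
        return None                           # miss in every tier

cache = TieredCache(l1_capacity=2)
for k in ("a", "b", "c"):
    cache.set(k, k.upper())
cache.get("a"); cache.get("b")    # "a" and "b" are now hot (L1)
cache.get("c")                    # promoting "c" demotes "a" (LRU)
print(sorted(cache.l1))           # ['b', 'c']
```

A production tier manager differs mainly in the promotion policy (model-predicted next access instead of recency) and in running off the request path.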
Try It

See It Running Live

Deploy Cachee in your environment in minutes. Our CLI handles configuration, connection, and optimization automatically.

$ npm install -g @cachee/cli
$ cachee init --project my-app
Detecting infrastructure... connected ✓
Generating config... done
$ cachee deploy --watch
Deploying Cachee overlay...
L1 cache initialized (optimized)
AI model training started on live traffic
Status: ACTIVE | Hit rate: 87% (warming) | Latency: 3.2ns
Status: OPTIMIZED | Hit rate: 95%+ | Latency: sub-microsecond
Origin load reduced by 94.7% | Est. savings: $2,847/mo
Integration

Three Ways to Deploy

Don't rip out your Redis stack. Every integration model wraps your existing infrastructure and makes it dramatically faster — in under an hour.

Managed Cloud
Your app → SDK → Cachee (0.46ms) → Your DB (3–12ms)
Best for: new projects, teams with no infra ops

Sidecar Container
Your app → (localhost) → Sidecar (1.5µs) → Your DB (3–12ms)
Best for: latency-critical apps, Docker/K8s shops

Self-Hosted
Your app → Agent → Your infra (~0.01ms), with Cachee AI as the cloud control plane
Best for: regulated industries, air-gapped, enterprise
What changes in your stack

Pick the model that fits how you deploy.

Every model wraps your existing infrastructure. Nothing gets ripped out.

Managed Cloud
BEFORE: Your App → Redis / DB
AFTER: Your App → Cachee SDK (new) → (L1 miss only) → Redis / DB (unchanged)

Sidecar Container
BEFORE: Your App → (network hop) → Redis (remote)
AFTER: Your App (no code change) → (localhost) → Cachee Sidecar (new container) → (L1 miss only) → Redis / DB (unchanged)

Self-Hosted
BEFORE: Your App → Your Cache Infra
AFTER: Your App (no code change) → Cachee Agent (on your infra) → Your Infra, with metrics only flowing to Cachee AI
Choose your integration model
Recommended
Managed Cloud
Cachee provisions and runs your cache infrastructure. You get an API key. Point our SDK at it. Done.
< 1hr
from signup to first cache hit
  • Zero infrastructure to manage — we run it
  • SDK available for Node.js, Python, Go, and Java
  • Automatic scaling, patching, and failover
  • 99.99% SLA with global redundancy
  • Works alongside your existing Redis — no migration
Lowest Latency
Sidecar Container
Deploy a Cachee agent container alongside your app. Cache calls go over localhost — no network hop.
1.5µs
p99 cache hit latency
  • Redis-protocol compatible — change one line in your config
  • One Docker image, one env var — that's the full setup
  • Works in Docker Compose and Kubernetes pod specs
  • AI optimization runs in cloud; data never leaves your host
  • Available on Growth tier and above
Enterprise
Self-Hosted
Run the Cachee agent on your own bare-metal or VPC. Cachee's AI connects via the control plane — your data never leaves.
Air-gapped
data sovereignty, your rules
  • Full data isolation — cache data stays on your infra
  • Deploy in regulated environments — data never leaves your infra
  • Control plane handles AI decisions, you handle the hardware
  • Bring your own cloud account (AWS, GCP, Azure)
  • Dedicated Cachee solutions engineer during onboarding
Overlay
Overlay
Already using ElastiCache, CloudFlare KV, or Redis Cloud? Cachee sits in front as an L1 acceleration layer — zero code changes.
10-50x
faster cache hits vs origin provider
  • Works with ElastiCache, CloudFlare KV, Redis Cloud, Azure, GCP, Upstash
  • Swap one connection string — your app connects to Cachee instead
  • L1 serves hot data in ~10µs, misses forwarded to your existing backend
  • Reduces API calls and costs to your underlying cache provider
  • No migration — your existing cache stays as the durable L2 backend

Managed Cloud — Setup Guide

Avg setup: 18 minutes

Node.js quickstart

bash
npm install @cachee/sdk
javascript
import { CacheeClient } from '@cachee/sdk'

const cache = new CacheeClient({
  apiKey: process.env.CACHEE_API_KEY,
  region: 'auto',        // nearest edge
  fallback: 'local',     // in-memory if offline
  timeout: 2000
})

// Set a key
await cache.set('user:1234', userData, { ttl: 300 })

// Get a key
const user = await cache.get('user:1234')

// Batch set
await cache.mset({ 'a': 1, 'b': 2, 'c': 3 })
python
from cachee import CacheeClient

cache = CacheeClient(api_key="your_key")

# inside an async handler:
await cache.set("order:99", order_data, ttl=60)
result = await cache.get("order:99")
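A common way to wire either SDK into application code is the cache-aside pattern: check the cache, fall back to the origin loader on a miss, then populate. The sketch below runs against a minimal in-memory stand-in (`DictCache` is hypothetical, not part of the SDK); with the real async `CacheeClient`, the two cache calls would be awaited.

```python
def get_or_load(cache, key, loader, ttl=300):
    """Cache-aside: return the cached value, or load and populate.

    `cache` needs get(key) -> value-or-None and set(key, value, ttl).
    With the async CacheeClient above, add `await` to both calls.
    """
    value = cache.get(key)
    if value is None:
        value = loader()              # e.g. a database query
        cache.set(key, value, ttl=ttl)
    return value

class DictCache:
    """In-memory stand-in so the sketch runs anywhere (TTL ignored)."""
    def __init__(self):
        self.data = {}
    def get(self, key):
        return self.data.get(key)
    def set(self, key, value, ttl=None):
        self.data[key] = value

cache = DictCache()
calls = []
load_user = lambda: calls.append(1) or {"id": 1234, "name": "Ada"}
get_or_load(cache, "user:1234", load_user)   # miss: loader runs
get_or_load(cache, "user:1234", load_user)   # hit: loader skipped
print(len(calls))                            # 1
```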

Steps to go live

1
Create your free account

Sign up at cachee.ai/start — no credit card required. Your account is active immediately.

2 min
2
Copy your API key

Your API key is generated on first login. Copy it to your environment variables as CACHEE_API_KEY.

1 min
3
Install the SDK

Run npm install @cachee/sdk or pip install cachee. Import the client and initialize it with your key.

5 min
4
Make your first cache call

Call cache.set() and cache.get() wherever you were calling Redis. The API is deliberately identical.

10 min
5
Watch the dashboard populate

Your hit rate, latency, and request volume appear in the dashboard within 60 seconds of your first call.

instant

Sidecar Container — Setup Guide

Avg setup: 12 minutes

Docker Compose

docker-compose.yml
services:
  your-app:
    image: your-app:latest
    environment:
      # Point your Redis client here instead:
      REDIS_HOST: cachee-sidecar
      REDIS_PORT: "6379"

  cachee-sidecar:
    image: cacheeai/sidecar:latest
    environment:
      CACHEE_API_KEY: ${CACHEE_API_KEY}
    # No ports exposed — localhost only
kubernetes (pod spec)
containers:
  - name: your-app
    image: your-app:latest
    env:
      - name: REDIS_HOST
        value: "localhost"
      - name: REDIS_PORT
        value: "6379"

  - name: cachee-sidecar
    image: cacheeai/sidecar:latest
    env:
      - name: CACHEE_API_KEY
        valueFrom:
          secretKeyRef:
            name: cachee-secrets
            key: api-key

How the sidecar works

1
Pull the image

docker pull cacheeai/sidecar:latest — it's under 50MB. No root, no surprise dependencies.

1 min
2
Add it to your compose or pod spec

Paste the 5-line snippet alongside your existing app container. Set your CACHEE_API_KEY env var.

3 min
3
Change one line in your app config

Point REDIS_HOST to cachee-sidecar (Compose) or localhost (Kubernetes). Your existing Redis client works without any code changes.

2 min
4
The sidecar handles everything else

On startup it authenticates to the Cachee control plane, downloads your configuration, and begins serving the Redis protocol on port 6379.

auto
5
Supported Redis commands

SET, GET, DEL, EXISTS, MGET, MSET, EXPIRE, TTL. Any unsupported command returns a clear error — never a silent hang.
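Because the sidecar speaks the Redis protocol, the only app-side change is where the client connects. A sketch of resolving that address from the environment variables the Compose and Kubernetes specs above set (`sidecar_address` is an illustrative helper; the commented redis-py lines assume a running sidecar):

```python
import os

def sidecar_address():
    """Resolve the cache endpoint from the env vars the specs set.

    Compose points REDIS_HOST at the `cachee-sidecar` service name;
    Kubernetes uses `localhost` because containers in a pod share a
    network namespace. The defaults below are illustrative.
    """
    host = os.environ.get("REDIS_HOST", "localhost")
    port = int(os.environ.get("REDIS_PORT", "6379"))
    return host, port

host, port = sidecar_address()
print(host, port)

# With redis-py, the existing client code works unchanged:
#   import redis
#   r = redis.Redis(host=host, port=port)
#   r.set("user:1234", "...")   # served by the sidecar, no network hop
```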

Self-Hosted — Setup Guide

Avg setup: 45 min with dedicated engineer

Connect your infrastructure

bash — generate connection token
# From the Cachee dashboard → Self-Hosted → New Token
# Token expires in 24 hours, single-use

CACHEE_CONNECT_TOKEN="ct_live_xxxxxxxxxxxx"

# Run the agent on your infrastructure
docker run -d \
  -e CACHEE_CONNECT_TOKEN=$CACHEE_CONNECT_TOKEN \
  -e CACHEE_REGION="us-east-1" \
  -p 6379:6379 \
  cacheeai/agent:latest
verify connection
# Within 60 seconds the dashboard shows CONNECTED.
# The connect token is automatically invalidated.

# Test the connection:
redis-cli -h localhost -p 6379 PING
# → PONG (served by Cachee agent)

What stays where

1
Your data never leaves your infrastructure

All cache data — keys, values, TTLs — lives entirely on your hardware or VPC. Cachee has zero access to cache contents.

2
What the control plane does receive

Operational metrics only: hit rate, latency percentiles, memory utilization, request count. No keys, no values, ever.

3
AI optimization runs on telemetry, not data

Cachee's AI models analyze usage patterns from operational metrics and push eviction and pre-warming decisions back to your agent.

4
Works in air-gapped environments

The agent can operate in restricted-egress networks. Configure a proxy for control plane sync if direct outbound is not allowed.

5
Dedicated onboarding engineer included

All self-hosted accounts are assigned a Cachee solutions engineer who runs the first deployment with your team live.
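The boundary described above — aggregates leave, contents never do — can be illustrated with the kind of snapshot the agent might ship. Field names here are hypothetical, not the real telemetry schema; the point is that only operational aggregates appear:

```python
import statistics

def telemetry_snapshot(hits, misses, latencies_us):
    """Aggregate operational metrics from local counters.

    Note what is absent: no keys, no values — only the aggregates
    a control plane needs to drive eviction and pre-warming.
    """
    total = hits + misses
    ordered = sorted(latencies_us)
    p99_index = max(0, round(0.99 * len(ordered)) - 1)
    return {
        "hit_rate": hits / total if total else 0.0,
        "request_count": total,
        "latency_p50_us": statistics.median(ordered),
        "latency_p99_us": ordered[p99_index],
    }

snap = telemetry_snapshot(hits=95, misses=5, latencies_us=list(range(1, 101)))
print(snap["hit_rate"], snap["latency_p50_us"], snap["latency_p99_us"])
```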

Overlay — Setup Guide

Avg setup: 10 minutes

Deploy Cachee in front of your existing cache

docker-compose.yml
services:
  your-app:
    image: your-app:latest
    environment:
      # Point at Cachee instead of your old cache:
      REDIS_HOST: cachee-overlay
      REDIS_PORT: "6379"

  cachee-overlay:
    image: cacheeai/proxy:latest
    environment:
      CACHEE_API_KEY: ${CACHEE_API_KEY}
      # Your existing cache becomes the L2 backend:
      UPSTREAM: ${YOUR_EXISTING_CACHE_ENDPOINT}
    ports:
      - "6379:6379"
supported backends
# ElastiCache / Redis Cloud / Azure / GCP / Upstash:
UPSTREAM=redis://your-elasticache.abc.cache.amazonaws.com:6379

# CloudFlare Workers KV (HTTP adapter):
UPSTREAM=cloudflare://ACCOUNT_ID/NAMESPACE_ID
UPSTREAM_CF_TOKEN=${CF_API_TOKEN}
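Both UPSTREAM forms above are ordinary URLs, so a proxy can dispatch on the scheme. A sketch of that parsing (illustrative only; Cachee's actual handling is not documented here):

```python
from urllib.parse import urlparse

def parse_upstream(upstream):
    """Split an UPSTREAM value into (kind, connection details)."""
    url = urlparse(upstream)
    if url.scheme == "redis":
        # ElastiCache / Redis Cloud / Azure / GCP / Upstash
        return ("redis", url.hostname, url.port or 6379)
    if url.scheme == "cloudflare":
        # cloudflare://ACCOUNT_ID/NAMESPACE_ID (HTTP adapter)
        return ("cloudflare-kv", url.netloc, url.path.lstrip("/"))
    raise ValueError(f"unsupported upstream scheme: {url.scheme!r}")

print(parse_upstream("redis://example.cache.amazonaws.com:6379"))
print(parse_upstream("cloudflare://ACCT/NS"))
```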

How the overlay works

1
Deploy the Cachee proxy

One container, one env var for your API key, one for your existing cache endpoint. The proxy speaks Redis protocol on port 6379.

3 min
2
Swap one connection string

Change your app's REDIS_HOST from your existing cache to the Cachee proxy. Zero code changes needed — your existing Redis client works as-is.

2 min
3
Hot data served from L1 in ~10µs

Cachee's in-memory L1 (Tiny-Cachee engine) serves frequently accessed keys without touching your backend. Cache misses are forwarded transparently.

4
Reduce costs and API calls

With 90%+ hit rates on the L1, you cut 90% of calls to your existing provider — lowering both latency and cost.

5
No migration required

Your existing cache keeps all its data. Cachee only accelerates reads — writes pass through to your backend for durability.
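Steps 3–5 describe a classic read-through/write-through proxy: reads are answered from L1 when possible, misses are filled from the existing backend, and writes pass through for durability. A toy model (not the Tiny-Cachee engine):

```python
class OverlayProxy:
    """Toy read-through L1 in front of an existing cache backend.

    Reads: serve from L1, else fetch from the backend and keep a
    copy. Writes: pass through to the backend (it stays the durable
    store) and update L1 so subsequent reads stay fresh.
    """

    def __init__(self, backend):
        self.l1 = {}
        self.backend = backend        # stands in for ElastiCache, etc.
        self.l1_hits = 0
        self.backend_reads = 0

    def get(self, key):
        if key in self.l1:
            self.l1_hits += 1
        else:
            self.backend_reads += 1
            self.l1[key] = self.backend.get(key)   # fill on miss
        return self.l1[key]

    def set(self, key, value):
        self.backend[key] = value     # write-through for durability
        self.l1[key] = value

backend = {"hot": "payload"}          # the existing cache keeps its data
proxy = OverlayProxy(backend)
for _ in range(10):
    proxy.get("hot")
print(proxy.l1_hits, proxy.backend_reads)    # 9 1
```

Ten reads cost the backend one call; that ratio is the mechanism behind the provider-cost reduction claimed above.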

Which model is right for your team?

Side-by-side comparison of all four deployment models across the metrics that matter.

| Capability | Managed Cloud | Sidecar | Overlay | Self-Hosted |
| Setup time | < 1 hour | 12 minutes | 10 minutes | 45 minutes |
| p99 cache hit latency | 0.46ms | 1.5µs | ~0.01ms | ~0.01ms |
| Infrastructure to manage | None — we run it | One container | One container | Your hardware |
| Existing Redis client works | SDK change | Zero code change | Zero code change | Zero code change |
| Keep existing cache provider | | | Your L2 backend | |
| Reduces provider API costs | | | Up to 90% | |
| AI pre-warming & optimization | | | | |
| Automatic scaling | Fully managed | Managed | Managed | Controlled |
| Multi-region failover | | | | |
| Dedicated solutions engineer | | | | Included |
| Available on tier | Starter + | Growth + | Starter + | Enterprise |

Already using a cache? Overlay it. Starting fresh? Go Managed.

Every deployment model uses the same control plane. Start with Overlay to accelerate your existing ElastiCache or CloudFlare KV, or go Managed for a turnkey solution.

Start Free Trial
Capabilities

Platform Capabilities

Every feature is designed for production workloads at scale. No toy benchmarks. No asterisks. These are the capabilities running in production today.

Native Engine
High-performance data paths. No garbage collection pauses. No runtime overhead. The entire hot path runs in CPU cache lines, delivering consistent nanosecond latency under load.
1.21ns average L1 hit latency
AI Prediction Engine
Proprietary ML models trained on your access patterns. Predicts next-access with 95%+ accuracy. Models update every 30 seconds without downtime. Custom per-tenant model isolation.
95%+ hit rate in production
3-Tier Storage
L1 (sub-microsecond), L2 (single-digit microsecond), L3 (sub-millisecond). AI manages data placement across tiers. Hot data automatically promoted, cold data evicted. No manual tuning.
128x storage reduction vs raw
Overlay Architecture
Deploys alongside your existing Redis, Memcached, or database. No migration. No code changes. Cachee optimizes requests transparently and serves from L1 when possible.
Zero code changes required
Multi-Region Sync
Distributed consistency protocol across regions. Sub-millisecond local reads with automatic conflict resolution. Causal ordering guarantees prevent stale reads after writes.
Global consistency in <5ms
Enterprise Security
AES-256 encryption at rest and in transit. Role-based access. Audit logging. Tenant isolation with zero data leakage. Self-hosted option keeps all cache data on your infrastructure.
Encryption + tenant isolation
Comparison

How Cachee Compares

Side-by-side with the caching solutions you already know. Same metrics, same workloads, independently verifiable.

| Metric | Redis | Memcached | CloudFront | Cachee |
| Read Latency (p50) | 0.8–2ms | 0.5–1ms | 5–50ms | 1.21ns |
| Read Latency (p99) | 5–15ms | 3–8ms | 50–200ms | 12ns |
| Throughput | 500K ops/s | 1M ops/s | N/A (CDN) | 827M ops/s |
| AI Prediction | None | None | None | 95%+ accuracy |
| Auto-Tuning | Manual TTLs | Manual config | Basic TTLs | Fully autonomous |
| Network Hops | 2–3 hops | 2–3 hops | 1–4 hops | Near-zero |
| GC Pauses | Rare (C) | None (C) | Varies | None |
| Origin Load Reduction | 60–80% | 60–75% | 40–70% | 95%+ |
| Deploy Complexity | Moderate | Moderate | Low (CDN) | 1-command overlay |
ROI Calculator

Calculate Your Savings

Input your current infrastructure metrics. See exactly what changes when Cachee deploys. All calculations use conservative estimates based on production deployments.

[Interactive calculator: example inputs are 100M requests/month, $85,000 monthly infrastructure spend, 47ms average latency, and a 65% cache hit rate. Outputs show monthly and annual savings, new average latency with speedup, and ROI multiplier, based on the Scale tier ($500/mo).]
Benchmarks

Production Benchmarks

These numbers come from production deployments, not synthetic benchmarks. Measured on real infrastructure under real workloads. All benchmarks are independently reproducible.

L1 Cache Read Latency
1.21 nanoseconds (p50)
Redis: 800,000ns · Cachee: 1.21ns
Operations Per Second
827 million ops/sec (single node)
Redis: 0.5M ops/s · Cachee: 827M ops/s
L1 Hit Rate (After Training)
95%+ (production average)
Redis: ~65% typical · Cachee: 95%+
Origin Load Reduction
95%+ fewer database queries
Redis: ~70% reduction · Cachee: 95%+ reduction
Infrastructure Economics

Four Metrics Shift the Moment You Deploy

Memory utilization rises because Cachee is actively using it. Everything else, from server hits to infrastructure cost to response latency, drops dramatically.

▲ GOES UP
Memory Utilization
Cachee actively uses L1 memory to store predicted data. Higher utilization = more cache hits = fewer expensive backend calls.
▼ GOES DOWN
Database / Origin Hits
95%+ of requests served from L1 memory. Your database goes from handling millions of queries to handling thousands.
▼ GOES DOWN
Infrastructure Spend
Fewer database replicas, smaller Redis clusters, less compute. Enterprises typically see 40-70% infrastructure cost reduction.
▲ GOES UP
Request Performance
P99 latency drops from tens of milliseconds to sub-millisecond. Same hardware handles orders of magnitude more throughput.

P&L Impact (100M requests/month)

Representative enterprise running on a standard AWS stack. These are the line items that change when Cachee deploys.

| Line Item | Before Cachee | After Cachee | Delta |
| ElastiCache / Redis Cluster | $18,000/mo | $4,500/mo | −$13,500 |
| RDS / Aurora Database | $32,000/mo | $12,000/mo | −$20,000 |
| Compute (EC2 / ECS / Lambda) | $24,000/mo | $10,000/mo | −$14,000 |
| Data Transfer / CDN | $11,000/mo | $4,500/mo | −$6,500 |
| DevOps Hours (cache mgmt) | 60 hrs/mo ($12,000) | 4 hrs/mo ($800) | −$11,200 |
| Cachee Platform Cost | $0 | $500/mo | +$500 |
| NET MONTHLY IMPACT | $97,000/mo | $32,300/mo | −$64,700/mo |

$776,400 annual savings · 129x ROI on Scale tier

Representative figures based on typical enterprise deployment. Actual results vary by infrastructure configuration, workload patterns, and scale.
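The table's totals follow directly from its line items; checking the arithmetic:

```python
# Monthly line items from the P&L table above, in dollars.
before = {
    "elasticache": 18_000, "rds": 32_000, "compute": 24_000,
    "transfer": 11_000, "devops_hours": 12_000,
}
after = {
    "elasticache": 4_500, "rds": 12_000, "compute": 10_000,
    "transfer": 4_500, "devops_hours": 800, "cachee": 500,
}

total_before = sum(before.values())           # $97,000/mo
total_after = sum(after.values())             # $32,300/mo
monthly_savings = total_before - total_after  # $64,700/mo

print(total_before, total_after, monthly_savings)
print(monthly_savings * 12)                       # $776,400 annual savings
print(round(monthly_savings / after["cachee"]))   # 129x ROI on Scale tier
```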

Ready to See the Difference?

Deploy Cachee in under an hour. No migration. No downtime. The data your systems need is already waiting in L1 memory before they ask for it.

1.21 nanoseconds — that's the new standard.

cachee.ai