Thermocline | AI-Native Document Database with Automatic Tiered Storage

Why Thermocline?

An intelligent database built for AI workloads, cost efficiency, and operational simplicity

MongoDB Compatible

Connect with any MongoDB driver — same wire protocol, same MQL, same aggregation pipelines. Change your connection string and go.

Native Vector Search

Built-in HNSW and DiskANN indexes for semantic similarity. $vectorSearch aggregation stage with hybrid filtering. Store documents and embeddings together.

60-80% Cost Reduction

Automatically tier historical data to cost-optimized object storage while maintaining full query access. Pay hot prices only for hot data.

Raft Consensus

Automatic leader election with <5s failover, strong consistency, and zero split-brain. Built-in replication with no external coordination needed.

Time Travel

MVCC versioning lets you query data at any historical timestamp. Debug issues, audit changes, and recover from mistakes with point-in-time reads.

AI-Native by Design

Vector search, time travel, and document-level security designed for RAG pipelines and agentic workflows. Store documents and embeddings together in one database.

Features

Everything you need for a production-grade AI-native document database

Query Federation

Transparently split, route, and merge queries across the hot Storage Engine and cold object storage in a single operation

Policy-Driven Lifecycle

Define archival rules by age, size, or custom predicates. Data moves automatically from hot to cold with zero downtime

Complete MQL Translation

Every MongoDB query operator, aggregation stage, and cursor operation works seamlessly on cold data via Parquet translation

Multi-Cloud Storage

Native adapters for AWS S3, Google Cloud Storage, Azure Blob Storage, and any S3-compatible endpoint like MinIO

Wire Protocol Compatible

Full MongoDB 4.4+ wire protocol implementation. Connect with any MongoDB driver — no SDK, no library, no changes

Security & RBAC

Native SCRAM-SHA-256 auth, TLS 1.3, granular RBAC for vector search and time travel, encryption at rest, and audit logging

Full Observability

Prometheus metrics, structured JSON logging, OpenTelemetry distributed tracing, and pre-built Grafana dashboards

High Availability

Raft consensus replication with sub-5s failover. Multi-replica, multi-AZ deployments with no single point of failure

Kubernetes Native

Production-ready Helm charts, Horizontal Pod Autoscaling, resource quotas, and GitOps-compatible deployment patterns

Backup & Restore

WAL-based point-in-time recovery for hot and cold data. Automated scheduled backups with cross-region DR support

Intelligent Caching

Seven cache tiers including block cache, vector index cache, MVCC version cache, and WAL read cache with adaptive eviction

Parquet Storage Format

Columnar Parquet format with configurable compression (zstd, snappy, gzip) for optimal query performance and storage efficiency

Architecture

A standalone AI-native database with automatic hot/cold tiering

Community

Thermocline is SSPL licensed and community-driven

Contribute

Help build Thermocline. All contributions welcome — code, docs, and ideas.

Discussions

Ask questions, share ideas, and connect with the community.

Roadmap

See what we are building next and help shape the future of Thermocline.

The AI-Native DatabaseWith Automatic Tiered Storage