
Scaling Meteor.js with Thermocline: Oplog Tailing Without the Replica Set


Meteor.js has two ways to detect database changes and push them to connected clients: polling and oplog tailing. Polling works out of the box. Oplog tailing performs better but requires a MongoDB replica set. Both have tradeoffs that get worse as your data grows.

Thermocline solves this differently. Its WAL-based change stream architecture gives Meteor applications oplog-compatible reactivity with none of the operational overhead, and adds automatic hot/cold tiering that lets large collections move to object storage without breaking live queries.

How Meteor Reactivity Works

Every Meteor.subscribe() call creates a live query on the server. The server watches for changes and pushes updates to the client over DDP (Distributed Data Protocol). The question is how the server detects those changes.
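Those pushed updates are plain DDP data messages. A minimal sketch of their shapes, following the published DDP specification (the example collection and field values are illustrative):

```javascript
// Sketch of the DDP data messages a Meteor server emits for a live query.
// Message shapes follow the DDP specification: "added", "changed", "removed".
function ddpAdded(collection, id, fields) {
  return { msg: "added", collection, id, fields };
}

function ddpChanged(collection, id, fields, cleared = []) {
  // `fields` carries updated values; `cleared` lists removed top-level keys.
  return { msg: "changed", collection, id, fields, cleared };
}

function ddpRemoved(collection, id) {
  return { msg: "removed", collection, id };
}

// Example: a new chat message appears in a subscribed query.
console.log(JSON.stringify(ddpAdded("messages", "abc123", { text: "hi" })));
```

Whichever change-detection strategy the server uses, the client-facing protocol is the same; only how the server discovers the change differs.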

Poll-and-Diff

The PollingObserveDriver re-runs your query on an interval, compares the new results against the previous results, and sends the diff. It works with any MongoDB setup, including standalone instances.

The problems show up at scale:

  • CPU cost grows with result set size. Every poll cycle diffs the entire result set, even if nothing changed. A query returning 10,000 documents diffs 10,000 documents every cycle.
  • Latency depends on poll interval. At Meteor's default 10-second interval, changes from external processes take up to 10 seconds to appear. Changes from the same Meteor process are faster, but you still pay the diff cost.
  • Write-heavy workloads amplify the problem. More writes mean more diffs that find changes, which triggers more client updates.
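The diff step is where the O(result set) cost comes from. A minimal sketch of what a poll cycle does conceptually (this is an illustration of the technique, not Meteor's actual PollingObserveDriver code):

```javascript
// Minimal poll-and-diff sketch: compare two query results keyed by _id and
// emit added/changed/removed events. Note that every cycle touches every
// document in the result set, even when nothing changed.
function diffResults(prev, next) {
  const events = [];
  const prevById = new Map(prev.map((d) => [d._id, d]));
  for (const doc of next) {
    const old = prevById.get(doc._id);
    if (!old) {
      events.push({ type: "added", doc });
    } else if (JSON.stringify(old) !== JSON.stringify(doc)) {
      events.push({ type: "changed", doc });
    }
    prevById.delete(doc._id); // anything left over at the end was removed
  }
  for (const doc of prevById.values()) {
    events.push({ type: "removed", doc });
  }
  return events;
}
```

Even a no-op cycle pays for building the map and comparing every document, which is why the cost scales with result-set size rather than with the number of actual changes.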

Oplog Tailing

The OplogObserveDriver reads MongoDB's replication log (local.oplog.rs) to see every write operation as it happens. Instead of re-running queries, it evaluates each oplog entry against active subscriptions and pushes only the relevant changes.

This is faster and more efficient than polling, but it comes with its own costs:

  • Requires a replica set. A standalone mongod has no oplog. You need at least a single-node replica set, which adds operational complexity.
  • Reads the entire oplog stream. Even if you only care about one collection, Meteor processes every oplog entry across all collections. High write rates on unrelated collections consume CPU on your Meteor server.
  • Batch operations cause CPU spikes. A bulk insert of 50,000 documents generates 50,000 oplog entries that Meteor must evaluate against every active subscription. This is a well-documented scaling bottleneck.
  • No collection-level filtering until recently. Meteor 2.16 added the ability to include or exclude specific collections from oplog tailing, but this is opt-in and requires manual configuration.
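For reference, that opt-in filtering is configured through Meteor settings, roughly as below. The key names are my reading of the Meteor changelog; verify them against the Meteor release notes for your version:

```json
{
  "packages": {
    "mongo": {
      "oplogExcludeCollections": ["logs", "analytics_events"]
    }
  }
}
```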

How Thermocline Handles This

Thermocline speaks the MongoDB wire protocol. Meteor applications connect with a standard MongoDB connection string. No driver changes, no package swaps.

Under the hood, Thermocline detects Meteor clients during the connection handshake by inspecting the application.name and driver.name fields in client metadata. When a Meteor client is detected, the gateway activates oplog emulation.
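MongoDB drivers send this `client` metadata document during the connection handshake, so the gateway can make the decision before any query arrives. A sketch of the detection logic, where the field paths follow the MongoDB handshake specification but the exact values matched are illustrative assumptions, not Thermocline's actual rules:

```javascript
// Sketch of gateway-side Meteor detection against the `client` metadata
// document from the MongoDB connection handshake. Matching on the string
// "meteor" in either field is an illustrative assumption.
function looksLikeMeteor(clientMetadata) {
  const app = clientMetadata?.application?.name ?? "";
  const driver = clientMetadata?.driver?.name ?? "";
  return /meteor/i.test(app) || /meteor/i.test(driver);
}
```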

WAL-Based Oplog Emulation

Thermocline does not have a MongoDB-style replication oplog. Instead, it has a write-ahead log (WAL) that records every mutation for durability and crash recovery. The gateway converts WAL change events into oplog-formatted documents that Meteor expects:

| WAL Change Event | Oplog Entry |
| --- | --- |
| `operationType: "insert"` | `op: "i", o: <fullDocument>` |
| `operationType: "update"` | `op: "u", o: <updateSpec>, o2: <documentKey>` |
| `operationType: "delete"` | `op: "d", o: <documentKey>` |
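The mapping above can be sketched as a single function. The oplog field names (`ts`, `op`, `ns`, `o`, `o2`) follow MongoDB's oplog format; the shape of the incoming WAL event is an illustrative assumption, not Thermocline's internal type:

```javascript
// Convert a WAL change event into the oplog entry shape Meteor expects.
// The input event shape here is assumed for illustration.
function walEventToOplogEntry(event) {
  const ns = `${event.db}.${event.collection}`; // e.g. "meteor.messages"
  switch (event.operationType) {
    case "insert":
      return { ts: event.ts, op: "i", ns, o: event.fullDocument };
    case "update":
      return { ts: event.ts, op: "u", ns, o: event.updateSpec, o2: event.documentKey };
    case "delete":
      return { ts: event.ts, op: "d", ns, o: event.documentKey };
    default:
      throw new Error(`unsupported operation: ${event.operationType}`);
  }
}
```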

The gateway virtualizes the local database entirely. When Meteor queries local.oplog.rs with a tailable cursor, the gateway creates an oplog cursor backed by the WAL change stream. Meteor sees exactly what it expects: a capped collection of oplog entries with ts, op, ns, o, and o2 fields.

This means:

  • No replica set required. A single Thermocline node provides oplog tailing. No replica set configuration, no election overhead.
  • Per-cursor filtering. Each oplog cursor only receives events matching its namespace filter. Writes to unrelated collections never reach Meteor's subscription evaluation.
  • No full-oplog scan. Meteor's oplog driver normally reads the entire replication stream. Thermocline's per-cursor delivery eliminates this bottleneck entirely.
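Per-cursor delivery amounts to routing each oplog entry only to cursors whose namespace filter matches. A minimal sketch of that routing step (cursor and entry shapes are illustrative):

```javascript
// Route one oplog entry to the tailable cursors whose namespace filter
// matches it. Cursors for other collections never see the event, so
// unrelated write traffic costs them nothing.
function deliver(cursors, oplogEntry) {
  return cursors.filter((c) => c.nsFilter.test(oplogEntry.ns));
}
```

Contrast this with stock oplog tailing, where every Meteor process reads every entry and discards the irrelevant ones itself.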

Practical Scaling Difference

Consider a Meteor application with 200 active subscriptions across 15 collections. With MongoDB oplog tailing, a bulk import into a logging collection generates thousands of oplog entries that every subscription must evaluate and discard.

With Thermocline, those oplog entries are only delivered to cursors whose namespace filter matches the logging collection. The other 14 collections' subscriptions see zero additional load.

Automatic Tiered Storage for Meteor

Meteor applications tend to accumulate data. Chat histories, activity logs, analytics events, audit trails. These collections grow indefinitely, and MongoDB keeps all of it on primary storage.

Thermocline tiers data automatically. You define a policy (age-based, size-based, or both), and data moves from hot storage (NVMe/SST) to cold storage (Parquet on S3, GCS, or Azure Blob). The key property: queries still work across both tiers transparently.

A Meteor publication that queries Messages.find({roomId: "abc123"}) returns results from both hot and cold storage. Recent messages come from the LSM-tree. Older messages come from Parquet files on object storage. The application code does not change.
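Conceptually, a tier-spanning read merges the two result sets into one ordered view, with the hot tier winning when a document exists in both. This sketch models the behavior described above, not Thermocline's internals:

```javascript
// Illustrative sketch of a tier-spanning read: merge results from the hot
// tier (recent writes in the LSM-tree) and the cold tier (Parquet on object
// storage), deduplicating by _id with hot data taking precedence.
function mergeTiers(hotDocs, coldDocs) {
  const seen = new Set(hotDocs.map((d) => d._id));
  const merged = hotDocs.concat(coldDocs.filter((d) => !seen.has(d._id)));
  return merged.sort((a, b) => b.createdAt - a.createdAt); // newest first
}
```

From the application's point of view there is only one result set; which tier a document came from is invisible.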

For applications with large collections where only recent data is actively accessed, this reduces storage costs by 60-80% while keeping the full dataset queryable.

What This Looks Like

```shell
# Start Thermocline
docker run -d --name thermocline -p 27017:27017 stronglyai/thermocline

# Point your Meteor app at it
MONGO_URL=mongodb://localhost:27017/meteor meteor run
```

Meteor detects the oplog automatically. No MONGO_OPLOG_URL environment variable needed. No replica set initialization.

Summary

| | MongoDB Polling | MongoDB Oplog | Thermocline |
| --- | --- | --- | --- |
| Replica set required | No | Yes | No |
| CPU cost per poll cycle | O(result set) | N/A | N/A |
| CPU cost per write | Low | O(active subscriptions) | O(matching subscriptions) |
| Bulk write impact | Next poll cycle | Full oplog scan | Filtered delivery |
| Cold storage tiering | No | No | Automatic |
| Connection change | N/A | N/A | Connection string only |

Thermocline is open source under SSPL v1. The full implementation of oplog emulation lives in services/gateway/src/proxy/handler/oplog.rs.

Get started or read the documentation.