Acorn1010 avatar Acorn1010
Back to blog

How to Migrate Off Firestore to Postgres in 2 Days

How I migrated Foony off Firestore to Postgres + Redis pub/sub via Change Data Capture in just 2 days, dropping Firebase costs from $550/month to $40/month and trimming ~100KB off the client bundle.

engineeringfirestorefirebasepostgresredispubsubcdc

How to Migrate Off Firestore to Postgres in 2 Days

Firebase. That one word can easily cost you 10-20x what a cheaper database costs, and can both constrain and complicate your database design.

So you can imagine how relieved I was when I finished migrating off Firestore in just 2 days. Well, 3 days if you count the time I spent building my initial “PostgresSynchronizer”. Or 3.5 days if you count the time I spent migrating the workshop system. In any case, it was fast. Real fast.

My initial estimate for this project was an entire month of work! But, thanks to some awesome LLM prompts, and a friend who taught me about CDC, I was able to get through this significantly faster.

If you’re looking for pub/sub, Firebase or Supabase might sound very tempting. After all, not many databases support pub/sub out of the box. But this simple architecture below is all you need for most small-business use cases (and should scale nicely up to about 10k-100k ops / second).

Foony’s pub/sub architecturePostgres runs on its own Hetzner server. The CDC and Gateway services run inside a Hetzner k3s cluster. CDC reads Postgres’ Write-Ahead Log and publishes changes to Redis Pub/Sub (a separate Hetzner server). The Gateway subscribes to Redis and fans out updates to WebSocket clients.Postgres server (Hetzner)PostgresWALk3s cluster (Hetzner)CDC(singleton)Gateway(autoscaling)Redis server (Hetzner)Redis Pub/SubWebSocket clientWebSocket clientWebSocket client

Postgres WAL (Hetzner) → CDC (k3s) → Redis Pub/Sub (Hetzner) → Gateway (k3s) → WebSocket clients

Before we get into the details of how the pub/sub works, though, I’m gonna go over the steps I took in order.

First, I made a prompt in Cursor to build a PostgresSynchronizer. This class has a function, handleWrite, that takes a Firestore path, the type of operation (create, update, delete), and the data. It then upserts the data to Postgres. I then wrapped my Firestore store instance so that any changes to Firestore would call handleWrite as well.

This isn’t 100% perfect, of course—if the server disconnects or crashes, it’s possible that a few minor changes aren’t persisted. But it’s good enough for Foony’s use case, and the Postgres schema gives us better data integrity guarantees (e.g. foreign key constraints) than Firestore.

Next, I created and ran a backfill that iterates over all the Firestore collections and calls handleWrite for each document. This ensures we get all historical data from Firestore. For speed, I limited this to only registered users (sorry guests—you should’ve signed up). This works well, and it’s safe to run multiple times.

Now that I had a way to populate the data in Postgres, and keep it (mostly) in sync with Firestore, I was able to tackle the big, scary problem: pub/sub.

How the hell do you do pub/sub with Postgres? Or any SQL database for that matter?

Postgres Change Data Capture (CDC) to the rescue!

CDC is a fancy word that just means “read the changes made to the database and publish them to a pub/sub system”. In a big business, you might use something like Debezium + Kafka. But Kafka is a pain to set up, we already have Redis in place, and our database only gets about 30 ops / second. Even if we were to 100x our scale to ~100,000 concurrent users, we can likely still fit this in a single Redis instance. So that’s what I did.

(You can also bolt pub/sub onto Postgres with LISTEN/NOTIFY, but it doesn’t survive reconnects and falls over once you actually need fanout. CDC is the boring, durable answer.)

My friend, Eric, taught me about CDC and open-sourced his CDC code that reads Postgres’ WAL (Write-Ahead Log—it’s where Postgres writes every change it makes to the database) via a logical replication slot, and persists it to a sink (e.g. Redis).

With a simple, detailed prompt in Cursor, I was able to mostly one-shot the realtime CDC and gateway code (with some minor tweaks and a thorough code review). This includes two very simple services:

  • a singleton CDC service that reads the WAL and publishes it to Redis Pub/Sub (can get more durability with Streams, but at the cost of complexity)
  • a horizontally autoscaling gateway service that handles JWT auth and websocket connections from web clients.

Both services are in golang, and I especially love the autoscaling gateway part. The web client subscribes in the same way it used to with Firestore, and gets back data in the same format. The client sees “usersPublic/” and “usersPrivate/” as two separate collections with separate permissions. The gateway is responsible for translating that request to the underlying Postgres tables (in this case users), and validating that the client has permission for that data.

This works shockingly well. The entire pub/sub system only costs us about $0.50 / month in compute + egress costs, and doesn’t have many moving parts that can break (the CDC, gateway, and client code are all simple with no third-party dependencies outside of Redis / Postgres).

From a DevEx perspective, the new system is arguably simpler, too: developers don’t need to know how to model data in Firestore or Firestore security rules. As long as they understand SQL, they can quickly make an edit to our schema.sql, add or modify a route in all.go, and they’re all set. The documentation is also strong enough that LLMs have an easy time following it and making changes. From a security perspective, this is also a win as we’re no longer exposed to a dangerous denial-of-wallet (Wallet DoS) attack since our new infra is fixed-cost and insanely cheaper.

In all, including the Postgres instance and daily S3 backups, we managed to go from $550 / month to just $40 / month in database infra costs. We also reduced the client bundle size by ~100KB, which is nice.

Now that I had the new system, I tested locally to make sure everything was working as expected. Then, I deployed it live. If there were any issues, it’s easy to rollback the client since Firestore is still the source of truth at this point.

Going Live

The server cutover was sort of scary (database migrations always are). Up to this point, Firestore was still the source of truth—the CDC pub/sub system was just reading from a Postgres mirror that the synchronizer kept up to date. Deploying the new game servers is what would actually flip Postgres to being authoritative for writes.

Here’s how I rolled it out:

  1. Updated partners first. Foony is integrated with partners like FRVR, so we deployed a new client build a few days early that would use our new CDC gateway.
  2. Warned the players. About ten minutes before the cutover, we posted a heads-up about a (hopefully) short maintenance window.
  3. Took a fresh database backup. This is always a good idea with large database migrations. LLMs call this “belt and suspenders”, which basically just means being extra cautious and having a backup plan.
  4. Deployed both clusters simultaneously. No blue/green this time. I wanted to make sure all clusters were writing to the same database to avoid any potential inconsistencies.

I also had a rollback plan in place, which was very simple: if anything went sideways, I’d redeploy the previous client (which still read to Firestore), redeploy the servers, restart the backfill, and try again later. This would have caused about 5 minutes of downtime while the servers redeploy, and about half a day to catch the Postgres database back up to Firestore.

The actual cutover only took about 1 minute of downtime. The only serious bug that surfaced was in updating experience: Postgres’ LOWER() on a bigint column was implicitly casting to text. Sigh. This was an easy enough fix with a simple cast back to bigint, so I continued the migration by issuing another server deploy.

I was shocked that everything seemed to be going smoothly. There were very few people who reported issues, and those were all very minor (apart from the experience thing). What was especially shocking to me, though, is how smooth this migration went given that it was majority vibe-coded. Quite a bit different than the scary stories you might read in the news.

The Workshop Rewrite

With Postgres now authoritative for writes, only one big chunk of the codebase was still tangled up in Firestore: the workshop system. Players use it to share custom maps in Dino-Might Bomber Online, word lists in Draw & Guess, and so on. Sort of like the Steam Workshop, but for our games. It was the most Firestore-entangled feature left, on both the client and the server, and it had this awkward structure due to limitations in Firestore’s data model that I had to simplify.

To start, I gave Cursor this prompt (using Opus 4.7 high in plan mode):

Now that we've finished most of the migration to Postgres, we have the last big part of the migration: rewriting the workshop system.

We'll need to update schema.sql to support the new workshop tables.

Think of the workshop sort of like Steam's workshop.

In other words, the workshop must:
 - Support multiple games.
 - Support different data formats (e.g. maps in dinomight, word lists in paintjob)
 - Support favoriting
 - Support like / dislike
 - Support a description

Use best practices to ensure that the SQL is efficient.

After updating schema.sql, you will need to update both the backend and the frontend to use the new workshop system. You will also need to either use the `realtime.use` hook or the API for workshop info (up to you which you choose--realtime requires modifying all.go, API calls require updates to Action.ts, etc.). I'd probably go with the cdc gateway (realtime.use / all.go).

Keep your implementation simple where possible. At the end of this migration, we should no longer have `firestore.use` or any mention of firestore on the client. We should also no longer need firestore on the backend.

As part of this change, you will need to also create a backfill migration (there's already a migration backfill--just modify that to work with these firestore workshop collections) that migrates the firestore data to the postgres schema you decide on.

You shouldn't run the backfill--I will do that myself manually once you're done with your code. You can leave a call to the backfill (commented out) at the bottom of server/src/index.ts for me. The backfill should only handle migration of these remaining tables--it shouldn't do any backfilling of old tables that we've already migrated (e.g. no `userItems`, `usersPublic`, ...).

With what I’d seen of Opus 4.7’s capabilities with the Cursor harness, I expected this migration to finish overnight.

The big thing I missed in my original prompt, which the agent asked me to clarify, was how we should efficiently handle “Last Day”, “Last Week”, “Last Month”, and “All Time” sorting. The agent was pushing for a timestamp-bucketed approach (or to drop timestamp sorting altogether), but the co-founder insisted on keeping time-based sorting despite its added complexity.

After some thought, I came up with a simple solution involving a decay factor. Each workshop item has played_count_day, played_count_week, played_count_month, and played_count_all columns, and an hourly cron job multiplies the rolling columns by 23/24, 167/168, and (720-1)/720 respectively. Combined with partial indexes (WHERE private = false AND played_count_day >= 0.368) on each sort axis, this gives us dirt-cheap “most popular by time period” queries without any extra infrastructure. This is something I wouldn’t have done with Firestore due to its exorbitant costs for reads and writes, but it’s basically free in Postgres.

With that, I went to bed. I woke up, and excitedly checked its work! The agent added the new workshop_items, workshop_item_votes, and user_subscriptions tables, wired the client to read individual items through the CDC gateway (realtime.use('workshopItems/{id}')), and rewrote all six workshop actions on the backend to talk to Postgres directly. There were a few loose ends I had to clean up (one query to Firestore on the server, an error with the backfill caused by missing data in Firestore, etc.), but overall the code was almost perfect.

After running the backfill, and testing to make sure everything worked locally, I deployed the change live. With that, the codebase was finally Firestore-free. Beautiful.

Future Work

In the future, I’d also like to support patches in the gateway. Right now, the gateway sends back the entire document as JSON for every update. It’s a bit wasteful, but our egress is effectively unlimited thanks to being on Hetzner. I’d implement this now, but I can’t justify the added complexity quite yet.