Real-Time React at Champions League Level: Kafka, Socket.io and Zustand

Realtime React
Event-driven architecture
Kafka
Socket.io
Zustand
React
Next.js
48 min read
By Prasath Soosaithasan

Ten years ago I was sitting in a hall in Paris, at ReactEurope, watching the Facebook engineering team present GraphQL Subscriptions. It was the first time I had real-time features running in a React application, and I remember walking out of that talk genuinely excited — the demo was a chat that updated across two browser windows without a refresh, and the contract felt clean: you wrote a subscription the way you wrote a query, the server kept the connection open, the resolver pushed updates back through. It was a great experience. It was also, in retrospect, the beginning of a particular shape of real-time React that never quite became the industry standard everybody expected at the time.

GraphQL Subscriptions were a trailblazer. They showed what was possible: real-time in React, not as a polling hack glued on after the fact, but as a first-class citizen of the schema, flowing out of a resolver as naturally as any query. It worked. It excited people. Conferences filled up with talks about it, tutorials walked through the model from first principles, teams reshaped their stacks around it. In production, though, it could not meet every requirement. The model assumed Apollo Client on the front end, Apollo Server (or one of a handful of alternatives) on the back, a GraphQL-over-WebSocket transport between them, and every consumer of every event speaking the same schema. Retrofitting that into a polyglot stack (a back end that runs Node, Python, and perhaps Rust side by side) was expensive enough to argue about for a sprint. The model was elegant when your back end was one Node service and your front end was one React app; the seams started to show the moment either of those things stopped being true. By the second or third commercial project, I had watched the same pattern play out: teams choosing GraphQL Subscriptions, getting real value, and then progressively bolting on Redis pub/sub, Kafka, or a hand-rolled socket layer the moment a non-GraphQL consumer needed access to the same event stream. The protocol was a fine local optimum. But the moment a back end grew past one runtime, it was no longer Champions League material.

The teams playing in the Champions League — in production, at fintech platforms, in trading, in telemetry, and in real-time logistics dashboards — rely on the stack this article is about. Kafka on the back end stands unrivalled here; nothing else in the open-source ecosystem matches it for durability, throughput, multi-consumer fan-out, and replay. A thin socket gateway sits in the middle. An unopinionated state library sits on the front. No protocol lock-in. No vendor lock-in. No commitment to a single programming language. From that Paris demo ten years ago to the deployments I run today — that is what this article is about.

My Favourite Stack, in a Diagram

┌──────────────────────────────────┐
│           Next.js App            │
│   user click ──▶ Server Action   │
└────────────────┬─────────────────┘
                 │  business row + outbox row
                 │  in one Postgres transaction
                 ▼
╔══════════════════════════════════╗
║       [1] POSTGRES OUTBOX        ║
╚════════════════╤═════════════════╝
                 │  direct publish after commit
                 │  (5-s poller as fallback)
                 ▼
╔══════════════════════════════════╗
║       [2] KAFKA on STRIMZI       ║
║                                  ║
║        fintech.bank.*            ║
║        fintech.journal.*         ║
║        fintech.tax.*             ║
╚════╤═════════════════════════════╝
     │
     ├──▶ Reactor (Node)       : ledger, tax recalc
     ├──▶ Notifier (Node)      : email, push, WhatsApp
     ├──▶ Categoriser (Python) : ML classify bank txns
     │
     ▼  (one of the consumers)
╔══════════════════════════════════╗
║      [3] SOCKET.IO GATEWAY       ║
║      Kafka topic ──▶ room        ║
║      per entity + user           ║
╚════════════════╤═════════════════╝
                 │  WebSocket
                 ▼
┌──────────────────────────────────┐
│             BROWSER              │
│                                  │
│         socket.io-client         │
│                 │                │
│                 ▼                │
│   ╔══════════════════════════╗   │
│   ║    [4] ZUSTAND STORE     ║   │
│   ╚═════════════╤════════════╝   │
│                 │                │
│                 ▼                │
│         React components         │
└──────────────────────────────────┘

What you're looking at takes four steps to describe — one per numbered box — and the next sections walk through each of them in detail. A user clicks in the front end; a Next.js server action applies the business logic and writes the result together with an outbox row in the same Postgres transaction (①). Right after the commit, the action eagerly publishes the freshly-written outbox row to Kafka (②) — a few milliseconds in the happy path, with a 5-second poller as the safety net if that direct publish fails; from there, any number of consumers in any number of languages — Node, Python, eventually Rust too — react in parallel to the same events. One of those consumers is a Socket.io gateway (③) that forwards the events relevant to a given user over WebSocket to the browser. In the browser, each event lands in a Zustand store (④) that transparently updates the React components.

Every box is independently replaceable. Every arrow is a contract. This is the stack I run today — and it is the stack a fintech client of mine runs in production for a financial dashboard where every cell on screen represents money.

A note on the word real-time before we begin. People use it to mean two different things. There is the marketing definition — "your update arrives in a few seconds, usually" — and there is the engineering definition — "your update arrives reliably, in order, with an audit trail, and without coupling the producer to the consumer." The first is easy. The second is the entire content of this article. The stack I am about to describe is what you build when the first definition stops being enough — when financial flows are involved, or a tax position, or the running margin to your filing deadline, and a stale or out-of-order value is not a UX problem but a financial one.

The fintech I'm using as the running example operates in the corporate-finance space for incorporated businesses: double-entry bookkeeping, automated bank-transaction categorisation, quarterly VAT filings (UStVA), trade-tax filings (Gewerbesteuer), the annual corporate-income-tax return (Körperschaftsteuer), the lot. The product positioning is what the German market calls "einen CFO in der Tasche" — a CFO in your pocket: a single screen that shows you, live, your cash position, your accruing tax liability, your P&L for the quarter, what filings are due when, what your accountant adjusted from a session you weren't in.

A word on multi-tenancy up front, because it shows up in the code further down. A logged-in user in this product typically manages several incorporated companies side by side: her own GmbH, perhaps a holding, maybe a UG for a side project, or, as a tax advisor, the client companies in her portfolio. Inside the product, such a company is called an entity. A user can switch between the entities she has access to, and entityId is therefore the tenant scope that runs through the whole system: in event payloads, in outbox rows, in Socket.io rooms, in the Zustand store. I use "the fintech" interchangeably with "the application itself"; there is only one of those.
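To make the tenant scope concrete, here is a minimal sketch of how entityId could shape room names and a browser-side guard. The room-name format and the helper names (roomFor, joinableRooms, belongsToActiveEntity) are assumptions for illustration, not the product's actual code.

```typescript
// Room-name format is an assumption for illustration; any stable,
// collision-free scheme would do.
function roomFor(entityId: string, userId: string): string {
  return `entity:${entityId}:user:${userId}`
}

// Rooms a connection may join: one per entity the user has access to.
// `accessibleEntityIds` would come from the session; here it is passed in.
function joinableRooms(userId: string, accessibleEntityIds: readonly string[]): string[] {
  return accessibleEntityIds.map((entityId) => roomFor(entityId, userId))
}

// Tenant guard for the browser side: ignore events that belong to an entity
// other than the one currently on screen.
function belongsToActiveEntity(event: { entityId: string }, activeEntityId: string): boolean {
  return event.entityId === activeEntityId
}
```

Switching entities in the UI then only changes which entityId the guard compares against; the event payloads themselves stay untouched.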

The users are founders, accountants, tax advisors; the entities they manage are GmbHs and AGs. For this article what matters is not the product itself but the demands it places on its stack: what's on screen has to be true at every moment, every number a financial statement, every delay potentially the wrong decision, every state transition part of an audit trail. A stack that delivers that under load is exactly the stack this article is about.

I'll work the four numbered boxes from top to bottom. Each one has a specific job, each one has a contract with its neighbours, and the choice of technology in each box is justified — not "we picked Kafka because it sounds professional" but "we picked Kafka because we needed a multi-consumer durable log, the easier alternatives hit clearly identifiable limits, and we know exactly where those limits are." I'll show you the actual approach I take — Drizzle schemas, Strimzi YAML, kafkajs handlers, a Socket.io server skeleton, a Zustand slice — and explain why each one looks the way it does.

Where the Easy Stack Breaks

The fintech I'm describing started where every modern dashboard starts: Supabase Realtime in front of a Postgres database, a few useEffect hooks subscribing to row-level changes, an optimistic UI on top, ship it. The setup was elegant. Six lines of code in the client subscribed to the journal_entries table, the values updated as Postgres write-ahead-log entries flowed through Supabase's logical-replication pipeline, the dashboard felt alive, the product worked. For the first six months it was unambiguously the right call — the team got to product-market fit on an architecture that took half a sprint to set up. I want to be clear up front: there is nothing wrong with this stack. It is the correct answer for a great many products, and it is the wrong answer for a small but important subset of them. This article is about understanding which side of that line you are on.

For this fintech, three pressures landed in the same quarter and pushed them across the line.

The first: the product grew a second consumer. The same bank_transactions table that the dashboard subscribed to also needed to feed a machine-learning categoriser — a Python process that ran every imported bank transaction through a trained classifier and emitted a suggested account (revenue, materials, travel, professional services, and so on) for the user to confirm with one click. Supabase Realtime fanned out from Postgres to any number of subscribers, in principle, but the categoriser wasn't a browser. It was a long-running Python service in a Kubernetes pod that needed at-least-once delivery, durable across restarts, with the ability to replay from a known offset whenever the team retrained the model or fixed a bug in the categorisation logic. Supabase Realtime's listening semantics are pragmatic for browsers — best-effort, no guarantees on missed events when a client disconnects — and that pragmatism is exactly what makes it unsuitable for a backend consumer where missing a transaction means a wrong-category booking on the user's books, which means a wrong P&L number, which means, eventually, a wrong tax filing. Compliance does not negotiate with "best-effort."

The second: the data volume crossed a threshold. A quiet morning produced a few hundred row-level changes per second across the tables the dashboard subscribed to — a bank-feed sync would bring in a thousand transactions in a single batch, each one cascading through bookkeeping entries, tax-liability recomputation, P&L aggregation. Supabase Realtime, in the team's measurements, held up to a few thousand events per second per replication slot before it started visibly lagging — and the lag was the kind of lag a financial UI cannot have. A tax-liability number that arrives at the browser eight seconds late, after twenty newer updates have already been queued behind it, is worse than no number at all. You are not looking at delayed data. You are looking at a different number than the one that is true, and you are about to make a decision based on it.

The third: the compliance team asked a question that broke the entire model. "Can you reconstruct the exact sequence of state changes that produced the filing we submitted on the 14th, in the order the user actually saw them on screen?" The answer with the Supabase-Realtime-and-optimistic-UI stack was, with some embarrassment, no. There was no log of what the browser had actually rendered. There was a log of what was in the database at the end of each second — the final state of each row — but the events that produced those rows had not been preserved as events. They had been preserved only as their last effect on the row. The audit trail compliance wanted didn't exist, and couldn't be reconstructed after the fact. For a product whose entire business is producing legally binding tax filings, this is not an inconvenience. It is a category-of-existence problem.

Three pressures, one root cause: the database was being used as both the source of truth for state and the transport for change events, and those are two different jobs. Sometimes one technology can do both adequately. Sometimes the loads diverge until it cannot. When they do, you separate them. The separation has a name in the architecture literature, the transactional outbox pattern, and it needs a transport the literature converged on over a decade ago: a durable, partitioned, replayable event log. Kafka.

The rest of this article is an unhurried walk through what replacing the easy stack with that durable spine actually looks like — in code, in Kubernetes manifests, in network traffic, and on the screen.

Why Event-Driven Beats the Classic Next.js Setup

One more piece of grounding before we open the first box, because the most common alternative to the stack I'm describing is not Supabase Realtime versus Kafka. It is the much more popular default of "just put everything in a Next.js route handler or a server action." A serious section on why that default works at one scale and stops working at another is the missing piece between "my product is starting to feel complicated" and "I should look at event-driven architecture."

The classic Next.js shape needs no introduction. A user clicks "Import bank transactions"; a server action runs; that server action talks to the open-banking provider (FinAPI, Tink, GoCardless, Plaid — whichever the team picked), parses the response, inserts rows into the database, computes the new tax liability, updates the P&L cache, sends the user a push notification, fires off an email to their accountant, and returns. One function. One file. 'use server' at the top. The mental model is trivial: the call comes in, the work happens, the call returns. Junior engineers can read it in an afternoon. The whole pipeline is one stack trace deep. There is no broker, no schema registry, no consumer group, no offset, no dead-letter queue. The infrastructure footprint is the Next.js process and the Postgres next to it. Every test is a single import away. For most products, at most stages of their life, this is correct.

The problems with this shape show up exactly when the product starts to grow in two directions at once: the surface area of what each action has to do grows, and the size of the team writing those actions grows. These two growths are not the same growth, and together they pull the codebase apart along three distinct axes.

The first axis is functional coupling. The "import bank transactions" action started as one thing — fetch transactions, write them down. Six months later it does seven things. Each of those seven is owned, conceptually, by a different team: the integrations team owns the open-banking call; the bookkeeping team owns the journal-entry generation; the tax team owns the liability recomputation; the reporting team owns the P&L cache; the notifications team owns the push and email; the ML team owns the categorisation; the audit team owns the immutable receipt that compliance asked for. They all live in the same server action because that is where the user's click lands, and every time any one of those seven teams ships a change, all six others have to be reviewed for breakage. Every change is a cross-team change. The PR queue grows. The deploy cadence drops. The team that fixes a typo in a notification template has to wait for the team mid-way through an open-banking migration. None of this is a Next.js problem — it would be the same in any monolithic web framework. It is a problem of synchronous coupling masquerading as code colocation.

The second axis is operational risk. As long as the seven things are inside one server action, they share a single failure surface. The open-banking provider is having a bad afternoon and their response time goes from two hundred milliseconds to eight seconds? The action takes eight seconds. The bookkeeping ledger insertions block on it. The push-notification provider's quota is briefly exceeded? The whole action fails, the transactions you fetched are dropped on the floor, the user clicks the button again and gets duplicates. The categorisation ML model OOMs on a particular edge-case transaction? The user sees a 500 error and concludes the import button is broken. None of these are big bugs. All of them turn into outages for a feature that should only have been partially affected, because the action is the single biggest unit of atomicity in the design. "One thing going wrong takes everything with it" is the architectural cost of one function doing everything.

The third axis is language and runtime lock-in. The ML categoriser wants to live in Python — not because Python is fashionable, but because the modelling ecosystem is there: PyTorch, scikit-learn, the Hugging Face stack, every paper's reference implementation, the libraries every ML hire on the planet already knows. Want to add an agentic workflow that orchestrates a few LLM calls to do quarterly trend analysis or first-pass review of a complex booking? That belongs in Python too — LangChain, LangGraph, the OpenAI Python SDK, the Anthropic Python SDK, the agent-tool ecosystems are all richest there. Need a performance-critical PDF generator for monthly statements, or a numerics service for portfolio aggregation that has to run a few thousand operations per second per pod? That probably belongs in Rust, behind a small HTTP server, where the predictability and the throughput are worth the small ergonomic cost. In the classic Next.js shape, none of these can live anywhere except inside the Next.js process — because the only way another service learns that something happened is by either being called synchronously from the server action (which makes it part of the action's failure surface) or by polling the database (which is wasteful and fragile and brings back the very problem Supabase Realtime was solving). You are technologically pinned to the language your web framework happens to be in. For an application whose competitive edge depends on ML and agentic workflows — and almost every serious 2026 fintech is now in that camp — that pin is expensive.

Event-driven architecture dissolves all three pressures with a single step: the server action's only job becomes "do the minimum required to record that something happened, and emit an event saying so." The action writes the imported transactions to the database and writes one row to an outbox table — fintech.bank.transactions-imported with the batch of transactions in the payload — and returns. That is it. Total work: two database inserts in a single transaction, a few hundred milliseconds. The user gets a fast response, which they care about.

Everything else becomes an independent consumer of the event. The bookkeeping service consumes fintech.bank.transactions-imported and generates the journal entries. The categoriser — written in Python, with its own deploy cadence, owned by the ML team, scaling on its own pod — consumes the same event and emits suggested accounts. The notifications service consumes its own slice and dispatches the push and email. The tax-liability service consumes downstream bookkeeping events and recomputes the running total. The audit service consumes everything and persists an immutable receipt. Each consumer is its own process, its own repo (or directory), its own owner, its own deploy. A bug in the categoriser does not make the import button fail. A push-provider outage does not take out the bookkeeping engine. The PR queue stops being cross-team. The teams stop being one organism. The runtime stops being one runtime.
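The consumption model described above can be sketched with a toy in-memory log. This is only an illustration of independent consumer groups with their own offsets; TinyLog and its methods are hypothetical names, and nothing here reflects how Kafka is implemented internally.

```typescript
// A toy, in-memory stand-in for a single-partition topic. It only illustrates
// the consumption model; it is not how Kafka works under the hood.
type Handler<E> = (event: E) => void

class TinyLog<E> {
  private readonly events: E[] = []
  private readonly offsets = new Map<string, number>() // next offset per group

  append(event: E): void {
    this.events.push(event)
  }

  // Deliver pending events to one consumer group. The offset only advances
  // past events the handler processed without throwing, so a crash leaves
  // the failed event in place for the next poll (at-least-once delivery).
  poll(group: string, handler: Handler<E>): number {
    let offset = this.offsets.get(group) ?? 0
    let delivered = 0
    try {
      while (offset < this.events.length) {
        handler(this.events[offset] as E)
        offset += 1
        delivered += 1
      }
    } catch {
      // Swallowed for brevity; a real consumer would log and back off.
    }
    this.offsets.set(group, offset)
    return delivered
  }

  // Rewind one group, for example after retraining a model or fixing a bug.
  seek(group: string, offset: number): void {
    this.offsets.set(group, offset)
  }
}
```

A categoriser group that throws on every event leaves the bookkeeping group's offset untouched, and a later seek('categoriser', 0) replays the same events once the bug is fixed.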

This is the decisive step. It is not a Kafka pitch in disguise — Kafka is the transport we land on a few sections from here, but the step itself is more fundamental. It is the observation that for an application of any meaningful scale, the value of "do one thing in one function and return" is overwhelmed by the cost of "every change is a cross-team change and every failure is everyone's failure." The right time to take this step is later than people think — there is real value in the classic Next.js shape, and the asynchronous pattern carries real complexity costs that a small team should not pay until they have to. But it is also earlier than people fear. By the time you are three teams and seven downstream effects deep on your most-touched action, you are already paying the costs of coupling without yet collecting the benefits of decoupling. The four-box stack is what the latter half of that journey actually looks like in production code.

One thing about this step that tends to get missed: the front end inherits the benefits. When the back end is event-driven, the front end's job stops being "tell me when this specific server action's optimistic update has come back as a confirmed write" and starts being "tell me which events have happened, and let me reconcile my view against them." That second formulation is the one that scales — to ten users at once on the same dashboard, to an accountant editing in parallel with the customer, to a third device pushing in a notification, to a future feature you have not yet written. The Zustand slice at the end of this article is the concrete shape of that reconciliation, and it only works because the event-driven backbone exists.
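As a rough illustration of "reconcile my view against events", here is a plain reducer that ignores stale or duplicate updates. The LiabilityUpdated shape and in particular the per-entity version counter are assumptions made for this sketch; the article's actual event payloads may order events differently.

```typescript
// Hypothetical event: `version` is assumed to increase monotonically per entity.
interface LiabilityUpdated {
  entityId: string
  version: number
  amountCents: number
}

type LiabilityState = Readonly<Record<string, { version: number; amountCents: number }>>

// Pure reconciliation step: apply the event only if it is newer than what the
// view already shows. Late or repeated events return the same state object,
// so nothing re-renders.
function applyLiabilityUpdated(state: LiabilityState, event: LiabilityUpdated): LiabilityState {
  const current = state[event.entityId]
  if (current !== undefined && current.version >= event.version) {
    return state
  }
  return {
    ...state,
    [event.entityId]: { version: event.version, amountCents: event.amountCents },
  }
}
```

Because the function is pure, it can be called from inside a store's updater without knowing anything about sockets or server actions, which is what lets several writers (customer, accountant, another device) feed the same view.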

Box 1 — The Postgres Outbox

Box 1 is the bridge from the synchronous world of a server action to the asynchronous world of a Kafka topic. It is a single Postgres table, and it solves a specific bug class that every team eventually runs into when they try to write to a database and to a message broker at the same time.

The bug class is the dual-write problem. Naïvely, the import action writes the new transactions to Postgres and then publishes an event to Kafka. Two writes, two systems. Now reason about every possible failure point: the Postgres write succeeds but the Kafka publish times out — the database moved on, the event was lost, every downstream consumer is now wrong. Reverse it: the Kafka publish succeeds but the Postgres commit fails — there is an event saying the transactions were imported, but they were not, and every downstream consumer is again wrong, in a more dangerous direction. Add a retry loop and you get duplicates. Add an idempotency key and you have to coordinate it across both systems. There is no way to make a write to two independent systems atomic without a distributed transaction protocol, and distributed transaction protocols are operational poison. The dual-write problem is unsolvable in the form it is usually stated.

The transactional outbox dissolves the problem by collapsing the two writes into one. You do not write to Postgres and then to Kafka. You write to Postgres twice in the same transaction: once to the business table (the new transactions), once to an event_outbox table (the event you wanted to publish). One transaction. One commit. Either both writes land or neither does. The atomicity guarantee comes from Postgres, which is the system that has had four decades to get atomicity right. Two mechanisms then deliver the event. The server action itself eagerly publishes the freshly written outbox row to Kafka right after the commit and sets its sentAt: the happy path, a few milliseconds end-to-end, as fast as a naive dual-write. And a separate background process, the outbox poller, sweeps the table every few seconds and ships any row whose direct publish failed, marking it sent once Kafka acknowledges. In the happy path the outbox is invisible; in the failure path it is the safety net that keeps the system consistent. Because a row can occasionally be published twice (for example when the publish succeeds but the sentAt update does not), consumers deduplicate on a stable event id. The hard problem becomes a simple one, at the cost of one extra table and one cheap polling loop.
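Deduplicating on a stable event id can be sketched as a small wrapper around a consumer's handler. This is a minimal in-memory version under stated assumptions: makeIdempotent and ReceivedEvent are hypothetical names, and a real consumer would keep the seen ids somewhere durable rather than in a Set.

```typescript
// Hypothetical shape of what a consumer receives: the outbox id plus the payload.
interface ReceivedEvent<P> {
  id: string
  payload: P
}

// Wrap a handler so that a redelivered event id is skipped. The id is recorded
// only after the handler succeeds, so a failed attempt is retried, not lost.
function makeIdempotent<P>(
  handle: (payload: P) => void,
  seen: Set<string> = new Set<string>(),
): (event: ReceivedEvent<P>) => boolean {
  return (event) => {
    if (seen.has(event.id)) return false // duplicate publish: do nothing
    handle(event.payload)
    seen.add(event.id)
    return true
  }
}
```

The return value says whether the handler actually ran, which is handy for metrics on how often duplicates occur in practice.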

The actual Drizzle schema for the outbox table, lifted from one of the apps we run this pattern on:

// features/core/server/db/eventOutbox.ts
import { sql } from 'drizzle-orm'
import { index, integer, jsonb, pgTable, text, timestamp, uuid } from 'drizzle-orm/pg-core'

export const eventOutbox = pgTable(
  'event_outbox',
  {
    id: uuid('id').primaryKey().defaultRandom(),
    topic: text('topic').notNull(),
    key: text('key').notNull(),
    payload: jsonb('payload').notNull(),
    createdAt: timestamp('created_at').notNull().defaultNow(),
    sentAt: timestamp('sent_at'),
    attempts: integer('attempts').notNull().default(0),
    lastError: text('last_error'),
  },
  (table) => ({
    pendingIdx: index('event_outbox_pending_idx').on(table.sentAt, table.createdAt),
  }),
)

Eight columns and one index, every one of them load-bearing. The id is the event's canonical identifier, generated by Postgres at insert time (defaultRandom() becomes a gen_random_uuid() column default), and is what consumers dedup against if a retry causes a duplicate publish. The topic is the fully qualified Kafka topic the event should land on. The key is the partition key Kafka will use to assign the event to a partition (and therefore to a stable consumer in a consumer group). The payload is the event body: a JSONB column, schema-on-read, fast to insert and arbitrary in shape. createdAt orders the drain. sentAt is null while the row is pending and is set to the publish timestamp on success. attempts and lastError are operational diagnostics: they let the poller fail without blocking and let an operator inspect what went wrong on a given row weeks later. The composite index on (sentAt, createdAt) is what lets the poller's WHERE sent_at IS NULL ORDER BY created_at ASC LIMIT 100 scan stay sub-millisecond even when the table has accumulated millions of historical rows.

The server action gets two new responsibilities: insert an outbox row in the same transaction as the business change, and — right after the commit — fire an eager publish for that row to Kafka. Trimmed example:

// features/banking/server/actions/importBankTransactions/importBankTransactions.ts
'use server'

import { eq } from 'drizzle-orm'
import { entityActionClient } from '@/services/action'
import { kafkaProducer } from '@/services/kafka'
// Assumed path: the article does not show where the open-banking client lives.
import { openBanking } from '@/services/openBanking'
import { bankTransactions } from '@/drizzle/schema'
import { eventOutbox } from '@/features/core/server/db/eventOutbox'
import { ImportBankTransactionsRequest, ImportBankTransactionsResponse } from './schema'

export const importBankTransactions = entityActionClient
  .metadata({ actionName: 'importBankTransactions' })
  .inputSchema(ImportBankTransactionsRequest)
  .outputSchema(ImportBankTransactionsResponse)
  .action(async ({ parsedInput, ctx: { db, currentEntity, currentUser } }) => {
    const txns = await openBanking.fetchTransactions(parsedInput.accountId)

    // (1) Postgres write: business rows + outbox row in one transaction
    const outboxRow = await db.transaction(async (tx) => {
      await tx.insert(bankTransactions).values(
        txns.map((t) => ({
          entityId: currentEntity.id,
          accountId: parsedInput.accountId,
          externalId: t.id,
          amount: t.amount,
          currency: t.currency,
          bookingDate: t.bookingDate,
          counterparty: t.counterparty,
          memo: t.memo,
        })),
      )

      const [row] = await tx
        .insert(eventOutbox)
        .values({
          topic: 'fintech.bank.transactions-imported',
          key: currentEntity.id,
          payload: {
            entityId: currentEntity.id,
            userId: currentUser.id,
            accountId: parsedInput.accountId,
            transactionIds: txns.map((t) => t.id),
            importedAt: new Date().toISOString(),
          },
        })
        .returning()

      return row
    })

    // (2) Kafka publish: fire-and-forget; happy path is a few ms,
    //     the 5-s poller is the safety net if this fails.
    void kafkaProducer
      .send({
        topic: outboxRow.topic,
        messages: [
          {
            key: outboxRow.key,
            value: JSON.stringify(outboxRow.payload),
            // The outbox id travels with the message so consumers can
            // deduplicate on it; the header name is an assumption.
            headers: { 'event-id': outboxRow.id },
          },
        ],
      })
      .then(() =>
        db
          .update(eventOutbox)
          .set({ sentAt: new Date() })
          .where(eq(eventOutbox.id, outboxRow.id)),
      )
      .catch(() => {
        /* poller retries */
      })

    return { imported: txns.length }
  })

That is the whole action. It does not call the categoriser. It does not recompute the tax liability. It does not send a notification. It writes the transactions, writes one outbox row, fires the eager publish, returns. Every downstream effect is now an independent consumer subscribed to the event the outbox row carries — and the action's failure modes shrink from seven down to two: the open-banking call failed, or the database commit failed. A Kafka hiccup is not a third failure mode of the action, because the publish is fire-and-forget with the outbox poller as the safety net. Both real failure modes are recoverable, both are auditable, neither cascades.

Both writes — the Postgres transaction and the Kafka publish — live deliberately inline in the action body, not hidden behind a helper file. A custom ESLint rule enforces this for every server action in the codebase: if an action writes to a Drizzle business table, it must also (a) insert into event_outbox in the same transaction and (b) call kafkaProducer.send for that outbox row right after the commit. The dual-write pattern stays visible at the call site, where the next engineer who reads the action sees immediately what it commits and what it publishes — and nobody can quietly drop one of the two halves.

In production, the Kafka half is typically extracted into a small helper — publishOutboxRow(outboxRow) — that wraps the send call, the sentAt update and the error handling. The ESLint rule is satisfied either way, as long as the publish remains visible inside the action body. I've inlined it here purely for the article's sake, so both sides of the dual-write are legible side by side.
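A helper of that kind might look roughly like the sketch below. The function name comes from the paragraph above, but the signature, the Producer and OutboxWriter interfaces, and the event-id header are assumptions introduced so the example stands alone without a Kafka client or database.

```typescript
interface OutboxRow {
  id: string
  topic: string
  key: string
  payload: unknown
}

// Minimal structural types so the sketch is self-contained. `send` mirrors the
// shape of the call used in the action; `markSent` is a hypothetical wrapper
// around the sentAt update.
interface Producer {
  send(record: {
    topic: string
    messages: { key: string; value: string; headers?: Record<string, string> }[]
  }): Promise<unknown>
}
interface OutboxWriter {
  markSent(id: string, at: Date): Promise<void>
}

// Publish one outbox row and mark it sent. Never throws: a false result simply
// means the row stays pending and the poller will pick it up later.
async function publishOutboxRow(
  row: OutboxRow,
  producer: Producer,
  outbox: OutboxWriter,
): Promise<boolean> {
  try {
    await producer.send({
      topic: row.topic,
      messages: [
        { key: row.key, value: JSON.stringify(row.payload), headers: { 'event-id': row.id } },
      ],
    })
    await outbox.markSent(row.id, new Date())
    return true
  } catch {
    return false
  }
}
```

Note the one awkward case: if send succeeds but markSent fails, the row is published again later, which is exactly why consumers need to deduplicate on the event id.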

The poller is a small cron job that does the boring half of the work. Every five seconds it scans the outbox for rows still missing a sentAt (that is, rows whose eager post-commit publish never made it), publishes each one to its named topic, and either sets sentAt on success or increments attempts and stores lastError on failure. A pruning pass in the same job deletes rows whose sentAt is older than thirty days. The whole worker is well under a hundred lines of TypeScript, runs in a dedicated Kubernetes deployment, and is the only piece of the pipeline guaranteed to touch both Postgres and Kafka under every condition. If both the eager publish and a few poller passes fail, the outbox table fills up; the action keeps working (because all it does is insert a row); the moment Kafka recovers, the backlog drains, consumers catch up, and the system is consistent again. Recovery is automatic. Audit is built in: every event published in the last thirty days is recorded in event_outbox with its full payload, indexed by time, and older events live on in the immutable receipts the audit service persists. Within that window, the compliance question ("reconstruct the exact sequence of state changes") has a one-line SQL answer.
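One poller pass and the pruning step can be sketched against an in-memory array standing in for the event_outbox table. The names drainOnce, prune and OutboxRecord are hypothetical, and the publish callback is injected so the sketch runs without Kafka; the batch size of 100 and the thirty-day cutoff are taken from the text.

```typescript
interface OutboxRecord {
  id: string
  topic: string
  key: string
  payload: unknown
  createdAt: Date
  sentAt: Date | null
  attempts: number
  lastError: string | null
}

// One pass: oldest pending rows first, at most `batchSize` per pass.
// A failing row records its error and does not block the rows behind it.
async function drainOnce(
  rows: OutboxRecord[],
  publish: (row: OutboxRecord) => Promise<void>,
  now: Date = new Date(),
  batchSize = 100,
): Promise<number> {
  const pending = rows
    .filter((r) => r.sentAt === null)
    .sort((a, b) => a.createdAt.getTime() - b.createdAt.getTime())
    .slice(0, batchSize)

  let sent = 0
  for (const row of pending) {
    try {
      await publish(row)
      row.sentAt = now
      sent += 1
    } catch (err) {
      row.attempts += 1
      row.lastError = err instanceof Error ? err.message : String(err)
    }
  }
  return sent
}

// Pruning: keep pending rows and rows sent within the retention window.
function prune(rows: OutboxRecord[], now: Date, maxAgeDays = 30): OutboxRecord[] {
  const cutoff = now.getTime() - maxAgeDays * 24 * 60 * 60 * 1000
  return rows.filter((r) => r.sentAt === null || r.sentAt.getTime() >= cutoff)
}
```

In a real worker the filter and sort would be the SQL query served by the pending index, and the mutations would be UPDATE statements; the control flow is the part this sketch is meant to show.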

Box 2 — Kafka on Strimzi

Box 2 is the durable event log itself. Kafka, deployed on Kubernetes via the Strimzi operator. I want to spend a moment on this box specifically because a lot of teams skip the self-hosted option assuming it's intractable and end up renting a managed broker for €2,000 to €8,000 a month they did not need to spend.

Strimzi is a Cloud Native Computing Foundation project that turns "operating a production Kafka cluster" into "writing a Kubernetes manifest." You install the operator once via Helm:

helm repo add strimzi https://strimzi.io/charts/
helm install strimzi-kafka-operator strimzi/strimzi-kafka-operator \
  --namespace kafka --create-namespace

Then you describe the cluster you want as a custom resource, and the operator reconciles it. In production we run KRaft mode (no ZooKeeper), with separate node pools for controllers and brokers, on a small footprint. The cluster manifest for the fintech, slightly trimmed:

# kafka-cluster.yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: fintech
  namespace: kafka
  annotations:
    strimzi.io/node-pools: enabled
    strimzi.io/kraft: enabled
spec:
  kafka:
    version: 4.1.0
    metadataVersion: 4.1-IV0
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: tls
        port: 9093
        type: internal
        tls: true
      - name: external
        port: 9094
        type: cluster-ip
        tls: true
        configuration:
          bootstrap:
            host: kafka.fintech.example
          brokers:
            - broker: 0
              host: kafka-0.fintech.example
            - broker: 1
              host: kafka-1.fintech.example
            - broker: 2
              host: kafka-2.fintech.example
    config:
      offsets.topic.replication.factor: 3
      transaction.state.log.replication.factor: 3
      transaction.state.log.min.isr: 2
      default.replication.factor: 3
      min.insync.replicas: 2
---
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  name: brokers
  namespace: kafka
  labels:
    strimzi.io/cluster: fintech
spec:
  replicas: 3
  roles: [broker]
  storage:
    type: persistent-claim
    size: 50Gi
    class: ebs-gp3
    deleteClaim: false
---
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  name: controllers
  namespace: kafka
  labels:
    strimzi.io/cluster: fintech
spec:
  replicas: 3
  roles: [controller]
  storage:
    type: persistent-claim
    size: 10Gi
    class: ebs-gp3
    deleteClaim: false

That is a production Kafka cluster. Three brokers, three controllers, replication factor three, minimum in-sync replicas of two. For producers that write with acks=all, that means we tolerate one broker loss without losing write availability, and a second loss without losing acknowledged data, although writes are then rejected until a replica rejoins. Storage on growable EBS gp3 volumes. Three listeners: plain for internal in-cluster traffic, TLS for cross-namespace traffic that wants it, external for our producers and consumers that run outside the cluster. The external listener is exposed via cluster-IP with TLS passthrough through an Envoy gateway, which gives us per-broker SNI routing without renting a load balancer per broker.
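
The interplay between the replication factor and min.insync.replicas is easy to check with a little arithmetic. The sketch below is a deliberately simplified model, not broker logic: `partitionHealth` is a hypothetical helper, and it assumes one replica per broker, that every surviving replica is in sync, and that producers write with acks=all.

```typescript
interface PartitionHealth {
  // Producers using acks=all can still write to the partition.
  acceptsAcksAllWrites: boolean
  // At least one replica holding the acknowledged data is still alive.
  dataSurvives: boolean
}

// Simplified model: replicas fail one at a time and survivors stay in sync.
function partitionHealth(
  replicationFactor: number,
  minInsyncReplicas: number,
  replicasDown: number,
): PartitionHealth {
  const alive = Math.max(replicationFactor - replicasDown, 0)
  return {
    acceptsAcksAllWrites: alive >= minInsyncReplicas,
    dataSurvives: alive >= 1,
  }
}

// The cluster above: replication factor 3, min.insync.replicas 2.
const healthy = partitionHealth(3, 2, 0)
const oneDown = partitionHealth(3, 2, 1)
const twoDown = partitionHealth(3, 2, 2)
```

Under these assumptions `oneDown` still accepts writes, while `twoDown` rejects them but keeps the data, which is the trade-off the two settings encode.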

Topics, by contrast, are deliberately not Kubernetes resources. We keep business-domain names — fintech.bank.transactions-imported, fintech.journal.entry-posted, every event the product happens to care about — out of infrastructure manifests entirely. The topic name is product vocabulary; the Kubernetes cluster has no business knowing about it, and certainly should not need a re-deploy every time the product team agrees on a new event.

Instead, the topic catalogue lives in the application repository. Each consumer's startup uses the kafkajs admin client to ensure-create the topics it cares about — idempotently, so the call is a no-op if the topic already exists:

import { kafka } from '@/services/kafka'

const admin = kafka.admin()
await admin.connect()
// createTopics resolves to false (without throwing) for topics that already exist.
await admin.createTopics({
  topics: [
    { topic: 'fintech.bank.transactions-imported', numPartitions: 12, replicationFactor: 3 },
    { topic: 'fintech.bank.transaction-categorisation-suggested', numPartitions: 12, replicationFactor: 3 },
    // …one entry per event this consumer cares about
  ],
})
await admin.disconnect()

The Kubernetes manifest stays purely structural — brokers, controllers, listeners, storage. The topic catalogue stays in TypeScript, beside the producers and consumers that actually publish and subscribe to it. Onboarding a new event becomes a one-line addition in the consumer's startup file, merged in the same PR as the producer that emits it — no infrastructure ticket required.

The topic name is the contract. We enforce a single naming convention across the whole codebase — <app>.<feature>.<event-kebab-case> — at lint time via a custom ESLint rule (strict/kafka-topic-kebab-case) that fires on any topic string that does not match the regex. The shape is deliberate. The first segment scopes the event to the producing application, so two apps that share a Kafka cluster never accidentally consume each other's events. The second segment names the feature the event belongs to, so a glob of fintech.bank.* selects every banking event for a bulk subscription. The third segment names the event itself, in kebab-case, past tense — events describe things that have already happened, not commands. Past tense is a small thing that prevents an entire category of conceptual bug: a consumer that confuses a command ("please import these") with an event ("these were imported") will eventually mis-handle a retry.
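
The check the lint rule performs can be approximated with a plain regular expression. The pattern and the `isValidTopic` / `parseTopic` helpers below are assumptions for illustration; the actual regex inside strict/kafka-topic-kebab-case is not shown in this article. Note that a regex can enforce the three-segment kebab-case shape, but not the past tense, which remains a review convention.

```typescript
// One lowercase kebab-case segment, e.g. "transactions-imported".
const SEGMENT = '[a-z][a-z0-9]*(?:-[a-z0-9]+)*'

// Assumed pattern for <app>.<feature>.<event-kebab-case>.
const TOPIC_PATTERN = new RegExp(`^(${SEGMENT})\\.(${SEGMENT})\\.(${SEGMENT})$`)

interface TopicParts {
  app: string
  feature: string
  event: string
}

function isValidTopic(topic: string): boolean {
  return TOPIC_PATTERN.test(topic)
}

// Splits a valid topic into its three segments, or returns null if it does not match.
function parseTopic(topic: string): TopicParts | null {
  const match = TOPIC_PATTERN.exec(topic)
  if (match === null) return null
  return { app: match[1], feature: match[2], event: match[3] }
}
```

For example, `parseTopic('fintech.tax.liability-recomputed')` yields the app `fintech`, the feature `tax` and the event `liability-recomputed`, while a camel-case or two-segment name is rejected.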

The producer side is almost not worth showing. kafkajs is the client of choice for Node, and a producer call is three lines:

import { kafka } from '@/services/kafka'

const producer = kafka.producer()
await producer.connect()
// The key decides the partition, so all events with the same key stay in order.
await producer.send({
  topic: row.topic,
  messages: [{ key: row.key, value: JSON.stringify({ id: row.id, payload: row.payload }) }],
})

The interesting work is on the consumer side, and the operational properties of the broker (partitioning, consumer groups, offset management, replay) are the entire point of the box. Twelve partitions on fintech.bank.transactions-imported means up to twelve consumer instances of a given group can process the topic in parallel; the partition key (the fintech id, in our case) guarantees that all events for a given customer land on the same partition and therefore arrive in order at the same consumer instance. Replay is a one-line operation: kafka-consumer-groups.sh with --reset-offsets --to-earliest --execute against a stopped consumer group, and every event from the start of the retention window flows back through that consumer's handler once it restarts. We have used this to backfill new consumers, to re-run the ML categoriser after retraining, to reconstruct a corrupted P&L cache from the source events, and to satisfy the compliance audit I mentioned earlier. Each of those operations is impossible against Supabase Realtime, expensive against a managed broker, and routine against this stack.
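
Why a partition key preserves per-customer ordering can be shown with a toy partitioner. This is a sketch under stated assumptions: it uses an FNV-1a hash for brevity, whereas Kafka's default partitioner hashes the key bytes with murmur2, so the partition numbers here will not match a real cluster. The property that matters is the same: equal keys always map to the same partition.

```typescript
// FNV-1a 32-bit hash (illustrative stand-in for Kafka's murmur2).
function fnv1a(key: string): number {
  let hash = 0x811c9dc5
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193) >>> 0
  }
  return hash >>> 0
}

// Deterministic key-to-partition mapping: same key, same partition.
function partitionFor(key: string, numPartitions: number): number {
  return fnv1a(key) % numPartitions
}

interface KeyedEvent {
  key: string
  seq: number
}

// Groups events by partition while preserving their arrival order within each one.
function assignToPartitions(events: KeyedEvent[], numPartitions: number): Map<number, KeyedEvent[]> {
  const partitions = new Map<number, KeyedEvent[]>()
  for (const event of events) {
    const p = partitionFor(event.key, numPartitions)
    const bucket = partitions.get(p) ?? []
    bucket.push(event)
    partitions.set(p, bucket)
  }
  return partitions
}
```

Because each partition is consumed by exactly one member of a consumer group at a time, every event for a given key is handled by the same instance, in the order it was produced.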

The Polyglot Pay-Off

With Boxes 1 and 2 in place, the back end has changed shape. There is one server action that emits events, and a fleet of independent consumers that react to them. The thing this enables, and the thing the rest of the back-end half of the article is about, is picking the right tool for each consumer, independently of the language the application was written in. The fintech I keep coming back to runs three different consumer families today, across two languages, with no coordination between them beyond the topic contract.

The reactor is the Node.js catch-all. It is a small TypeScript service, deployed as its own Kubernetes Deployment, that subscribes to broad swathes of topics — fintech.bank.*, fintech.journal.*, fintech.tax.* — and dispatches each event to a handler colocated with the feature that owns it. A handler is just an async function that takes the parsed event and performs its assigned reaction. The reactor handles the bread-and-butter integrations: writing the journal entries when a bank-transaction-imported event arrives, recomputing tax liability when a journal-entry-posted event arrives, marking a filing as submitted when the response from the tax authority comes back. It is the consumer for the Node-shaped work — anything that's mostly orchestration of database writes and SDK calls. It shares its dependency graph with the Next.js app (same Drizzle schema, same domain types, same i18n), which keeps the cognitive overhead near zero for the engineers who already work in the Next.js codebase.
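
The reactor's dispatch step can be pictured as a small registry that maps topic patterns to handlers. The `HandlerRegistry` class, the `topicMatches` helper and the trailing `.*` glob syntax are assumptions made for this sketch; how the real reactor wires its handlers is not shown here.

```typescript
interface EventEnvelope {
  id: string
  payload: unknown
}

type Handler = (event: EventEnvelope) => Promise<void> | void

// Supports exact names and a trailing ".*" that matches one feature prefix.
function topicMatches(pattern: string, topic: string): boolean {
  if (pattern.endsWith('.*')) {
    // "fintech.bank.*" becomes the prefix "fintech.bank." so "fintech.banking.x" does not match.
    return topic.startsWith(pattern.slice(0, -1))
  }
  return pattern === topic
}

class HandlerRegistry {
  private readonly entries: Array<{ pattern: string; handler: Handler }> = []

  register(pattern: string, handler: Handler): void {
    this.entries.push({ pattern, handler })
  }

  // Runs every matching handler in registration order and returns how many ran.
  async dispatch(topic: string, event: EventEnvelope): Promise<number> {
    let handled = 0
    for (const entry of this.entries) {
      if (topicMatches(entry.pattern, topic)) {
        await entry.handler(event)
        handled += 1
      }
    }
    return handled
  }
}
```

In a Kafka consumer, `dispatch` would be called from the per-message callback with the message's topic and parsed value; a return value of zero is a useful signal that an event arrived with no handler registered for it.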

The notifier is a sibling Node.js service with a narrower brief. It subscribes only to events that result in a user-visible message — fintech.tax.filing-submitted, fintech.journal.entry-flagged-for-review, fintech.banking.unusual-transaction-detected — and turns each one into the correct combination of email, push, and (for the German market) a WhatsApp message. The notifier is its own service rather than a reactor handler because it has different scaling characteristics and a different deployment cadence than the rest of the reactor — provider quotas, rate limits, retry strategies, template rendering, all the operational specifics of "we are sending messages to humans" that benefit from being isolated. It is still Node.js, and shares the same Drizzle schema for user-preferences lookups, but it is its own pod, its own consumer group, its own SLOs.

The categoriser is where it gets interesting. The categoriser is a Python FastAPI service, owned by the ML team, with a completely different deployment pipeline from the Node side of the house. It consumes fintech.bank.transactions-imported directly from Kafka — using aiokafka, the asyncio-native Kafka client — runs each transaction through a fine-tuned classifier (a relatively small transformer trained on the German chart-of-accounts vocabulary), and publishes its prediction as fintech.bank.transaction-categorisation-suggested. The reactor's bookkeeping handler then picks up that suggestion and either applies it directly (for high-confidence categorisations) or surfaces it to the user for one-click confirmation (for everything else). The ML team writes Python because Python is where the modelling ecosystem lives — they iterate on the model in a Jupyter notebook, validate it on a held-out set in the same notebook, ship it to the FastAPI service when it beats the previous champion, and the reactor never even notices the change happened. There is no Node ⇄ Python RPC layer. There is no shared schema beyond the Kafka topic contract. The two services know about each other only through the events they exchange.
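
On the Node side, the bookkeeping handler's decision between applying a suggestion and asking the user comes down to a confidence check. The sketch below is illustrative only: the `CategorisationSuggestion` shape, the `decide` function and the 0.9 threshold are assumptions, since the article does not state the event's fields or the cut-off the team uses.

```typescript
// Hypothetical payload of a transaction-categorisation-suggested event.
interface CategorisationSuggestion {
  transactionId: string
  accountCode: string
  confidence: number // assumed to be a probability between 0 and 1
}

type Decision =
  | { kind: 'apply'; transactionId: string; accountCode: string }
  | { kind: 'ask-user'; transactionId: string; suggestedAccountCode: string }

// Assumed cut-off; a real system would tune this against review data.
const AUTO_APPLY_THRESHOLD = 0.9

function decide(
  suggestion: CategorisationSuggestion,
  threshold: number = AUTO_APPLY_THRESHOLD,
): Decision {
  if (suggestion.confidence >= threshold) {
    return {
      kind: 'apply',
      transactionId: suggestion.transactionId,
      accountCode: suggestion.accountCode,
    }
  }
  return {
    kind: 'ask-user',
    transactionId: suggestion.transactionId,
    suggestedAccountCode: suggestion.accountCode,
  }
}
```

Keeping the threshold on the consumer side means the ML team can ship a better model without touching this code, and the bookkeeping team can tighten or loosen the cut-off without retraining anything.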

The pattern generalises. An agentic workflow — say a quarterly trend-analysis assistant that consumes a quarter's worth of bookkeeping events and produces a written commentary the user can read alongside their P&L — also lives in Python, also as a FastAPI service, also as a Kafka consumer. It subscribes to fintech.reporting.quarter-closed, fetches the quarter's events, fans out a structured prompt across the OpenAI or Anthropic SDK, and publishes fintech.reporting.quarter-commentary-ready when it is done. The Python ecosystem for that work — LangGraph, LangChain, the agent-tool integrations, the structured-output libraries — is years ahead of the Node equivalents, and the team's instinct to reach for it is correct. The event-driven backbone is what makes the instinct cheap to follow. There is no integration ticket. There is no "let's spin up an internal gRPC service." There is a new consumer group subscribed to a topic that already exists, written in the language the team is already productive in, deployed independently from everything else.

A Rust service would slot in the same way. The fintech I'm describing has not needed one yet, but the day they need to render ten thousand PDFs an hour for end-of-year statements, or run a numerically intensive portfolio aggregator that has to fit a hard latency budget, the answer is not "rewrite the Node reactor in Rust"; it is "add a Rust consumer that subscribes to the relevant topics and does the heavy lifting." The choice of language is local to the consumer. Every consumer pays only the operational cost of its own runtime. The application as a whole is polyglot by default, not by exception.

This is the architectural property that separates this stack from any monolithic alternative, and it is — in my experience — the single biggest reason a serious fintech adopts it. It is not the durability of the log, it is not the audit trail, it is not even the multi-consumer fan-out, attractive though all three are. It is the freedom to assemble a back end from the best language for each job, and to evolve each piece independently, without ever paying the integration tax that polyglot architectures used to imply. Kafka is the lingua franca that makes the polyglot affordable.

"But Kafka Is Not Real-Time React"

At this point the back-end half of the system is complete. Events flow through a durable log; consumers in two languages react independently; the audit trail is built in; the compliance question has an answer. And none of it, by itself, makes a number on a screen in a React application update in real time. The browser does not, and should not, speak Kafka.

Three reasons it should not. One: the Kafka wire protocol assumes long-lived TCP connections to specific brokers, with the client maintaining metadata about which broker owns which partition. None of that fits the browser's actual network conditions, where connections drop, IPs change, and the proxy layer in between has no concept of partition affinity. Two: exposing a Kafka broker to the public internet is an authentication and authorisation surface area that nobody wants to maintain. Kafka has SASL and ACLs, but mapping them onto per-user, per-customer, per-tenant browser-side authentication is an integration nightmare. Three: the broker's data model is topic-and-partition, not user-and-session. Filtering — deciding which events this particular user should receive and which they should not — has to happen somewhere, and the broker is the wrong place for it. That filtering is the job of the next box.

Box 3 — The Socket.io Server

Box 3 is the bridge from the Kafka topic to the browser. It is a small Node.js service whose entire job is to consume from Kafka on one side, hold open WebSocket connections to browsers on the other side, and route the right events to the right connections. We use Socket.io for the browser-facing side, for reasons I will get to.

The architecture of the service is a single duality. On the Kafka side, it is a consumer like any other — it joins a consumer group, subscribes to the topics it cares about, and receives an event handler call for every message. On the browser side, it is a Socket.io server — it accepts WebSocket connections, authenticates them against the application's session cookie, places each connection into one or more rooms based on the user's identity and tenancy, and emits events to those rooms. The interesting work is the fan-out logic in the middle: when a fintech.tax.liability-recomputed event arrives from Kafka, which sockets should receive it? Answer: every socket in the room entity.<entityId>.user.<userId> where entityId matches the event payload and userId is in the set of users who currently have the dashboard open for that entity. The lookup is cheap because Socket.io maintains room membership in memory; the routing decision is a single get-by-room-name and emit.

Why Socket.io specifically, and not raw WebSocket? Three concrete reasons. One: reconnection. Browsers drop connections all the time — laptops sleep, mobile networks switch, proxies time out. Socket.io has a battle-tested reconnection protocol with exponential backoff, automatic re-emission of pending messages, and the concept of an acknowledged event. Writing that yourself on top of the raw WebSocket API is a project, not a weekend. Two: rooms and namespaces. The mental model of "this connection belongs to these rooms, broadcast to a room and every member receives it" is exactly the model we need for tenant scoping, and Socket.io ships it as a primitive. Raw WebSocket gives you a flat connection; you build the room abstraction on top. Three: transport fallback. In environments where WebSocket is blocked — restrictive corporate proxies, ancient mobile networks — Socket.io transparently falls back to HTTP long polling. The dashboard works for a customer whose IT team blocks WebSocket. That is one less support ticket per month.
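
To give a sense of what "exponential backoff" means in practice, here is a generic delay calculation with jitter. It is a sketch of the technique, not Socket.io's internal implementation; the `backoffDelay` function and its option names are assumptions. On the real client the corresponding knobs are the `reconnectionDelay`, `reconnectionDelayMax` and `randomizationFactor` options.

```typescript
interface BackoffOptions {
  baseMs: number // delay before the first retry
  maxMs: number  // upper bound on any delay
  jitter: number // 0 to 1, fraction of the delay to randomise
}

// Delay for the given zero-based attempt: base * 2^attempt, plus or minus jitter, capped at maxMs.
function backoffDelay(
  attempt: number,
  opts: BackoffOptions,
  random: () => number = Math.random,
): number {
  const exponential = opts.baseMs * 2 ** attempt
  // random() in [0, 1) maps to a deviation in [-jitter, +jitter) of the delay.
  const deviation = exponential * opts.jitter * (random() * 2 - 1)
  const delayed = Math.max(Math.round(exponential + deviation), 0)
  return Math.min(delayed, opts.maxMs)
}

const options: BackoffOptions = { baseMs: 1000, maxMs: 5000, jitter: 0.5 }
// With jitter pinned to the midpoint the schedule is 1000, 2000, 4000, 5000 (capped).
const schedule = [0, 1, 2, 3].map((attempt) => backoffDelay(attempt, options, () => 0.5))
```

The jitter matters more than it looks: without it, every client that lost its connection in the same outage retries at the same instant and hammers the gateway in synchronised waves.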

The actual server, in skeleton form:

// services/socket-gateway/src/server.ts
import { Server } from 'socket.io'
import { Kafka } from 'kafkajs'
import { verifySessionCookie } from '@/services/auth'

const io = new Server({
  cors: { origin: process.env.APP_ORIGIN, credentials: true },
  transports: ['websocket', 'polling'],
})

// Fail fast with a clear message instead of a TypeError on an undefined variable.
const bootstrapServers = process.env.KAFKA_BOOTSTRAP_SERVERS
if (!bootstrapServers) throw new Error('KAFKA_BOOTSTRAP_SERVERS is not set')

const kafka = new Kafka({
  clientId: 'socket-gateway',
  brokers: bootstrapServers.split(','),
})
const consumer = kafka.consumer({ groupId: 'fintech.socket-gateway' })

// --- Browser side: authentication and room subscription ---
io.use(async (socket, next) => {
  try {
    const cookie = socket.handshake.headers.cookie ?? ''
    const session = await verifySessionCookie(cookie)
    socket.data.userId = session.userId
    socket.data.entityIds = session.entityIds
    next()
  } catch {
    next(new Error('unauthenticated'))
  }
})

io.on('connection', (socket) => {
  for (const entityId of socket.data.entityIds) {
    socket.join(`entity.${entityId}.user.${socket.data.userId}`)
    socket.join(`entity.${entityId}.broadcast`)
  }

  socket.on('disconnect', () => {
    // Socket.io leaves the rooms automatically.
  })
})

// --- Kafka side: subscribe and fan out ---
await consumer.connect()
await consumer.subscribe({
  topics: [
    'fintech.tax.liability-recomputed',
    'fintech.journal.entry-posted',
    'fintech.banking.transaction-categorised',
    'fintech.reporting.quarter-commentary-ready',
  ],
  fromBeginning: false,
})

await consumer.run({
  eachMessage: async ({ topic, message }) => {
    if (!message.value) return

    let envelope
    try {
      envelope = JSON.parse(message.value.toString('utf-8'))
    } catch {
      return // skip malformed messages instead of crashing the consumer
    }
    const { entityId, userId } = envelope.payload ?? {}
    if (!entityId) return // nothing to route without an entity

    const room = userId
      ? `entity.${entityId}.user.${userId}`
      : `entity.${entityId}.broadcast`

    io.to(room).emit(topic, envelope)
  },
})

io.listen(Number(process.env.SOCKET_PORT))

That is the whole service. About sixty lines of code, give or take operational concerns. The auth middleware reads the application's session cookie (the gateway shares the cookie domain with the Next.js app), verifies it, and pins the connection's userId and the set of entityIds the user has access to. On connection, the socket joins one room per entity-user pair and one room per entity for broadcast events. Kafka consumption is a flat dispatch: parse the envelope, pick the correct room, emit. Events whose payload includes a userId are routed to that user only; events without a userId are broadcast to every user with access to that entity.
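
The routing rule is small enough to isolate as pure functions, which also makes it easy to unit-test without a broker or a socket. The helper names below (`parseEnvelope`, `roomFor`, `roomsForSession`) and the strict validation are assumptions for this sketch; only the room naming scheme comes from the gateway described above.

```typescript
interface RoutableEnvelope {
  id: string
  payload: { entityId: string; userId?: string }
}

// Parses and validates a raw message value; returns null for anything that cannot be routed.
function parseEnvelope(raw: string): RoutableEnvelope | null {
  let parsed: unknown
  try {
    parsed = JSON.parse(raw)
  } catch {
    return null
  }
  if (typeof parsed !== 'object' || parsed === null) return null
  const { id, payload } = parsed as { id?: unknown; payload?: unknown }
  if (typeof id !== 'string' || typeof payload !== 'object' || payload === null) return null
  const p = payload as { entityId?: unknown; userId?: unknown }
  const entityId = p.entityId
  if (typeof entityId !== 'string' || entityId.length === 0) return null
  // A present but malformed userId is rejected rather than silently widened to a broadcast.
  if (p.userId === undefined || p.userId === null) return { id, payload: { entityId } }
  if (typeof p.userId !== 'string' || p.userId.length === 0) return null
  return { id, payload: { entityId, userId: p.userId } }
}

// User-scoped events go to one user's room, everything else to the entity broadcast room.
function roomFor(envelope: RoutableEnvelope): string {
  const { entityId, userId } = envelope.payload
  return userId !== undefined
    ? `entity.${entityId}.user.${userId}`
    : `entity.${entityId}.broadcast`
}

// The rooms a freshly authenticated connection joins.
function roomsForSession(userId: string, entityIds: string[]): string[] {
  return entityIds.flatMap((entityId) => [
    `entity.${entityId}.user.${userId}`,
    `entity.${entityId}.broadcast`,
  ])
}
```

Rejecting an envelope with a malformed userId, rather than falling back to the broadcast room, is a deliberate choice in this sketch: a user-scoped event should never reach a wider audience because of a producer bug.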

Two operational points worth calling out. One: the gateway is stateless across instances except for room membership. Because every replica joins the same Kafka consumer group, each replica receives only the partitions assigned to it, so an event consumed on instance B has to be able to reach a socket connected to instance A. We use the Socket.io Redis adapter for that: Redis pub/sub costs basically nothing and gives us horizontal scaling, because a room emit on any replica is broadcast across the Redis fabric to all of them. Sticky sessions at the load balancer are still required for clients that fall back to HTTP long polling, since every request of one polling session has to reach the same instance. Two: the gateway is the security boundary. The browser cannot ask to receive events for an arbitrary entityId; it gets the rooms its session is authorised for, and only those. The Kafka topics themselves are never reachable from the public internet; they live inside the Kubernetes cluster and only the gateway pod has network access to the brokers' external listener. If the gateway is breached, the blast radius is the events flowing through it; the Kafka cluster itself is not exposed.

One small note on the choice of Server-Sent Events as an alternative. SSE is simpler than WebSocket and is perfectly adequate for one-way push, which is exactly what the dashboard needs. If the team is starting fresh and wants the smallest possible surface area, SSE plus the EventSource API is defensible. We picked Socket.io because the same client connection also carries small upstream messages — heartbeats, "I am focused on tab X" hints, presence pings — and because the rooms primitive is genuinely a productivity win at the scale of tens of thousands of concurrent connections. Both are correct answers; pick the one whose feature set matches your roadmap.

Box 4 — Zustand as the Sink

Box 4 is the final box: the Zustand store on the React client that owns the dashboard state and subscribes to the Socket.io connection. This is where every preceding architectural choice pays off in code the React engineer actually writes.

A word on why Zustand, briefly. The state-management space in React in 2026 is congested — Redux Toolkit, Zustand, Jotai, Valtio, signals, useReducer-and-Context, TanStack Query for server state. They are all defensible for different jobs. Zustand earns its place in this stack for three reasons. One: the API is small enough that a junior engineer reads a slice and understands it the same afternoon. A store is a hook, a hook is a function, the function returns a slice of state, the slice has actions on it. Two: the subscribe-without-render escape hatch (store.subscribe()) is built in, and it is what we need for the socket connection — we want the store to react to incoming events without rendering, so a component that does not care about a specific event does not re-render when it arrives. Three: the store lives outside React. We can call its actions from a non-React context (the socket listener), which keeps the event-handling logic out of useEffect spaghetti.
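
The "store lives outside React" property is easier to appreciate with the mechanism stripped bare. What follows is a hand-rolled stand-in, not Zustand itself: `createMiniStore` is a hypothetical helper that mirrors the shape of Zustand's vanilla `getState`, `setState` and `subscribe` methods in a dozen lines, so the subscribe-without-render idea is visible without any React in the picture.

```typescript
type Listener<T> = (state: T, previous: T) => void

// Minimal external store: plain closure state plus a set of listeners.
function createMiniStore<T extends object>(initial: T) {
  let state = initial
  const listeners = new Set<Listener<T>>()
  return {
    getState: (): T => state,
    setState: (partial: Partial<T>): void => {
      const previous = state
      state = { ...state, ...partial }
      listeners.forEach((listener) => listener(state, previous))
    },
    // Returns an unsubscribe function, like Zustand's subscribe does.
    subscribe: (listener: Listener<T>): (() => void) => {
      listeners.add(listener)
      return () => {
        listeners.delete(listener)
      }
    },
  }
}

// A non-React caller (for example a socket listener) can read, write and observe the store.
const counterStore = createMiniStore({ eventsSeen: 0, lastTopic: '' })
const log: string[] = []
const unsubscribe = counterStore.subscribe((state, previous) => {
  log.push(`${previous.eventsSeen}->${state.eventsSeen}`)
})
counterStore.setState({ eventsSeen: 1, lastTopic: 'fintech.journal.entry-posted' })
unsubscribe()
counterStore.setState({ eventsSeen: 2 })
```

Nothing in this flow renders a component. React only gets involved when a component selects a slice of the state through the hook, which is why a socket listener can update the store freely without triggering renders for components that do not read the changed slice.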

The shape of the dashboard slice, abbreviated:

// features/dashboard/client/store/dashboardStore.ts
import { create } from 'zustand'
import { Money } from '@fintech/money'
import type { JournalEntry, TaxLiabilitySnapshot } from '@/types'

interface DashboardState {
  entityId: string | null
  cashBalance: Money
  taxLiability: TaxLiabilitySnapshot
  recentEntries: JournalEntry[]
  lastEventAt: Date | null
}

interface DashboardActions {
  setFintech: (entityId: string) => void
  applyTaxLiabilityRecomputed: (payload: {
    entityId: string
    liability: TaxLiabilitySnapshot
    at: string
  }) => void
  applyEntryPosted: (payload: {
    entityId: string
    entry: JournalEntry
    at: string
  }) => void
  applyTransactionCategorised: (payload: {
    entityId: string
    transactionId: string
    accountCode: string
    at: string
  }) => void
}

export const useDashboardStore = create<DashboardState & DashboardActions>((set, get) => ({
  entityId: null,
  cashBalance: Money.fromCents(0, 'EUR'),
  taxLiability: { vat: Money.fromCents(0, 'EUR'), tradeTax: Money.fromCents(0, 'EUR') },
  recentEntries: [],
  lastEventAt: null,

  setFintech: (entityId) => set({ entityId, recentEntries: [], lastEventAt: null }),

  applyTaxLiabilityRecomputed: ({ entityId, liability, at }) => {
    // Drop events for an entity the user is no longer viewing.
    if (get().entityId !== entityId) return
    set({ taxLiability: liability, lastEventAt: new Date(at) })
  },

  applyEntryPosted: ({ entityId, entry, at }) => {
    if (get().entityId !== entityId) return
    set((state) => ({
      // Newest first, capped at the 50 most recent entries.
      recentEntries: [entry, ...state.recentEntries].slice(0, 50),
      lastEventAt: new Date(at),
    }))
  },

  applyTransactionCategorised: ({ entityId, transactionId, accountCode, at }) => {
    if (get().entityId !== entityId) return
    set((state) => ({
      recentEntries: state.recentEntries.map((e) =>
        e.sourceTransactionId === transactionId ? { ...e, accountCode } : e,
      ),
      lastEventAt: new Date(at),
    }))
  },
}))

Three things to note. One: every action is named after the event it reconciles. applyTaxLiabilityRecomputed reconciles the store against a fintech.tax.liability-recomputed event. The naming convention makes the binding between the topic and the action self-documenting — an engineer reading the store can map each action back to the back-end event without leaving the file. Two: every action guards on entityId. Events arrive on a per-entity room, but a user can switch entities mid-session; a stale event for the previous entity must be dropped, not applied. The guard is one line and prevents an entire class of cross-tenant data leak. Three: the store uses the @fintech/money library for monetary values rather than raw numbers. This is non-negotiable in a financial application — every arithmetic operation goes through a precision-safe abstraction that handles rounding, currency, and allocation correctly. The store would be wrong in a hundred small ways if it used JavaScript numbers for money, and most of those wrongs would be invisible until a customer caught them.
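
The point about JavaScript numbers is concrete enough to demonstrate. The sketch below does not use or imitate the @fintech/money API; `allocateEvenly` is a hypothetical helper that shows the two ideas such a library typically rests on: store amounts as integer minor units, and split totals so that no cent is lost or invented.

```typescript
// Binary floating point cannot represent 0.1 or 0.2 exactly, so the sum drifts.
const floatSum = 0.1 + 0.2
const floatSumIsExact = floatSum === 0.3 // false

// The same amounts as integer cents add up exactly.
const centsSum = 10 + 20
const centsSumIsExact = centsSum === 30 // true

// Splits an integer amount of cents into parts that always sum back to the total.
// The leftover cents are handed out one each to the first parts.
function allocateEvenly(totalCents: number, parts: number): number[] {
  if (!Number.isInteger(totalCents) || !Number.isInteger(parts) || parts <= 0) {
    throw new RangeError('totalCents must be an integer and parts a positive integer')
  }
  const base = Math.floor(totalCents / parts)
  const remainder = totalCents - base * parts
  return Array.from({ length: parts }, (_, i) => base + (i < remainder ? 1 : 0))
}

// 1.00 EUR split three ways is 34 + 33 + 33 cents, not three times 33.33.
const shares = allocateEvenly(100, 3)
```

A naive split of 1.00 into three equal floats rounds to 0.33 each and silently loses a cent; multiplied across thousands of journal entries, that is exactly the kind of error that stays invisible until a customer reconciles against their bank.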

The socket-to-store binding lives in a single hook, mounted at the top of the dashboard route:

// features/dashboard/client/hooks/useDashboardSocket/useDashboardSocket.ts
'use client'

import { useEffect } from 'react'
import { io, type Socket } from 'socket.io-client'
import { useDashboardStore } from '@/features/dashboard/client/store/dashboardStore'

let socket: Socket | null = null

export function useDashboardSocket(entityId: string) {
  // Store actions are stable references, so selecting them does not cause re-renders.
  const apply = {
    taxLiabilityRecomputed: useDashboardStore((s) => s.applyTaxLiabilityRecomputed),
    entryPosted: useDashboardStore((s) => s.applyEntryPosted),
    transactionCategorised: useDashboardStore((s) => s.applyTransactionCategorised),
  }

  useEffect(() => {
    if (!entityId) return

    socket = io(process.env.NEXT_PUBLIC_SOCKET_URL, { withCredentials: true })

    // One listener per topic; each forwards the payload to the matching store action.
    socket.on('fintech.tax.liability-recomputed', (envelope) =>
      apply.taxLiabilityRecomputed(envelope.payload),
    )
    socket.on('fintech.journal.entry-posted', (envelope) =>
      apply.entryPosted(envelope.payload),
    )
    socket.on('fintech.banking.transaction-categorised', (envelope) =>
      apply.transactionCategorised(envelope.payload),
    )

    return () => {
      socket?.disconnect()
      socket = null
    }
  }, [entityId])
}

That is the entire frontend wiring. The hook connects on mount, registers an event listener per topic the dashboard cares about, dispatches each incoming envelope's payload to the matching store action, and disconnects on unmount. No useState for any of the dashboard's data — the store owns it. No prop drilling — components that need the cash balance call useDashboardStore((s) => s.cashBalance) and re-render when (and only when) it changes. No polling, no manual refetch, no "pull-to-refresh" button. The screen is honest in the present tense, which is what the product promised.

A note on optimistic updates, because this is where event-driven frontends and traditional optimistic-UI frontends sometimes diverge. The user clicks a button that, say, confirms a categoriser's suggestion. The classic optimistic-UI step is to update the local state immediately and then send the request, rolling back if the request fails. In this stack, the rule is slightly different: optimistic mutations are allowed, but they are explicit, scoped, and short-lived. The button handler optimistically applies the change to the store with an optimistic: true flag on the affected row. When the corresponding fintech.banking.transaction-categorised event arrives back through the socket (typically within a few hundred milliseconds), the store's reconciliation logic replaces the optimistic row with the canonical one and clears the flag. If the event does not arrive within a timeout — say five seconds — the optimistic flag is escalated to an error UI and the user is asked to retry. The event from the socket is always the source of truth; the optimistic state is a UX courtesy, not a state of record. This rule is small and rigid; it prevents the entire genre of bug where the local state and the back-end state silently diverge.
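
The optimistic rule described above reduces to three pure transitions on a row: mark it optimistic, reconcile it when the canonical event arrives, and escalate it to an error when the timeout passes. The sketch below is one possible shape for that logic; the `EntryRow` type, the `status` field and the function names are assumptions, and only the five-second timeout comes from the text.

```typescript
type RowStatus = 'confirmed' | 'optimistic' | 'error'

interface EntryRow {
  transactionId: string
  accountCode: string
  status: RowStatus
  optimisticSince?: number // epoch milliseconds, set only while optimistic
}

const OPTIMISTIC_TIMEOUT_MS = 5_000

// Button handler: show the change immediately, but flag it as not yet canonical.
function applyOptimistic(rows: EntryRow[], transactionId: string, accountCode: string, now: number): EntryRow[] {
  return rows.map((row): EntryRow =>
    row.transactionId === transactionId
      ? { ...row, accountCode, status: 'optimistic', optimisticSince: now }
      : row,
  )
}

// Socket event: the canonical value always wins and clears the flag.
function reconcile(rows: EntryRow[], transactionId: string, canonicalAccountCode: string): EntryRow[] {
  return rows.map((row): EntryRow =>
    row.transactionId === transactionId
      ? { transactionId: row.transactionId, accountCode: canonicalAccountCode, status: 'confirmed' }
      : row,
  )
}

// Timer tick: optimistic rows older than the timeout become an error the UI can surface.
function escalateStale(rows: EntryRow[], now: number, timeoutMs: number = OPTIMISTIC_TIMEOUT_MS): EntryRow[] {
  return rows.map((row): EntryRow =>
    row.status === 'optimistic' &&
    row.optimisticSince !== undefined &&
    now - row.optimisticSince >= timeoutMs
      ? { ...row, status: 'error' }
      : row,
  )
}
```

Because `reconcile` overwrites the account code with whatever the event carries, a row that was optimistically set to one value and canonically categorised as another ends up correct without any special-case code.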

The Comparison, Side by Side

To pull it all together, the table I usually show clients deciding between the easy stack and the four-box stack:

| Property | Polling | Supabase Realtime / Pusher | This stack (Kafka + Socket.io + Zustand) |
| --- | --- | --- | --- |
| Setup cost | Trivial | Half a sprint | Two to three sprints |
| Multi-consumer fan-out | Each consumer polls | Browsers only; backend consumers fragile | Native; any language, any deploy cadence |
| Replay from history | Impossible | Impossible | One-line offset reset, replays the retention window |
| Audit trail | None (only DB state) | None (only DB state) | Every event preserved as an event, queryable by time |
| Throughput ceiling | DB-bound; collapses early | A few thousand events/sec per slot | Tens of thousands/sec per partition, scales horizontally |
| Polyglot back end | Possible but painful | Browser-only by design | First-class; Node + Python + Rust coexist on the same log |
| Vendor lock-in | None | Heavy | None; Kafka and Strimzi are open, portable, self-hosted |
| Monthly cost (production) | €0 + DB | €200–€2,000 (managed) | €150–€400 in compute, fully self-hosted |

The numbers in the last row are not the point and they will move; the shape is the point. The four-box stack is more expensive to set up and cheaper to run at scale; it is more complex on day one and dramatically simpler at the point where the easy stack starts visibly failing. Choose accordingly.

When This Stack Is the Right Answer — and When It Isn't

This stack is not for every product. It is for products where at least two of the following are true: the data on screen represents money or another high-stakes quantity; the back end has — or will have within the next year — more than one consumer of the same event stream; the team includes (or will include) engineers writing in more than one language; the compliance or audit story requires a replayable record of state changes; the throughput of state changes exceeds a few thousand events per second; the front end has more than one user looking at the same data at the same time. The fintech I've been describing checks every one of those boxes. So do most serious 2026 financial-data products, most healthcare platforms, most logistics applications above a certain scale, and most B2B SaaS products once their largest customer becomes a multi-team organisation in its own right.

It is the wrong answer for a single-tenant prototype, a marketing site, a small SaaS where the back end has one consumer (itself), or any product whose real-time requirements are satisfied by "the page is fresh on the next navigation." For those, Supabase Realtime, Pusher, polling, or simply no live updates at all are correct. The skill is recognising which side of the line your product is on, and being willing to migrate when it crosses.

If you are reading this because you are evaluating someone to build a real-time React platform for a fintech, a regulated industry, or any product where the data on screen carries weight, I am the person who has built this stack end-to-end and run it in production. The four boxes in this article are the actual architecture I work with every day, on multiple commercial codebases. The patterns are battle-tested. The trade-offs are explicit. The migration path from an easier starting point is well-understood and incremental — there is no rip-and-replace.

I am available for freelance engagements in this exact problem space: real-time data platforms in React and Next.js, with event-driven back ends across Node, Python, and (where it earns its keep) Rust, on Kubernetes. Get in touch via the form — I look forward to the conversation.
