Skip to content
PG Horizon
All posts
· 9 min read · By PG Horizon

PostgreSQL Logical Replication, Step by Step

A practical, no-hand-waving walkthrough of setting up PostgreSQL logical replication — from a fresh source database to a working stream of changes — plus the gotchas nobody tells you about.

  • postgresql
  • replication
  • operations

Logical replication is one of PostgreSQL’s most powerful — and most misunderstood — features. Get it right and you have the foundation for analytics replicas, multi-region disaster recovery, zero-downtime version upgrades, and live migrations. Get it wrong and you’ll be debugging WAL bloat at 2 AM.

This is a practical walkthrough of setting up native PostgreSQL logical replication between two databases. We’ll go from a fresh source to a working stream of changes flowing into a destination, then cover the gotchas nobody tells you about.

What logical replication actually is

PostgreSQL gives you two replication flavours:

  • Physical (streaming) replication replicates the entire cluster at the byte level. Same major version on both sides, all-or-nothing, the standby is read-only and must mirror the primary exactly.
  • Logical replication decodes the WAL into row-level changes and ships them to a subscriber. Per-table, per-database, can cross major versions, and the destination is a fully writable database in its own right.

Logical is what you want when:

  • You need to replicate a subset of tables, not the whole cluster.
  • The destination is on a different major version (e.g. PG 14 → PG 16 for a zero-downtime upgrade).
  • The destination needs to be writable (analytics, multi-master, sharding).
  • You want to transform or filter rows in flight (column lists and row filters in PG 15+).

It’s not what you want when you need exact byte-for-byte redundancy — use physical replication for that.

Prerequisites

Both source and destination must be PostgreSQL 10 or newer; PG 14+ is strongly recommended (subscription resilience and the binary copy path improved significantly through PG 16).

You’ll need:

  • Source: superuser access (or a role that can create publications and a REPLICATION role).
  • Destination: a database with the same schema for the tables you’re replicating.
  • Network: the destination must reach the source on PG’s port (5432 by default).
  • Disk headroom on the source: WAL is retained until the slot catches up. Plan for at least 10× your average WAL/hour rate as a buffer.

Step 1 — Configure the source

Edit postgresql.conf on the source:

wal_level             = logical
max_replication_slots = 10
max_wal_senders       = 10
wal_sender_timeout    = 60s

Then restart the sourcewal_level only takes effect on restart, not reload.

In pg_hba.conf, allow replication connections from the destination:

host  all  pgpipe_repl  10.0.0.0/8  scram-sha-256

Reload (SELECT pg_reload_conf();) and verify:

SHOW wal_level;                  -- should print 'logical'
SHOW max_replication_slots;

Step 2 — Create a replication role

Don’t replicate as a superuser. Create a dedicated role with the minimum privileges:

CREATE ROLE pgpipe_repl WITH LOGIN REPLICATION PASSWORD 'redacted';

GRANT USAGE ON SCHEMA public TO pgpipe_repl;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO pgpipe_repl;

-- Grants for tables created in the future:
ALTER DEFAULT PRIVILEGES IN SCHEMA public
    GRANT SELECT ON TABLES TO pgpipe_repl;

REPLICATION is the privilege that authorises the role to open a replication connection. SELECT is needed because the initial table copy is done via COPY, not through WAL decoding.

Step 3 — Create the publication

A publication is the source-side declaration of “here are the tables I’m exposing for replication.”

-- Three named tables:
CREATE PUBLICATION analytics_pub FOR TABLE
    public.orders,
    public.customers,
    public.line_items;

Or for an entire database (requires superuser):

CREATE PUBLICATION analytics_pub FOR ALL TABLES;

Publications can include row filters and column lists (PG 15+):

CREATE PUBLICATION fraud_pub FOR TABLE
    public.transactions (id, amount, country, created_at)
    WHERE (country IN ('US', 'GB', 'DE'));

Inspect what’s published:

SELECT * FROM pg_publication_tables;

Step 4 — Prepare the destination

The destination needs the same schema for the replicated tables (logical replication doesn’t replicate DDL — see “Gotchas” below). Easiest way is pg_dump --schema-only:

pg_dump -h source-host -U postgres -s \
    -t public.orders -t public.customers -t public.line_items \
    sourcedb | psql -h dest-host -U postgres destdb

After this, the destination has empty tables matching the source.

Step 5 — Create the subscription

On the destination:

CREATE SUBSCRIPTION analytics_sub
CONNECTION 'host=source-host dbname=sourcedb user=pgpipe_repl password=...'
PUBLICATION analytics_pub
WITH (
    copy_data = true,    -- snapshot existing rows on first run
    streaming = on       -- stream long-running transactions (PG 14+)
);

PostgreSQL will:

  1. Open a connection to the source.
  2. Create a logical replication slot named analytics_sub on the source.
  3. Take a snapshot, COPY all existing rows in the published tables.
  4. Switch to streaming mode and apply changes as they arrive.

Step 6 — Verify the pipeline

On the source:

SELECT slot_name, plugin, slot_type, active,
       confirmed_flush_lsn, restart_lsn,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(),
                                      confirmed_flush_lsn)) AS lag_bytes
FROM pg_replication_slots;

active = true and lag_bytes close to zero means the pipeline is healthy.

On the destination:

SELECT subname, subenabled
FROM pg_subscription;

SELECT subname, received_lsn, latest_end_lsn, latest_end_time
FROM pg_stat_subscription;

Now insert a row on the source:

-- on source:
INSERT INTO public.customers (email) VALUES ('alice@example.com');

Within a few seconds, on the destination:

SELECT * FROM public.customers WHERE email = 'alice@example.com';

You should see Alice. Welcome to streaming.

Gotchas (the part nobody tells you)

1. DDL is not replicated

Run ALTER TABLE customers ADD COLUMN phone TEXT; on the source and the subscription will break the moment a row carrying that column flows through. You must apply DDL on the destination first, then on the source.

This is the single most common cause of “logical replication broke after a deploy” tickets.

2. Replica identity for UPDATE/DELETE

If a published table has no primary key, UPDATE and DELETE fail to replicate unless you set:

ALTER TABLE my_table REPLICA IDENTITY FULL;

FULL makes every UPDATE/DELETE log the full old row in the WAL — expensive on wide tables. Better: add a primary key.

3. Slot retention can fill your disk

If the destination is offline for hours, the source retains WAL until the slot catches up — there is no time-based eviction by default. Always monitor pg_replication_slots lag and alert.

To bound retention (PG 13+):

ALTER SYSTEM SET max_slot_wal_keep_size = '50GB';
SELECT pg_reload_conf();

The slot is dropped if the lag exceeds 50 GB — better to lose the slot than to fill the disk.

4. Disabling a subscription doesn’t free the slot

ALTER SUBSCRIPTION ... DISABLE; stops the destination from pulling, but the slot on the source keeps retaining WAL. To fully tear down:

-- destination:
ALTER SUBSCRIPTION analytics_sub DISABLE;
ALTER SUBSCRIPTION analytics_sub SET (slot_name = NONE);
DROP SUBSCRIPTION analytics_sub;

-- source (only if the subscription couldn't drop the slot):
SELECT pg_drop_replication_slot('analytics_sub');

5. Sequences don’t replicate

Logical replication ships row data, not sequence advances. After a cutover, you’ll need to manually setval() sequences on the new primary or your INSERTs will collide with existing IDs.

6. Large transactions can stall the apply worker

Before PG 14, a single multi-million-row transaction had to fully decode on the source before any of it was sent. PG 14 added streaming mode (streaming = on); PG 16 added parallel apply. Use the newest version you can.

Where native logical replication breaks down

Native logical replication is solid, but it has limits:

  • No DDL replication. Every schema change requires manual coordination.
  • No web UI. All operational state lives in pg_* catalogue views.
  • No dead-letter queue. A bad row stops the pipeline and you debug from pg_subscription_rel.
  • No schema remapping. The destination schema name must match the source.
  • No first-class observability. You read lag from pg_replication_slots; metrics for application teams require glue.

That’s why we built pgpipe. It uses the same logical decoding mechanism under the hood, but adds:

  • A web dashboard with a setup wizard
  • Automatic destination schema setup on first run (runtime DDL replication is on the roadmap, not yet shipped)
  • A dead-letter queue with REST API for inspection and replay
  • Schema remapping for multi-tenant and blue-green deployments
  • 15 Prometheus metrics out of the box
  • Apply throughput on par with native, with strict ordering preserved

Logical replication is the right primitive. pgpipe makes it production-grade without you writing the operational glue. Try it with Docker — under a minute to see replication working.


Need a hand setting up logical replication on a real production system? PG Horizon does this for clients every week — get in touch.

Need a hand with PostgreSQL?

We design, tune, and operate PostgreSQL for teams that can't afford downtime.

Talk to us