Why I Prefer Keyset Pagination for High-Volume Feeds
A solo-friendly decision framework for speed, cost, and correctness
Pagination looks like a UI problem until it becomes a production bottleneck. Once your dataset grows and users start filtering, sorting, and jumping between pages, the wrong pagination strategy quietly burns CPU, increases query latency, and creates confusing duplicates or missing rows. As a solo creator, you don’t get to “throw a team at it”—you need a choice that’s fast, predictable, and hard to break at 2am. This post explains why I default to keyset (cursor) pagination over OFFSET/LIMIT for most high-volume feeds, and when I still won’t use it.
The problem space (constraints that matter)
Pagination becomes architectural when it touches three things at once:
1) Performance at scale: As tables grow from thousands to millions of rows, naive pagination can turn into a linear scan. The user still sees “page 40”, but your database is doing work proportional to 40 * page_size.
2) Correctness under writes: Feeds are rarely static. New rows get inserted, old rows get updated, and background jobs backfill data. Offset-based pagination can return duplicates or skip records as the underlying ordering shifts.
3) Operational simplicity: Solo development is a constraint. I prefer designs that are:
- hard to misuse across endpoints
- easy to reason about when debugging
- index-friendly
- cheap (less CPU, fewer slow queries)
Non-functional requirements I usually assume for a “feed-like” endpoint:
- p95 latency target: < 150ms for common queries (excluding network)
- Predictable performance: page 1 and page 100 shouldn’t be 10x apart
- Stable ordering: no duplicates, minimal “jumping”
- Backwards/forwards navigation: at least “next page”; ideally “previous” too
Why “existing solutions” don’t fit by default:
- Many ORMs make OFFSET/LIMIT feel like the obvious default.
- Many frontend designs assume numeric pages (1…N), which biases you toward OFFSET.
- Some developers ship OFFSET early “just for MVP” and then discover it’s embedded in caching, links, emails, and analytics.
Key insight: pagination is part of your data contract. Changing it later is possible, but it’s never free.
Options considered
Below are the common strategies I’ve used or audited in production-like systems.
| Option | What it is | Pros | Cons | Works best when |
|---|---|---|---|---|
| OFFSET/LIMIT | ORDER BY ... OFFSET x LIMIT y | Simple; supports random access (page 37) | Slower as offset grows; duplicates/skips under writes; deep pages expensive | Small tables; mostly static datasets; admin views |
| Keyset (Cursor) | WHERE (sort_key, id) < (cursor_sort_key, cursor_id) LIMIT y | O(1) page-to-page; stable under inserts; index-friendly | Harder random access; needs careful cursor encoding; tricky with complex sorting | Feeds, timelines, infinite scroll, large datasets |
| Seek by ID only | WHERE id < last_id | Very fast; simplest cursor | Only works if ID correlates with desired order; breaks if you sort by time/score | Append-only logs; monotonic IDs; simple “latest first” |
| Snapshot + OFFSET | Pin a consistent snapshot (repeatable read) and use OFFSET | Correctness improves; keeps numeric pages | Still pays offset cost; snapshot mgmt complexity; not great for long browsing sessions | Reporting; exports; short sessions |
| Precomputed page map | Materialize page boundaries (e.g., store cursor per page) | Enables random access + keyset speed | Extra storage; invalidation complexity; rebuild cost | Highly trafficked, mostly read-only catalogs |
What actually bites you in production
- OFFSET cost is not theoretical. In Postgres, OFFSET often means scanning and discarding rows. Even with indexes, the engine still has to walk past N rows.
- Correctness is a user-facing feature. Duplicate items in a feed erode trust. Missing items can be worse.
- Random page jumps are overrated. Most consumer feeds are “next/previous” patterns. Numeric page links are common in catalogs, not timelines.
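To make the duplicate/skip failure mode concrete, here’s a toy in-memory simulation: plain arrays stand in for database pages, and an insert lands between two page requests. OFFSET re-serves a row the user already saw; a keyset cursor does not.

```javascript
// Toy simulation: OFFSET paging duplicates a row when a new item is
// inserted between page requests, while keyset paging stays correct.
// (Hypothetical in-memory ids, newest first; real systems hit a database.)
const feed = [50, 40, 30, 20, 10];
const pageSize = 2;

// Client fetches page 1 via OFFSET 0.
const offsetPage1 = feed.slice(0, pageSize); // [50, 40]

// A new item arrives at the top before page 2 is requested.
feed.unshift(60); // feed is now [60, 50, 40, 30, 20, 10]

// OFFSET 2 now re-serves id 40: a user-visible duplicate.
const offsetPage2 = feed.slice(pageSize, pageSize * 2); // [40, 30]

// Keyset anchors to the last item seen (id 40), so page 2 is correct.
const cursor = offsetPage1[offsetPage1.length - 1]; // 40
const keysetPage2 = feed.filter((id) => id < cursor).slice(0, pageSize); // [30, 20]

console.log(offsetPage2, keysetPage2);
```

The offset page repeats id 40 from page 1; the keyset page picks up exactly where the cursor left off.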
The decision (what I choose and why)
I default to keyset pagination with a composite cursor:
- Primary sort key: created_at (or whatever defines the feed)
- Tie-breaker: id (unique, stable)
Why (ranked reasons)
1) Predictable query cost: page 1 and page 100 are similar complexity.
2) Correctness under concurrent writes: fewer duplicates/skips because you’re anchoring to a position, not a row count.
3) Index leverage: a composite index can satisfy the query efficiently.
4) Simpler operations: fewer slow queries, fewer surprise p95 spikes.
What I give up
- True random access like “go to page 42” is non-trivial.
- You must design a cursor format (encoding/decoding, validation, expiry decisions).
- Some sorting modes don’t map well (e.g., sorting by a computed score that changes frequently).
Implementation overview (Postgres examples)
1) Schema + index that makes keyset work
Assume a table where:
- id is unique
- created_at is the primary ordering
-- Postgres
CREATE TABLE items (
id BIGSERIAL PRIMARY KEY,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
title TEXT NOT NULL,
payload JSONB
);
-- Composite index to support ORDER BY created_at DESC, id DESC
CREATE INDEX items_created_at_id_desc
ON items (created_at DESC, id DESC);
The tie-breaker (id) matters because many rows can share the same created_at at millisecond resolution or due to batch inserts.
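In application terms, that ordering is a two-level comparison. Here’s a minimal sketch of the comparator the index effectively implements (in-memory rows with numeric timestamps, purely illustrative):

```javascript
// A comparator mirroring ORDER BY created_at DESC, id DESC.
// Without the id tie-breaker, rows sharing a timestamp have no
// guaranteed relative order, and pages can shuffle between requests.
function byCreatedAtDescIdDesc(a, b) {
  if (a.createdAt !== b.createdAt) return b.createdAt - a.createdAt;
  return b.id - a.id; // tie-breaker: unique and stable
}

// Two rows from the same batch insert share a timestamp.
const rows = [
  { id: 7, createdAt: 1000 },
  { id: 9, createdAt: 1000 },
  { id: 3, createdAt: 2000 },
];

rows.sort(byCreatedAtDescIdDesc);
console.log(rows.map((r) => r.id)); // [3, 9, 7] — deterministic
```

Every request that sorts these rows gets the same order, which is exactly what a cursor needs to anchor against.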
2) The keyset query (next page)
The cursor is the last row of the previous page: (created_at, id).
-- First page (no cursor)
SELECT id, created_at, title
FROM items
ORDER BY created_at DESC, id DESC
LIMIT $1;
-- Next page (cursor provided)
SELECT id, created_at, title
FROM items
WHERE (created_at, id) < ($2::timestamptz, $3::bigint)
ORDER BY created_at DESC, id DESC
LIMIT $1;
Why tuple comparison? It’s concise and maps cleanly to the “sort key + tie-breaker” concept.
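If your database or ORM lacks row-value comparison, the same predicate expands to created_at < $2 OR (created_at = $2 AND id < $3). Here is the lexicographic rule spelled out as a small function (names are illustrative):

```javascript
// Lexicographic "tuple less-than": the semantics behind
// WHERE (created_at, id) < ($cursorCreatedAt, $cursorId).
// Expanded SQL equivalent without row values:
//   created_at < $2 OR (created_at = $2 AND id < $3)
function tupleLess(a, b) {
  if (a.createdAt !== b.createdAt) return a.createdAt < b.createdAt;
  return a.id < b.id;
}

const cursor = { createdAt: 1000, id: 40 };
console.log(tupleLess({ createdAt: 999, id: 99 }, cursor)); // true: older timestamp wins outright
console.log(tupleLess({ createdAt: 1000, id: 39 }, cursor)); // true: same timestamp, smaller id
console.log(tupleLess({ createdAt: 1000, id: 41 }, cursor)); // false: past the cursor
```

Note that the first component decides alone unless there is a tie; the id only ever breaks ties, which is why it must be unique.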
3) Cursor encoding (don’t leak raw values blindly)
I like a compact, signed cursor so:
- clients can’t easily tamper with it
- you can evolve cursor formats
Below is a minimal Node-style example (works similarly in any backend):
import crypto from "crypto";
const SECRET = process.env.CURSOR_SECRET;
export function encodeCursor({ createdAt, id }) {
const payload = JSON.stringify({ v: 1, createdAt, id });
const sig = crypto.createHmac("sha256", SECRET).update(payload).digest("base64url");
return Buffer.from(payload).toString("base64url") + "." + sig;
}
export function decodeCursor(cursor) {
const [b64, sig] = cursor.split(".");
const payload = Buffer.from(b64, "base64url").toString("utf8");
const expected = crypto.createHmac("sha256", SECRET).update(payload).digest("base64url");
if (sig !== expected) throw new Error("Invalid cursor");
const obj = JSON.parse(payload);
if (obj.v !== 1) throw new Error("Unsupported cursor version");
return obj;
}
Trade-off: signing adds a tiny CPU cost, but it prevents a whole class of “cursor points to weird place” bugs and makes abuse harder.
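A quick sanity check of those helpers, re-implemented inline so the snippet runs standalone (the secret here is a throwaway placeholder, never a real key, and the version check is omitted for brevity):

```javascript
import crypto from "crypto";

// Standalone round-trip check for HMAC-signed cursors.
const SECRET = "dev-only-secret"; // placeholder, not a real key

function encodeCursor({ createdAt, id }) {
  const payload = JSON.stringify({ v: 1, createdAt, id });
  const sig = crypto.createHmac("sha256", SECRET).update(payload).digest("base64url");
  return Buffer.from(payload).toString("base64url") + "." + sig;
}

function decodeCursor(cursor) {
  const [b64, sig] = cursor.split(".");
  const payload = Buffer.from(b64, "base64url").toString("utf8");
  const expected = crypto.createHmac("sha256", SECRET).update(payload).digest("base64url");
  if (sig !== expected) throw new Error("Invalid cursor");
  return JSON.parse(payload);
}

// Honest round trip succeeds.
const cursor = encodeCursor({ createdAt: "2026-01-29T10:00:00Z", id: 123 });
const decoded = decodeCursor(cursor);
console.log(decoded.id); // 123

// Swapping in a tampered payload invalidates the signature.
let threw = false;
try {
  decodeCursor("eyJ2IjoxfQ." + cursor.split(".")[1]);
} catch {
  threw = true;
}
console.log(threw); // true
```

The second case is the one that matters operationally: a client that edits the payload gets a clean "Invalid cursor" error instead of a query anchored at an arbitrary position.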
4) API shape (make misuse difficult)
I prefer an API contract like:
- limit (bounded)
- cursor (opaque)
- returns: items[], next_cursor
{
"items": [{ "id": 123, "created_at": "2026-01-29T10:00:00Z", "title": "..." }],
"next_cursor": "eyJ2IjoxLCJjcmVhdGVkQXQiOiIyMDI2LTAxLTI5VDEwOjAwOjAwWiIsImlkIjoxMjN9.abc..."
}
This nudges the frontend toward “infinite scroll / load more”, which matches the strengths of keyset.
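One way to produce next_cursor, which the contract above doesn’t prescribe, is the fetch-one-extra trick: ask the database for limit + 1 rows and use the extra row only as a “has more” signal. A sketch, where queryItems is a hypothetical data-access function standing in for the SQL shown earlier:

```javascript
// "Load more" handler sketch using the fetch-one-extra trick.
// `queryItems` is a hypothetical async function that runs the keyset
// query and returns rows ordered by (created_at DESC, id DESC).
async function listItems({ limit, cursor }, queryItems) {
  const bounded = Math.min(Math.max(limit ?? 20, 1), 100); // enforce bounds server-side
  const rows = await queryItems({ limit: bounded + 1, cursor });

  // If we got more rows than requested, a next page exists.
  const hasMore = rows.length > bounded;
  const items = hasMore ? rows.slice(0, bounded) : rows;
  const last = items[items.length - 1];

  return {
    items,
    // Plain object here for clarity; in production you'd run this
    // through the signing helpers above before it leaves the server.
    next_cursor: hasMore ? { createdAt: last.created_at, id: last.id } : null,
  };
}
```

Returning null on the last page gives the frontend an unambiguous stop condition, so “load more” can disable itself without a separate count query.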
How keyset pagination reads data
The key difference: the second query doesn’t “count past” earlier rows the way OFFSET does; it “seeks” directly to a position in the index and reads forward from there.
Results & learnings (numbers + what surprised me)
I’m intentionally not tying this to any specific product; these are representative numbers from benchmarking patterns I’ve repeated over time.
Performance comparison (representative)
On Postgres with a table in the 1–10M row range, ordering by (created_at DESC, id DESC):
OFFSET/LIMIT
- page 1 (OFFSET 0): often ~10–30ms query time
- page 500 (OFFSET 9,980 with limit 20): can drift to ~80–250ms depending on cache and vacuum state
- deep pages: p95 spikes are common under concurrent load
Keyset
- page 1: ~10–30ms (similar)
- page 500: usually stays in the same band (~10–40ms) because the index seek remains efficient
What surprised me early on:
- OFFSET-based endpoints can look “fine” in staging because you rarely test deep pages.
- Keyset pagination makes caching easier for “top of feed” traffic, because you avoid expensive deep scans that compete for shared resources.
- The biggest win isn’t average latency—it’s tail latency predictability.
Operational learning
If you’re solo, the win is fewer incidents caused by growth. Keyset is one of those choices where you pay a bit of complexity upfront (cursor encoding, edge cases) to avoid repeated performance firefighting later.
When this doesn’t work
Keyset pagination is not a universal default.
Use something else when:
1) You truly need random access (e.g., “page 37 of 2,000” is a real UX requirement). Catalogs, admin UIs, and compliance exports often need numeric pages.
2) Your sort key is unstable (e.g., “score” changes frequently). If the ordering changes between requests, any pagination strategy can feel inconsistent—but keyset can be especially confusing because the cursor anchors to a moving target.
3) You need total counts and exact page numbers. Keyset doesn’t naturally provide “total pages”. You can compute counts separately, but it’s another query and can be expensive.
4) Complex multi-column sorts with NULL semantics. Still doable, but your cursor logic becomes more fragile. At some point, you’re building a mini query planner.
In those cases, I’ll either:
- accept OFFSET for smaller datasets + add safeguards (max page, caching, read replica), or
- build a hybrid: keyset for “browse”, offset for “jump”, or
- materialize results (precomputed boundaries) if the dataset is mostly read-only.
Key takeaways (a framework you can reuse)
1) Decide based on growth shape, not current size. If you expect the table to grow continuously, avoid strategies with linear deep-page costs.
2) Correctness under writes is part of UX. If duplicates/skips are unacceptable, prefer cursor-based approaches.
3) Pick an ordering you can index. Keyset only shines when your ORDER BY matches an index.
4) Make the cursor opaque and versioned. You’ll thank yourself when you evolve sorting or add filters.
5) Optimize for operations as a solo creator: predictable p95 beats cleverness.
Closing
If you had to choose today: would you trade away random page access to get stable performance and fewer pagination bugs? I’m curious where you draw that line—especially for datasets that are both large and frequently updated.