Why I Prefer Keyset Pagination for High-Volume Feeds
A solo-friendly decision framework for speed, cost, and correctness
Pagination looks like a UI problem until it becomes a production bottleneck. Once your dataset grows and users start filtering, sorting, and jumping between pages, the wrong pagination strategy quietly burns CPU, increases query latency, and creates confusing duplicates or missing rows. As a solo creator, you don’t get to “throw a team at it”—you need a choice that’s fast, predictable, and hard to break at 2am. This post explains why I default to keyset (cursor) pagination over OFFSET/LIMIT for most high-volume feeds, and when I still won’t use it.
The problem space (constraints that matter)
Pagination becomes architectural when it touches three things at once:
1) Performance at scale: As tables grow from thousands to millions of rows, naive pagination can turn into a linear scan. The user still sees “page 40”, but your database is doing work proportional to 40 * page_size.
2) Correctness under writes: Feeds are rarely static. New rows get inserted, old rows get updated, and background jobs backfill data. Offset-based pagination can return duplicates or skip records as the underlying ordering shifts.
3) Operational simplicity: Solo development is a constraint. I prefer designs that are:
- hard to misuse across endpoints
- easy to reason about when debugging
- index-friendly
- cheap (less CPU, fewer slow queries)
Non-functional requirements I usually assume for a “feed-like” endpoint:
- p95 latency target: < 150ms for common queries (excluding network)
- Predictable performance: page 1 and page 100 shouldn’t be 10x apart
- Stable ordering: no duplicates, minimal “jumping”
- Backwards/forwards navigation: at least “next page”; ideally “previous” too
Why “existing solutions” don’t fit by default:
- Many ORMs make OFFSET/LIMIT feel like the obvious default.
- Many frontend designs assume numeric pages (1…N), which biases you toward OFFSET.
- Some developers ship OFFSET early “just for MVP” and then discover it’s embedded in caching, links, emails, and analytics.
Key insight: pagination is part of your data contract. Changing it later is possible, but it’s never free.
Options considered
Below are the common strategies I’ve used or audited in production-like systems.
| Option | What it is | Pros | Cons | Works best when |
|---|---|---|---|---|
| OFFSET/LIMIT | ORDER BY ... OFFSET x LIMIT y | Simple; supports random access (page 37) | Slower as offset grows; duplicates/skips under writes; deep pages expensive | Small tables; mostly static datasets; admin views |
| Keyset (Cursor) | WHERE (sort_key, id) < (cursor_sort_key, cursor_id) LIMIT y | O(1) page-to-page; stable under inserts; index-friendly | Harder random access; needs careful cursor encoding; tricky with complex sorting | Feeds, timelines, infinite scroll, large datasets |
| Seek by ID only | WHERE id < last_id | Very fast; simplest cursor | Only works if ID correlates with desired order; breaks if you sort by time/score | Append-only logs; monotonic IDs; simple “latest first” |
| Snapshot + OFFSET | Pin a consistent snapshot (repeatable read) and use OFFSET | Correctness improves; keeps numeric pages | Still pays offset cost; snapshot mgmt complexity; not great for long browsing sessions | Reporting; exports; short sessions |
| Precomputed page map | Materialize page boundaries (e.g., store cursor per page) | Enables random access + keyset speed | Extra storage; invalidation complexity; rebuild cost | Highly trafficked, mostly read-only catalogs |
What actually bites you in production
- OFFSET cost is not theoretical. In Postgres, OFFSET often means scanning and discarding rows. Even with indexes, the engine still has to walk past N rows.
- Correctness is a user-facing feature. Duplicate items in a feed erode trust. Missing items can be worse.
- Random page jumps are overrated. Most consumer feeds are “next/previous” patterns. Numeric page links are common in catalogs, not timelines.
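To make the duplicate/skip failure mode concrete, here’s a toy in-memory simulation: plain arrays stand in for database pages, and an insert lands between two page requests. OFFSET re-serves a row the user already saw; a keyset cursor does not.

```javascript
// Toy simulation: OFFSET paging duplicates a row when a new item is
// inserted between page requests, while keyset paging stays correct.
// (Hypothetical in-memory ids, newest first; real systems hit a database.)
const feed = [50, 40, 30, 20, 10];
const pageSize = 2;

// Client fetches page 1 via OFFSET 0.
const offsetPage1 = feed.slice(0, pageSize); // [50, 40]

// A new item arrives at the top before page 2 is requested.
feed.unshift(60); // feed is now [60, 50, 40, 30, 20, 10]

// OFFSET 2 now re-serves id 40: a user-visible duplicate.
const offsetPage2 = feed.slice(pageSize, pageSize * 2); // [40, 30]

// Keyset anchors to the last item seen (id 40), so page 2 is correct.
const cursor = offsetPage1[offsetPage1.length - 1]; // 40
const keysetPage2 = feed.filter((id) => id < cursor).slice(0, pageSize); // [30, 20]

console.log(offsetPage2, keysetPage2);
```

The offset page repeats id 40 from page 1; the keyset page picks up exactly where the cursor left off.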
The decision (what I choose and why)
I default to keyset pagination with a composite cursor:
- Primary sort key: created_at (or whatever defines the feed)
- Tie-breaker: id (unique, stable)
Why (ranked reasons)
1) Predictable query cost: page 1 and page 100 are similar complexity.
2) Correctness under concurrent writes: fewer duplicates/skips because you’re anchoring to a position, not a row count.
3) Index leverage: a composite index can satisfy the query efficiently.
4) Simpler operations: fewer slow queries, fewer surprise p95 spikes.
What I give up
- True random access like “go to page 42” is non-trivial.
- You must design a cursor format (encoding/decoding, validation, expiry decisions).
- Some sorting modes don’t map well (e.g., sorting by a computed score that changes frequently).
Implementation overview (Postgres examples)
1) Schema + index that makes keyset work
Assume a table where:
- id is unique
- created_at is the primary ordering
-- Postgres
CREATE TABLE items (
id BIGSERIAL PRIMARY KEY,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
title TEXT NOT NULL,
payload JSONB
);
-- Composite index to support ORDER BY created_at DESC, id DESC
CREATE INDEX items_created_at_id_desc
ON items (created_at DESC, id DESC);
The tie-breaker (id) matters because many rows can share the same created_at at millisecond resolution or due to batch inserts.
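In application terms, that ordering is a two-level comparison. Here’s a minimal sketch of the comparator the index effectively implements (in-memory rows with numeric timestamps, purely illustrative):

```javascript
// A comparator mirroring ORDER BY created_at DESC, id DESC.
// Without the id tie-breaker, rows sharing a timestamp have no
// guaranteed relative order, and pages can shuffle between requests.
function byCreatedAtDescIdDesc(a, b) {
  if (a.createdAt !== b.createdAt) return b.createdAt - a.createdAt;
  return b.id - a.id; // tie-breaker: unique and stable
}

// Two rows from the same batch insert share a timestamp.
const rows = [
  { id: 7, createdAt: 1000 },
  { id: 9, createdAt: 1000 },
  { id: 3, createdAt: 2000 },
];

rows.sort(byCreatedAtDescIdDesc);
console.log(rows.map((r) => r.id)); // [3, 9, 7] — deterministic
```

Every request that sorts these rows gets the same order, which is exactly what a cursor needs to anchor against.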
2) The keyset query (next page)
The cursor is the last row of the previous page: (created_at, id).
-- First page (no cursor)
SELECT id, created_at, title
FROM items
ORDER BY created_at DESC, id DESC
LIMIT $1;
-- Next page (cursor provided)
SELECT id, created_at, title
FROM items
WHERE (created_at, id) < ($2::timestamptz, $3::bigint)
ORDER BY created_at DESC, id DESC
LIMIT $1;
Why tuple comparison? It’s concise and maps cleanly to the “sort key + tie-breaker” concept.
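If your database or ORM lacks row-value comparison, the same predicate expands to created_at < $2 OR (created_at = $2 AND id < $3). Here is the lexicographic rule spelled out as a small function (names are illustrative):

```javascript
// Lexicographic "tuple less-than": the semantics behind
// WHERE (created_at, id) < ($cursorCreatedAt, $cursorId).
// Expanded SQL equivalent without row values:
//   created_at < $2 OR (created_at = $2 AND id < $3)
function tupleLess(a, b) {
  if (a.createdAt !== b.createdAt) return a.createdAt < b.createdAt;
  return a.id < b.id;
}

const cursor = { createdAt: 1000, id: 40 };
console.log(tupleLess({ createdAt: 999, id: 99 }, cursor)); // true: older timestamp wins outright
console.log(tupleLess({ createdAt: 1000, id: 39 }, cursor)); // true: same timestamp, smaller id
console.log(tupleLess({ createdAt: 1000, id: 41 }, cursor)); // false: past the cursor
```

Note that the first component decides alone unless there is a tie; the id only ever breaks ties, which is why it must be unique.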
3) Cursor encoding (don’t leak raw values blindly)
I like a compact, signed cursor so:
- clients can’t easily tamper with it
- you can evolve cursor formats
Below is a minimal Node-style example (works similarly in any backend):
import crypto from "crypto";
const SECRET = process.env.CURSOR_SECRET;
export function encodeCursor({ createdAt, id }) {
const payload = JSON.stringify({ v: 1, createdAt, id });
const sig = crypto.createHmac("sha256", SECRET).update(payload).digest("base64url");
return Buffer.from(payload).toString("base64url") + "." + sig;
}
export function decodeCursor(cursor) {
const [b64, sig] = cursor.split(".");
const payload = Buffer.from(b64, "base64url").toString("utf8");
const expected = crypto.createHmac("sha256", SECRET).update(payload).digest("base64url");
if (sig !== expected) throw new Error("Invalid cursor");
const obj = JSON.parse(payload);
if (obj.v !== 1) throw new Error("Unsupported cursor version");
return obj;
}
Trade-off: signing adds a tiny CPU cost, but it prevents a whole class of “cursor points to weird place” bugs and makes abuse harder.
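A quick sanity check of those helpers, re-implemented inline so the snippet runs standalone (the secret here is a throwaway placeholder, never a real key, and the version check is omitted for brevity):

```javascript
import crypto from "crypto";

// Standalone round-trip check for HMAC-signed cursors.
const SECRET = "dev-only-secret"; // placeholder, not a real key

function encodeCursor({ createdAt, id }) {
  const payload = JSON.stringify({ v: 1, createdAt, id });
  const sig = crypto.createHmac("sha256", SECRET).update(payload).digest("base64url");
  return Buffer.from(payload).toString("base64url") + "." + sig;
}

function decodeCursor(cursor) {
  const [b64, sig] = cursor.split(".");
  const payload = Buffer.from(b64, "base64url").toString("utf8");
  const expected = crypto.createHmac("sha256", SECRET).update(payload).digest("base64url");
  if (sig !== expected) throw new Error("Invalid cursor");
  return JSON.parse(payload);
}

// Honest round trip succeeds.
const cursor = encodeCursor({ createdAt: "2026-01-29T10:00:00Z", id: 123 });
const decoded = decodeCursor(cursor);
console.log(decoded.id); // 123

// Swapping in a tampered payload invalidates the signature.
let threw = false;
try {
  decodeCursor("eyJ2IjoxfQ." + cursor.split(".")[1]);
} catch {
  threw = true;
}
console.log(threw); // true
```

The second case is the one that matters operationally: a client that edits the payload gets a clean "Invalid cursor" error instead of a query anchored at an arbitrary position.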
4) API shape (make misuse difficult)
I prefer an API contract like:
- limit (bounded)
- cursor (opaque)
- returns: items[], next_cursor
{
"items": [{ "id": 123, "created_at": "2026-01-29T10:00:00Z", "title": "..." }],
"next_cursor": "eyJ2IjoxLCJjcmVhdGVkQXQiOiIyMDI2LTAxLTI5VDEwOjAwOjAwWiIsImlkIjoxMjN9.abc..."
}
This nudges the frontend toward “infinite scroll / load more”, which matches the strengths of keyset.
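One way to produce next_cursor, which the contract above doesn’t prescribe, is the fetch-one-extra trick: ask the database for limit + 1 rows and use the extra row only as a “has more” signal. A sketch, where queryItems is a hypothetical data-access function standing in for the SQL shown earlier:

```javascript
// "Load more" handler sketch using the fetch-one-extra trick.
// `queryItems` is a hypothetical async function that runs the keyset
// query and returns rows ordered by (created_at DESC, id DESC).
async function listItems({ limit, cursor }, queryItems) {
  const bounded = Math.min(Math.max(limit ?? 20, 1), 100); // enforce bounds server-side
  const rows = await queryItems({ limit: bounded + 1, cursor });

  // If we got more rows than requested, a next page exists.
  const hasMore = rows.length > bounded;
  const items = hasMore ? rows.slice(0, bounded) : rows;
  const last = items[items.length - 1];

  return {
    items,
    // Plain object here for clarity; in production you'd run this
    // through the signing helpers above before it leaves the server.
    next_cursor: hasMore ? { createdAt: last.created_at, id: last.id } : null,
  };
}
```

Returning null on the last page gives the frontend an unambiguous stop condition, so “load more” can disable itself without a separate count query.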
How keyset pagination reads data
The key difference: the second query doesn’t “count past” earlier rows the way OFFSET does; it “seeks” directly to a position in the index and reads forward from there.
Results & learnings (numbers + what surprised me)
I’m intentionally not tying this to any specific product; these are representative numbers from benchmarking patterns I’ve repeated over time.
Performance comparison (representative)
On Postgres with a table in the 1–10M row range, ordering by (created_at DESC, id DESC):
OFFSET/LIMIT
- page 1 (OFFSET 0): often ~10–30ms query time
- page 500 (OFFSET 9,980 with limit 20): can drift to ~80–250ms depending on cache and vacuum state
- deep pages: p95 spikes are common under concurrent load
Keyset
- page 1: ~10–30ms (similar)
- page 500: usually stays in the same band (~10–40ms) because the index seek remains efficient
What surprised me early on:
- OFFSET-based endpoints can look “fine” in staging because you rarely test deep pages.
- Keyset pagination makes caching easier for “top of feed” traffic, because you avoid expensive deep scans that compete for shared resources.
- The biggest win isn’t average latency—it’s tail latency predictability.
Operational learning
If you’re solo, the win is fewer incidents caused by growth. Keyset is one of those choices where you pay a bit of complexity upfront (cursor encoding, edge cases) to avoid repeated performance firefighting later.
When this doesn’t work
Keyset pagination is not a universal default.
Use something else when:
1) You truly need random access (e.g., “page 37 of 2,000” is a real UX requirement). Catalogs, admin UIs, and compliance exports often need numeric pages.
2) Your sort key is unstable (e.g., “score” changes frequently). If the ordering changes between requests, any pagination strategy can feel inconsistent—but keyset can be especially confusing because the cursor anchors to a moving target.
3) You need total counts and exact page numbers. Keyset doesn’t naturally provide “total pages”. You can compute counts separately, but it’s another query and can be expensive.
4) Complex multi-column sorts with NULL semantics. Still doable, but your cursor logic becomes more fragile. At some point, you’re building a mini query planner.
In those cases, I’ll either:
- accept OFFSET for smaller datasets + add safeguards (max page, caching, read replica), or
- build a hybrid: keyset for “browse”, offset for “jump”, or
- materialize results (precomputed boundaries) if the dataset is mostly read-only.
Key takeaways (a framework you can reuse)
1) Decide based on growth shape, not current size. If you expect the table to grow continuously, avoid strategies with linear deep-page costs.
2) Correctness under writes is part of UX. If duplicates/skips are unacceptable, prefer cursor-based approaches.
3) Pick an ordering you can index. Keyset only shines when your ORDER BY matches an index.
4) Make the cursor opaque and versioned. You’ll thank yourself when you evolve sorting or add filters.
5) Optimize for operations as a solo creator: predictable p95 beats cleverness.
Closing
If you had to choose today: would you trade away random page access to get stable performance and fewer pagination bugs? I’m curious where you draw that line—especially for datasets that are both large and frequently updated.