Why I Use Materialized Views for Job Board Aggregations
Keeping p95 queries at ~50ms while a 7,556-listing Postgres job board absorbs ~200 updates a day
When I built a PMHNP job board that aggregates 7,556+ listings across 1,368+ companies (with ~200 updates/day), the “simple” parts got hard fast—especially counts and facets. Users expect filters like state, remote, and company to feel instant. But running aggregation queries on every request competes with writes from the pipeline and spikes latency. This article is one architectural decision: why I chose PostgreSQL materialized views for aggregations (and what I gave up) to keep p95 query times around ~50ms in production.
The problem space
I’m Sathish (@Sathish_Daggula), a data engineer turned indie hacker. My production system is a niche job board for Psychiatric Mental Health Nurse Practitioners (PMHNP). It runs on Next.js 14 + Supabase (Postgres) + Vercel.
The workload is deceptively mixed:
- Read-heavy UX expectations: job listings, search, filters, “companies hiring” pages.
- Write-heavy pipeline bursts: 10+ sources → scrape → normalize → dedupe → upsert. Roughly 200+ daily updates (some days spiky).
- Aggregation-heavy UI: “jobs by state”, “top companies”, counts for filters, weekly email alerts that need grouped data.
Non-functional constraints mattered more than features:
- Latency: I targeted “feels instant” for filters. In practice: ~50ms p95 query time for the most common endpoints.
- Cost & ops: I’m a solo creator; I wanted fewer moving parts than adding Redis + workers + a separate analytics store.
- Correctness: counts that drift are worse than slow counts. If a filter says “Texas (123)”, it must be defensible.
The immediate pain: aggregation queries (COUNT/GROUP BY) were the first thing to degrade as the dataset and filters grew. They’re also the easiest to accidentally make expensive.
Key insight: for job boards, the “list page” is not the hard part—facets and rollups are.
Options considered
I evaluated four approaches for aggregations (facets + dashboards + email rollups). Here’s how I framed it.
| Option | What it is | Pros | Cons | Best when |
| --- | --- | --- | --- | --- |
| 1) On-the-fly aggregations | Run GROUP BY/COUNT queries per request | Always fresh; simplest conceptually | Can get slow fast; competes with writes; needs careful indexing | Small datasets, low concurrency, few filters |
| 2) Application-side caching | Cache aggregation results (memory/Redis/CDN) | Very fast reads; flexible TTL | Cache invalidation is real work; stale data risks; extra infra | Data changes slowly or staleness is acceptable |
| 3) Precomputed tables (manual rollups) | Maintain rollup tables updated by pipeline/jobs | Fast and explicit; can be incremental | More code paths; must handle backfills; consistency bugs possible | High scale, strict SLAs, you can afford ops |
| 4) PostgreSQL materialized views | Database-managed snapshot of a query, refreshable | Fast reads; fewer app bugs; strong SQL ergonomics | Refresh cost; staleness window; concurrency nuances | Medium scale, lots of repeated rollups, minimal infra |
Why not just do on-the-fly?
I started there. It worked until I added more dimensions (state, remote, employment type, source, posted date windows) and more surfaces (homepage, company page, email alerts). A single “facet query” can be okay, but multiple facets per page means you’re effectively running a small analytics workload in your OLTP path.
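To make that concrete, here is the shape of the per-request workload I started with. This is an illustrative sketch, not my exact schema (column names like is_remote and company_id are assumptions): a page with three facets runs three aggregations against the live jobs table on every load.

```sql
-- Illustrative "before" state: one aggregation per facet dimension,
-- each executed per request against the same table the pipeline writes to.
select state, count(*)       from jobs where status = 'active' group by state;
select is_remote, count(*)   from jobs where status = 'active' group by is_remote;
select company_id, count(*)  from jobs where status = 'active'
group by company_id order by count(*) desc limit 20;
```

Each query is cheap in isolation; multiplied across facets, pages, and concurrent users, they add up to an analytics workload sitting in the OLTP path.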
Why not Redis/CDN caching?
I like caching, but I’m careful about using it as a crutch:
- Cache keys explode with combinations of filters.
- TTL-based staleness can create “why is this count wrong?” moments.
- Invalidation requires you to know exactly which writes affect which keys.
For a solo project, I wanted fewer “distributed correctness” problems.
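To see why cache keys explode, consider a sketch of a key builder over my filter dimensions (the type and cardinalities here are illustrative, not production code). The key space is the product of each dimension's cardinality plus one for "any".

```typescript
// Illustrative sketch: why filter-based cache keys multiply.
type FacetFilters = {
  state?: string;          // ~50 values
  remote?: boolean;        // 2 values
  employmentType?: string; // e.g. 4 values
  source?: string;         // 10+ values
};

// A stable cache key for one filter combination; "*" means "any".
export function facetCacheKey(f: FacetFilters): string {
  return [
    `state=${f.state ?? "*"}`,
    `remote=${f.remote ?? "*"}`,
    `type=${f.employmentType ?? "*"}`,
    `source=${f.source ?? "*"}`,
  ].join("&");
}

// Upper bound on distinct keys: product of (cardinality + 1).
export function keySpace(cardinalities: number[]): number {
  return cardinalities.reduce((acc, c) => acc * (c + 1), 1);
}
```

With the cardinalities above, `keySpace([50, 2, 4, 10])` is 51 × 3 × 5 × 11 = 8,415 possible keys — each of which would need correct invalidation when a write lands.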
Why not manual rollup tables?
This is the serious approach at scale. But it turns your pipeline into a mini data warehouse project:
- You need incremental logic per dimension.
- Backfills become tricky.
- If you ever change business logic (“what counts as active?”), you’re rewriting historical rollups.
I wasn’t ready to take on that complexity.
The decision
I chose PostgreSQL materialized views for the repeated aggregations that power:
- Filter counts / facets (by state, remote, company)
- “Top companies hiring” lists
- Weekly email alert grouping
Why materialized views (ranked reasons)
- Predictable read performance: the view is effectively a precomputed table.
- Lower bug surface: aggregation logic stays in SQL, not duplicated across API endpoints.
- Operational simplicity: no extra data store; refresh can run from the same scheduler as my scraper.
- Easier evolution than manual rollups: I can change the query and refresh, instead of rewriting incremental update logic.
What I gave up
- Real-time freshness: materialized views are snapshots. I accepted a refresh window.
- Refresh cost: refresh is work the database must do, and it can contend with other load.
- Extra constraints: concurrent refresh requires a unique index on the view, and the view's query must be designed so those rows stay unique.
Implementation overview
At a high level: the ingestion pipeline writes to the base tables, the app reads facets from materialized views, and refresh happens on a cadence that matches my "good enough" freshness requirements.
1) A concrete materialized view for facets
For example, a simplified “jobs by state” facet.
-- 1) Materialized view
create materialized view if not exists mv_jobs_by_state as
select
j.state as state,
count(*) as job_count,
max(j.updated_at) as last_updated_at
from jobs j
where j.status = 'active'
group by j.state;
-- 2) Index to make reads fast
create unique index if not exists mv_jobs_by_state_state_uidx
on mv_jobs_by_state(state);
Notes:
- I include status = 'active' because "active jobs" is a business concept, not just a row count.
- The unique index enables refresh materialized view concurrently.
2) Refresh strategy: concurrent, scheduled, and scoped
I refresh materialized views on a schedule (Vercel Cron). The important part is to avoid blocking reads.
-- Concurrent refresh avoids blocking selects.
refresh materialized view concurrently mv_jobs_by_state;
In practice I don’t refresh every minute. For a job board, a 15–60 minute staleness window is usually acceptable, and it dramatically reduces refresh churn.
If you have multiple materialized views, refresh order matters (and you may want to stagger them).
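One way to keep that ordering correct as views accumulate is to declare dependencies and derive the order, rather than hand-maintaining a list. This is a sketch under assumptions (the view names are illustrative, and it's not my production code): a tiny topological sort the cron job can run before issuing the refresh statements.

```typescript
// Sketch: when one materialized view reads from another, refresh
// dependencies first. deps maps each view to the views it reads from.
export function refreshOrder(deps: Record<string, string[]>): string[] {
  const order: string[] = [];
  const seen = new Set<string>();
  const visit = (view: string, path: Set<string>) => {
    if (seen.has(view)) return;
    if (path.has(view)) throw new Error(`cycle involving ${view}`);
    path.add(view);
    for (const dep of deps[view] ?? []) visit(dep, path);
    path.delete(view);
    seen.add(view);
    order.push(view); // emitted only after all dependencies
  };
  for (const view of Object.keys(deps)) visit(view, new Set());
  return order;
}

// Example: mv_top_companies is built on top of mv_jobs_by_company, so
// the latter must refresh first.
// refreshOrder({
//   mv_top_companies: ["mv_jobs_by_company"],
//   mv_jobs_by_company: [],
// }) → ["mv_jobs_by_company", "mv_top_companies"]
```

The cron handler then iterates the result and runs `refresh materialized view concurrently <name>` for each.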
3) Reading facets from the app
In Next.js, I query the materialized view rather than running a GROUP BY on the jobs table for every request.
// Server-side query (Next.js 14)
import { createClient } from "@supabase/supabase-js";
export async function getStateFacets() {
const supabase = createClient(
process.env.SUPABASE_URL!,
process.env.SUPABASE_SERVICE_ROLE_KEY!
);
const { data, error } = await supabase
.from("mv_jobs_by_state")
.select("state, job_count")
.order("job_count", { ascending: false });
if (error) throw error;
return data;
}
I use the service role for server-side trusted reads. For user-facing reads, be careful: Postgres does not apply RLS to materialized views, so either keep them out of your API-exposed schema or gate access behind server-only code (more on that in trade-offs).
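Because the view carries max(updated_at), the app can also surface how fresh the snapshot is, which turns staleness into an explicit feature ("updated 12 min ago") instead of a silent bug. A minimal helper, as a sketch:

```typescript
// Sketch: render the age of a materialized-view snapshot for the UI.
export function staleness(lastRefreshed: Date, now: Date = new Date()): string {
  const mins = Math.max(
    0,
    Math.round((now.getTime() - lastRefreshed.getTime()) / 60_000)
  );
  if (mins < 1) return "just now";
  if (mins < 60) return `${mins} min ago`;
  return `${Math.round(mins / 60)} h ago`;
}
```

Pairing every facet panel with this label made the refresh window defensible to users without adding any infrastructure.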
Results & learnings
What improved
- Latency: the endpoints that previously ran GROUP BY queries now read from small, indexed materialized views. This is a big contributor to keeping ~50ms p95 query times for common read paths.
- Database stability: expensive aggregations moved off the “every request” path. Writes from the ingestion pipeline are less likely to coincide with heavy read aggregation.
- Developer speed: when I added weekly job alerts, I reused the same rollups. Business logic stayed centralized.
Unexpected challenges
Refresh timing is a product decision
- If you refresh too often, you pay a constant compute tax.
- If you refresh too rarely, users see stale counts.
- For my case, “jobs updated daily-ish” means a modest refresh cadence is fine.
Concurrent refresh has requirements
- You need a unique index on the materialized view.
- Your view query must produce stable unique rows for that index.
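One defensive pattern for that second requirement — a sketch, not my exact view definition: if a grouping column can be NULL, coalesce it into a sentinel value so every row is covered by a plain column-reference unique index, which is what concurrent refresh needs. The 'unknown' sentinel here is an illustrative choice.

```sql
-- Sketch: make the view's rows provably unique for CONCURRENTLY.
create materialized view if not exists mv_jobs_by_state as
select
  coalesce(j.state, 'unknown') as state,  -- NULL states collapse to one row
  count(*) as job_count,
  max(j.updated_at) as last_updated_at
from jobs j
where j.status = 'active'
group by coalesce(j.state, 'unknown');

create unique index if not exists mv_jobs_by_state_state_uidx
  on mv_jobs_by_state(state);
```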
RLS + views needs deliberate thought
- Supabase RLS is great for multi-tenant security, but you must decide whether facets are public, tenant-scoped, or admin-only.
- Sometimes it’s safer to keep views behind server-only access rather than exposing them directly.
Learning: materialized views are not “set and forget.” The refresh cadence is part of the system design.
When this doesn’t work
Materialized views are a strong middle ground, but I wouldn’t recommend them universally.
They’re a poor fit when:
- You need real-time facets (seconds-level freshness). In that case, you’ll likely need streaming updates + cache invalidation, or incremental rollups.
- Your refresh cost is too high because the underlying query scans huge tables. At large scale, you’ll want partitioning, incremental aggregation, or a separate OLAP store.
- You have high write throughput and refresh contention becomes visible. Even concurrent refresh consumes resources.
- Your facets depend on user-specific permissions. If every user sees different counts (e.g., per-tenant private jobs), you either need tenant-specific materialized views (messy) or compute on the fly with good indexes.
At some point, the “right” solution becomes: a small analytical layer (ClickHouse/BigQuery) or a dedicated caching strategy with explicit invalidation.
Key takeaways
- Facets are analytics: treat them like a separate workload from listing reads.
- Materialized views are a pragmatic middle between slow GROUP BY and complex rollup pipelines.
- Design for refresh: pick a cadence aligned with user expectations, not engineering aesthetics.
- Use concurrent refresh + unique indexes to avoid blocking reads.
- Keep business meaning in SQL (e.g., what counts as “active”) to avoid duplicating logic across endpoints.
Closing
If you’ve built a read-heavy product with frequent updates, how do you handle aggregations—materialized views, Redis caching, rollup tables, or an OLAP store? I’m especially curious what refresh/invalidation strategies have held up for you in production.