Why I Use pg_trgm Fuzzy Search Instead of Full-Text Search

Production notes from a 7,556-job board with fast, typo-tolerant filters


Search is where job boards quietly fail. Users don’t type perfect keywords, job titles aren’t standardized across sources, and “PMHNP” gets spelled five different ways. In my PMHNP Hiring job board (7,556+ jobs, 1,368+ companies, ~200 daily updates), I had to pick a search strategy that was fast (<50ms p95 for common queries), typo-tolerant, and cheap to operate as a solo creator. This post walks through the architecture decision I made: choosing PostgreSQL's pg_trgm similarity search over classic full-text search, and what I gave up to get there.

Context: search in a scraped job board is messy by default

PMHNP Hiring is a job board I built for Psychiatric Mental Health Nurse Practitioners. The “product” looks simple—filter by location, remote, company, posted date, and search by title/company. The system behind it is less clean because the data comes from 10+ sources with different formatting and inconsistent fields.

A few constraints shaped the search decision:

  • Scale & churn: 7,556+ active jobs, 1,368+ companies, and ~200 daily updates (incremental ingestion, not full refresh).
  • UX reality: users type partial queries (“psych”, “nurse prac”), acronyms (“PMHNP”), and typos (“psychiatric nurse practioner”).
  • Performance target: keep the common feed queries (filters + search) at ~50ms p95 at the database layer.
  • Operational simplicity: I’m a solo creator. I didn’t want a separate search cluster to babysit.
  • Security model: Supabase + PostgreSQL with RLS. I wanted search to stay inside Postgres so it inherits the same access control semantics.

This is where the first non-obvious problem appears: classic full-text search (FTS) is great for “documents”, but job titles and company names behave more like short strings where typo tolerance and partial matches dominate.

In a job board, search is less about linguistic relevance and more about forgiving messy input.

I wasn’t building Google. I needed a search that:

  1. Works well on short fields: title, company_name, and sometimes location_text.
  2. Supports “contains-like” behavior: users often remember only part of a title.
  3. Handles typos: similarity, not exact token match.
  4. Composes with filters: search + (state, remote, posted_at, source) should stay fast.
  5. Plays nicely with ingestion: updates come daily; the index must handle churn.

I also didn’t want relevance tuning to turn into a second product. If I had to spend days tweaking ts_rank weights, that’s a smell.

Options considered

Below are the realistic choices I evaluated for Postgres/Supabase.

| Option | What it is | Pros | Cons | Best when |
| --- | --- | --- | --- | --- |
| ILIKE '%query%' + B-tree | Naive substring match | Simple, no extensions | Slow on large tables; can’t use B-tree with leading wildcard | Tiny datasets or admin tools |
| PostgreSQL Full-Text Search (tsvector) | Token-based search using dictionaries | Good for long text; ranking; language support | Weak on typos/partial strings; tuning needed; titles are short | Articles, descriptions, “document” search |
| pg_trgm (trigram similarity) | String similarity via overlapping 3-char chunks | Typo-tolerant; fast with GIN/GiST; great for short fields | Not semantic; can match weirdly; needs threshold tuning | Names, titles, short text, “forgiving” search |
| External search (Meilisearch/Typesense/Elastic) | Dedicated search engine | Great relevance; typo handling; faceting | Extra infra; sync complexity; more moving parts; cost | High scale, complex ranking, multi-field relevance |

Why I didn’t stick with ILIKE

ILIKE feels tempting early on, especially when you’re “vibe coding” fast. But it collapses once you hit a few thousand rows and mix it with filters.

ILIKE '%pmhnp%' forces a scan unless you add specialized indexing. On 7k jobs it might still feel okay—until you add multi-tenant rules, joins to companies, and a few concurrent users.
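
If you want to see this on your own table, EXPLAIN makes it obvious: a leading wildcard rules out a plain B-tree index, so the planner has no option but a full scan. (Worth noting: a trigram GIN index can actually accelerate ILIKE, but at that point you're most of the way to pg_trgm anyway.)

-- With only a B-tree index on title, the leading '%' makes the index
-- unusable, so the planner falls back to scanning every row.
explain
select id, title
from jobs
where title ilike '%pmhnp%';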

Why full-text search wasn’t the right default

FTS shines when you search bodies of text. But for job boards, most searches are:

  • "pmhnp"
  • "psychiatric"
  • "remote"
  • "headway" (company)

FTS tokenization can hurt you here:

  • Typos don’t match.
  • Partial tokens don’t match unless you add prefix operators and accept recall/precision trade-offs.
  • Acronyms and short tokens can behave weirdly depending on dictionaries.
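
The difference is easy to demonstrate in psql. The English stemmer turns the misspelling into a lexeme that doesn't exist in the document, while trigram overlap barely notices it (scores are approximate):

-- FTS: 'practioner' stems to a different lexeme, so there is no match.
select to_tsvector('english', 'psychiatric nurse practitioner')
       @@ plainto_tsquery('english', 'practioner');  -- false

-- pg_trgm: the typo still shares most of its trigrams with the correct
-- spelling, scoring ~0.85, far above any sensible threshold.
select similarity('practitioner', 'practioner');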

Why I didn’t jump to an external search engine

I love dedicated search engines—but operating them is a commitment:

  • You need a sync pipeline (DB → search index) that is correct under retries.
  • You now have two sources of truth for availability.
  • You have to decide how search respects RLS / access control.

For PMHNP Hiring, the cost and complexity weren’t justified. Postgres could do “good enough” search with less risk.

The decision: pg_trgm + a search vector column I can index

I chose PostgreSQL’s pg_trgm extension and built search around a single normalized field (title + company + location) that I could index with GIN.

Primary reasons (ranked):

  1. Typo tolerance on short fields without building relevance infrastructure.
  2. Composable performance with filters (state, remote, posted_at).
  3. Operational simplicity: no extra services; works inside Supabase.
  4. Predictable indexing story: GIN trigram indexes are battle-tested.

What I gave up:

  • No semantic relevance (synonyms, intent, “psych NP” == “PMHNP”).
  • Similarity search can return “surprising” matches unless you tune thresholds.
  • Ranking is simpler; you’re not doing sophisticated scoring.

Implementation overview

I model jobs and companies relationally, but for search I avoid doing multiple similarity checks across joins at query time. Instead, I denormalize a search_text field on jobs.

1) Enable pg_trgm and add an indexed field

-- One-time
create extension if not exists pg_trgm;

-- Add a denormalized search field
alter table jobs add column if not exists search_text text;

-- Keep it simple: lowercased, whitespace-normalized text
-- (I populate this during ingestion / upserts)

-- GIN index for fast trigram search
create index if not exists jobs_search_text_trgm
on jobs using gin (search_text gin_trgm_ops);

Why a denormalized search_text?

  • Similarity across multiple columns (title, company_name) can prevent index use or force multiple index scans.
  • Joining companies for every search adds overhead; my feed queries already join for display.
  • With 200 daily updates, recomputing search_text is cheap and keeps reads fast.
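
For rows ingested before the column existed, a one-time backfill can approximate the worker's normalization in SQL. A sketch (the regex is a simplification of what the ingestion code does):

-- One-time backfill mirroring normalize() from the ingestion worker:
-- lowercase, strip punctuation, collapse whitespace.
update jobs j
set search_text = trim(regexp_replace(
      regexp_replace(
        lower(concat_ws(' ', j.title, c.name, j.location_text,
                        case when j.remote then 'remote' end)),
        '[^a-z0-9\s]', ' ', 'g'),
      '\s+', ' ', 'g'))
from companies c
where c.id = j.company_id
  and j.search_text is null;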

2) Populate search_text during upsert (pipeline-friendly)

My ingestion pipeline is: Cron (Vercel) → scraper → dedupe → upsert into Postgres. During the upsert, I compute search_text.

// Inside the ingestion worker (supabase-js). `db` is a Supabase client:
//   import { createClient } from "@supabase/supabase-js";
//   const db = createClient(SUPABASE_URL, SERVICE_ROLE_KEY);
const normalize = (s: string) =>
  s
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, " ")
    .replace(/\s+/g, " ")
    .trim();

const searchText = normalize([
  job.title,
  job.companyName,
  job.locationText,
  job.remote ? "remote" : "",
].filter(Boolean).join(" "));

const { error } = await db.from("jobs").upsert({
  id: job.id,
  title: job.title,
  company_id: job.companyId,
  location_text: job.locationText,
  remote: job.remote,
  posted_at: job.postedAt,
  search_text: searchText,
  source: job.source,
});
if (error) throw error; // surface upsert failures to the cron run

This is a deliberate trade: write-time work for read-time speed.

3) Query pattern: similarity + filters + stable pagination

I treat search as “filtering” rather than a separate endpoint. Most users search while also filtering by state/remote.

-- Parameterized query idea
-- :q is the normalized query string
-- :min_sim is a tuned threshold (e.g., 0.2 to 0.35)

select
  j.id, j.title, j.posted_at, j.remote,
  c.name as company_name
from jobs j
join companies c on c.id = j.company_id
where
  (:state is null or j.state = :state)
  and (:remote is null or j.remote = :remote)
  and (:q = '' or j.search_text % :q)
order by
  case when :q = '' then 0 else similarity(j.search_text, :q) end desc,
  j.posted_at desc,
  j.id desc
limit :limit;

Notes:

  • The % operator is trigram “similarity match” (uses the trigram index).
  • similarity() is used only for ordering when a query exists.
  • The secondary ordering by posted_at, id keeps results stable.
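
Because one of my constraints was keeping search inside Postgres so RLS keeps applying, this query also wraps cleanly into a SQL function the app can call as a single RPC. A sketch only; the function name and column types here are my illustration, not production code:

-- Hypothetical wrapper, callable as supabase.rpc('search_jobs', {...}).
-- It defaults to SECURITY INVOKER, so RLS on jobs/companies still applies.
create or replace function search_jobs(
  q text default '',
  p_state text default null,
  p_remote boolean default null,
  p_limit int default 50
)
returns table (id bigint, title text, posted_at timestamptz,
               remote boolean, company_name text)
language sql stable
as $$
  select j.id, j.title, j.posted_at, j.remote, c.name
  from jobs j
  join companies c on c.id = j.company_id
  where (p_state is null or j.state = p_state)
    and (p_remote is null or j.remote = p_remote)
    and (q = '' or j.search_text % q)
  order by
    case when q = '' then 0 else similarity(j.search_text, q) end desc,
    j.posted_at desc,
    j.id desc
  limit p_limit;
$$;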

4) Tuning similarity threshold without guesswork

The biggest footgun with pg_trgm is threshold tuning. Too low: irrelevant matches. Too high: you miss typos.

In Postgres you can set it per session:

-- Example: bump threshold for stricter matches
select set_limit(0.28);

-- Then run the search
select id, title
from jobs
where search_text % 'pmhnp remote';

In practice, I ended up using a slightly lower threshold for shorter queries and a higher one for longer queries (because long queries naturally have more trigrams).
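
Note that set_limit() is session-scoped, which gets risky behind a connection pooler; the transaction-local GUC is the safer way to vary the threshold per request. A sketch of the short-query case (the cutoff value is illustrative, not a tuned number):

begin;
-- Scoped to this transaction only, so pooled connections are unaffected.
-- Short queries have few trigrams, so a looser threshold catches typos.
set local pg_trgm.similarity_threshold = 0.22;
select id, title from jobs where search_text % 'pmhnp';
commit;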

Results & learnings (with real numbers)

After shipping pg_trgm search + the right supporting indexes (composite indexes for filters, plus connection pooling with PgBouncer), the database layer for the most common “feed + search” requests stabilized around:

  • ~50ms p95 query time for typical filtered listing queries (state/remote + optional search).
  • Search remained fast even with daily churn (~200 updates/day) because GIN index maintenance overhead at this scale is manageable.

What worked well:

  • Typos stopped mattering for the most common cases (company names, “psychiatric”, “practitioner”).
  • I didn’t need to invent a ranking model. Similarity + recency was “good enough”.
  • Keeping search in Postgres meant fewer moving parts and fewer failure modes.

Unexpected challenges:

  • Certain short queries (like "np") matched too broadly. The fix wasn’t more indexing—it was product constraints: a minimum query length, or requiring at least one non-trivial token (see the sketch after this list).
  • Similarity ordering can be noisy when many rows are “close enough”. Recency as a tie-breaker helped.
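
A minimal version of that guard, in the same style as the ingestion worker (the 3-character cutoff is illustrative):

// Skip similarity search for trivial inputs; the caller falls back to
// the plain filtered, recency-ordered feed.
const shouldUseSimilarity = (rawQuery: string): boolean => {
  const q = rawQuery
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, " ")
    .trim();
  // "np" alone matches half the table; require one token of 3+ chars.
  return q.split(/\s+/).some((token) => token.length >= 3);
};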

What I’d do differently:

  • Add a lightweight synonym layer (application-side) for domain terms (e.g., map “psych np” → “pmhnp psychiatric”). This is cheaper than building semantic search and improves relevance a lot.
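
A sketch of what that layer could look like (the mappings are examples, not a vetted domain list):

// Hypothetical application-side expansion, applied before the query is
// normalized. Expanding (not replacing) keeps the user's tokens in play.
const SYNONYMS: Record<string, string> = {
  "psych np": "pmhnp psychiatric nurse practitioner",
  "mental health np": "pmhnp psychiatric",
};

const expandQuery = (q: string): string => {
  const key = q.toLowerCase().trim();
  return key in SYNONYMS ? `${key} ${SYNONYMS[key]}` : key;
};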

When this doesn’t work

pg_trgm isn’t a universal answer. I’d pick something else if:

  • You need semantic relevance: synonyms, intent understanding, “director of nursing” matching “DON”, etc. That’s where a dedicated search engine or embeddings start to win.
  • You’re searching long bodies of text (job descriptions). FTS (or hybrid FTS + trgm) is often better.
  • Your dataset is massive and high-churn (hundreds of millions of rows). GIN index size and maintenance can become expensive.
  • You need advanced faceting + ranking beyond what SQL can comfortably express.

A pragmatic hybrid that I’d consider later (sketched below):

  • FTS for descriptions (token relevance)
  • pg_trgm for titles/company (typo tolerance)
  • Merge/rank results in SQL or application layer
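
Roughly, the shape would be something like this (assuming a maintained tsvector column such as description_tsv, which my current schema doesn't have; the greatest() merge is deliberately crude):

-- Hedged sketch of a hybrid query: FTS recall on descriptions,
-- trigram recall on short fields, merged with a naive combined score.
select j.id, j.title
from jobs j
where j.description_tsv @@ plainto_tsquery('english', :q)
   or j.search_text % :q
order by greatest(
    ts_rank(j.description_tsv, plainto_tsquery('english', :q)),
    similarity(j.search_text, :q)
  ) desc,
  j.posted_at desc
limit :limit;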

But I wouldn’t start there unless search quality is the core differentiator.

Key takeaways

  1. Match the search tool to the shape of your data: short strings behave differently than documents.
  2. Optimize for the operational budget you actually have: one Postgres instance you understand beats two systems you barely monitor.
  3. Denormalize intentionally when it removes joins from your hot path; pay the cost at ingestion time.
  4. Thresholds are product decisions: minimum query length and similarity limits are UX levers, not just database knobs.
  5. Use recency as a stabilizer: in job boards, “newer” is often a better tie-breaker than perfect relevance.

Closing

If you’ve built search for a marketplace or job board: did you stick with Postgres (FTS/trgm) or graduate to a dedicated search engine? I’m especially curious where your tipping point was—data size, relevance requirements, or team/ops maturity.