
Why I Built a Durable Offline Queue for AI Calls in React Native

Keeping a 5-second workout logger reliable when network and rate limits fail


AI features are easy to demo on perfect Wi‑Fi and painfully fragile in the real world. In my fitness app project (React Native + Expo + SQLite), users can log a set in ~5 seconds and optionally get AI help (workout suggestions, explanations, quick adjustments). The architectural decision that mattered most wasn’t the prompt design—it was whether AI calls should be “best-effort” or “durable”. I chose a durable, persisted offline queue for OpenAI requests so the UX stays responsive, battery-friendly, and predictable even when the device is offline or rate-limited.

Context: the problem space (and why it’s subtle)

I’m building a mobile workout tracker where the core loop is fast: open app → log set → move on. The app is offline-first: SQLite is the primary store, and sync is “cloud-backup”, not “cloud-source-of-truth”. Scale is small today (10 waitlist users, ~400 exercises in the library), but the constraints are real:

  • Sub-100ms UI interactions for logging (anything slower feels like friction mid-set)
  • Offline and spotty network are normal (basements, gyms with bad reception)
  • Battery and data usage matter (background retry loops can be expensive)
  • AI calls are non-critical (logging must work without them)
  • OpenAI limits and latency are unpredictable (429s, timeouts, slow responses)

The naive approach is: “Call the API when the user taps, show a spinner, retry on failure.” That’s fine for a chat app. For a workout logger, it’s a UX regression: it blocks the user on something that isn’t essential.

So the decision: Should AI requests be synchronous and UI-coupled, or should they be durable tasks that can be executed later?

Key insight: In offline-first apps, anything that touches the network should be treated like a background job—especially if it’s optional.

Options considered

I considered four patterns for integrating AI calls without degrading the core logging experience.

Comparison table

| Option | What it is | Pros | Cons | Best when |
|---|---|---|---|---|
| A) Synchronous call in UI flow | Call OpenAI on button tap, await result | Simple mental model; fewer moving parts | UI stalls; brittle offline; retries drain battery; hard to rate-limit | AI is core feature and latency is acceptable |
| B) Fire-and-forget in memory | Trigger request, don’t await; store result in state when it returns | UI stays fast; minimal code | If app is killed, request is lost; no backoff; duplicates likely | AI is “nice to have” and losing responses is OK |
| C) Durable local queue (SQLite) | Persist tasks; worker processes when online; backoff + rate limits | Survives restarts; controllable retries; good offline UX; measurable | More code; need idempotency + dedupe; needs observability | Offline-first apps with optional network features |
| D) Server-side job queue | Send intent to backend; backend calls OpenAI and pushes result | Centralized control; better secrets management; easier analytics | Requires backend; still needs device-side offline handling; more cost/ops | You already run a backend and need shared results |

Why I didn’t choose A or B

  • A (synchronous) made the UI hostage to network conditions. Even if I didn’t block the whole screen, it introduced “pending” states everywhere and created edge cases (user logs next set while previous AI call is still inflight).
  • B (in-memory) sounded attractive until I simulated real behavior: mobile OS kills the app, users background it, network flips, and you end up with lost work or duplicates.

Why I didn’t choose D (server-side)

Longer term, a backend queue is compelling. But right now the app is offline-first and early-stage. Adding a backend just to make AI reliable felt like premature complexity. Also, I’d still need a device-side outbox because requests originate offline.

That led to C: a durable local queue.

The decision: a persisted offline queue (SQLite outbox)

I implemented an Outbox pattern for AI requests:

  • Every AI intent becomes a row in ai_jobs in SQLite.
  • UI writes a job and immediately returns (optimistic UX).
  • A background worker processes jobs when:
    • device is online
    • rate limit allows
    • app is in foreground (initially; background execution is a later enhancement)
  • Results are stored back into SQLite and projected into UI state.

Architecture at a glance

UI tap → insert row into ai_jobs (SQLite) → background worker wakes when online, under the rate limit, and in the foreground → OpenAI call → result persisted to SQLite → projected into UI state.

Data model: jobs need to be idempotent

The main thing I learned from data engineering is: distributed systems fail in boring ways. Mobile is a distributed system with a very unreliable worker (the phone).

Each job needs:

  • a stable idempotency key (so retries don’t duplicate effects)
  • status transitions that are safe across crashes
  • metadata for backoff and debugging

A minimal schema:

-- SQLite
CREATE TABLE IF NOT EXISTS ai_jobs (
  id TEXT PRIMARY KEY,
  type TEXT NOT NULL,
  payload_json TEXT NOT NULL,
  status TEXT NOT NULL, -- queued | running | done | failed
  attempts INTEGER NOT NULL DEFAULT 0,
  run_after_ms INTEGER NOT NULL DEFAULT 0,
  locked_until_ms INTEGER NOT NULL DEFAULT 0,
  last_error TEXT,
  created_at_ms INTEGER NOT NULL,
  updated_at_ms INTEGER NOT NULL
);

CREATE INDEX IF NOT EXISTS idx_ai_jobs_status_run_after
ON ai_jobs(status, run_after_ms);
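
On the TypeScript side, the same row can be modelled with a type plus a small pure predicate for “is this job runnable right now”. This is a sketch; the type and function names are mine, mirroring the columns above:

```typescript
// Mirrors a row of the ai_jobs table defined above.
type AiJobStatus = "queued" | "running" | "done" | "failed";

interface AiJob {
  id: string;
  type: string;
  payload_json: string;
  status: AiJobStatus;
  attempts: number;
  run_after_ms: number;
  locked_until_ms: number;
  last_error: string | null;
  created_at_ms: number;
  updated_at_ms: number;
}

// A job is runnable when it is queued, its backoff window has elapsed,
// and no live lease is currently held on it.
function isRunnable(job: AiJob, nowMs: number): boolean {
  return (
    job.status === "queued" &&
    job.run_after_ms <= nowMs &&
    job.locked_until_ms <= nowMs
  );
}
```

Keeping this predicate pure (no clock, no DB) makes the status/lease rules trivially testable even though the real filtering happens in SQL.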

Enqueue from UI (fast path)

The UI path must be cheap: one insert, no network.

import { nanoid } from "nanoid/non-secure";

type AiJobType = "exercise_suggestion" | "form_explanation";

export async function enqueueAiJob(db: any, type: AiJobType, payload: unknown) {
  const now = Date.now();
  const id = nanoid();

  await db.runAsync(
    `INSERT INTO ai_jobs(id, type, payload_json, status, attempts, run_after_ms, locked_until_ms, created_at_ms, updated_at_ms)
     VALUES(?, ?, ?, 'queued', 0, 0, 0, ?, ?)`,
    [id, type, JSON.stringify(payload), now, now]
  );

  return id;
}

Design choice: I’m not doing anything clever here (no batching, no compression). The win is that it’s deterministic and restart-safe.

Claim + process: avoid duplicate workers

Even on-device, you can end up with multiple workers (hot reload, navigation bugs, accidental multiple intervals). I added a “lease” field locked_until_ms to prevent double-processing.

const LEASE_MS = 15_000;

async function claimNextJob(db: any) {
  const now = Date.now();

  // Find a runnable job
  const job = await db.getFirstAsync(
    `SELECT * FROM ai_jobs
     WHERE status = 'queued'
       AND run_after_ms <= ?
       AND locked_until_ms <= ?
     ORDER BY created_at_ms ASC
     LIMIT 1`,
    [now, now]
  );

  if (!job) return null;

  // Lease it. The conditional WHERE makes the claim safe even if a second
  // worker raced us between the SELECT and this UPDATE: only one of them
  // will match the still-queued, unleased row.
  const lockedUntil = now + LEASE_MS;
  const result = await db.runAsync(
    `UPDATE ai_jobs
     SET status = 'running', locked_until_ms = ?, updated_at_ms = ?
     WHERE id = ? AND status = 'queued' AND locked_until_ms <= ?`,
    [lockedUntil, now, job.id, now]
  );

  // Another worker claimed it first; treat as "nothing to do" this tick.
  if (!result || result.changes === 0) return null;

  return { ...job, locked_until_ms: lockedUntil, status: "running" };
}

This is not perfect distributed locking, but for a single SQLite DB on one phone it’s pragmatic.
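The driver around claimNextJob is a simple poll loop. Here is a minimal sketch with the side effects injected as callbacks, so the names below (isOnline, processJob, pollMs) are placeholders for your own wiring (e.g. isOnline backed by NetInfo.fetch()):

```typescript
// Minimal worker driver: while online, drain runnable jobs; otherwise
// sleep and poll again. All effects are injected so the loop itself
// stays testable without a device or a database.
type WorkerDeps = {
  isOnline: () => Promise<boolean>;            // e.g. NetInfo.fetch()
  claimNextJob: () => Promise<unknown | null>; // the claim query above
  processJob: (job: unknown) => Promise<void>; // call OpenAI, persist result
  pollMs: number;                              // idle polling interval
};

function startAiWorker(deps: WorkerDeps): () => void {
  let stopped = false;

  async function loop(): Promise<void> {
    while (!stopped) {
      try {
        if (await deps.isOnline()) {
          const job = await deps.claimNextJob();
          if (job) {
            await deps.processJob(job);
            continue; // drain the queue while jobs are available
          }
        }
      } catch {
        // Per-job failures are recorded via markJobRetry; keep the loop alive.
      }
      await new Promise((r) => setTimeout(r, deps.pollMs));
    }
  }

  void loop();
  return () => { stopped = true; }; // stop handle for unmount/hot reload
}
```

Returning a stop handle matters in React Native: hot reload and remounts are exactly how you end up with the accidental duplicate workers mentioned above.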

Backoff + rate limiting: protect UX and battery

Two failure modes matter:

  1. Offline / flaky network → repeated failures
  2. 429 rate limits → hammering the API makes it worse

I used an exponential backoff with jitter, and a simple per-user token bucket stored in memory + persisted timestamp in SQLite (so app restarts don’t immediately retry everything).

function nextRunAfterMs(attempts: number) {
  const base = Math.min(60_000, 1000 * Math.pow(2, attempts)); // cap at 60s
  const jitter = Math.floor(Math.random() * 400); // 0-400ms
  return Date.now() + base + jitter;
}

async function markJobRetry(db: any, id: string, attempts: number, err: unknown) {
  const now = Date.now();
  const runAfter = nextRunAfterMs(attempts);

  await db.runAsync(
    `UPDATE ai_jobs
     SET status='queued', attempts=?, run_after_ms=?, locked_until_ms=0, last_error=?, updated_at_ms=?
     WHERE id=?`,
    [attempts, runAfter, String(err), now, id]
  );
}

Decision detail: I intentionally cap backoff at 60 seconds for now because AI is non-critical, and long retry windows reduce battery churn. If a job can’t succeed within a few minutes, it’s usually because the user is offline for a while—better to wait for connectivity.
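The rate-limiter half can be a small token bucket. A sketch, with the persisted-timestamp part omitted and the class/parameter names my own:

```typescript
// Token bucket: at most `capacity` AI calls available at once, refilled
// continuously at `refillPerMs` tokens per millisecond. To survive app
// restarts, persist the last-refill timestamp to SQLite (not shown here).
class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;

  constructor(
    private capacity: number,
    private refillPerMs: number,
    nowMs: number
  ) {
    this.tokens = capacity;
    this.lastRefillMs = nowMs;
  }

  // Returns true (and consumes a token) if a call is allowed right now.
  tryTake(nowMs: number): boolean {
    const elapsed = nowMs - this.lastRefillMs;
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillPerMs);
    this.lastRefillMs = nowMs;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

Passing the clock in (nowMs) rather than calling Date.now() inside keeps the bucket deterministic and easy to test.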

Result persistence: decouple UI from worker

When the worker completes, it stores a result row keyed by job id (or domain entity id), and marks the job done.

That means the UI can render “AI suggestion pending” without caring whether the app was restarted.
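A sketch of that completion step, assuming a separate results table; the names here (ai_results, job_id, result_json, created_at_ms, markJobDone) are my own, not from the schema above:

```typescript
// Persist the AI result keyed by job id, then close out the job.
// INSERT OR REPLACE makes a crash between the two statements safe to
// retry: re-running the pair is idempotent.
async function markJobDone(
  db: any,
  jobId: string,
  resultJson: string,
  nowMs: number
): Promise<void> {
  await db.runAsync(
    `INSERT OR REPLACE INTO ai_results(job_id, result_json, created_at_ms)
     VALUES(?, ?, ?)`,
    [jobId, resultJson, nowMs]
  );
  await db.runAsync(
    `UPDATE ai_jobs
     SET status = 'done', locked_until_ms = 0, updated_at_ms = ?
     WHERE id = ?`,
    [nowMs, jobId]
  );
}
```

Writing the result before flipping the status means the worst crash outcome is a re-run job that overwrites an identical result, never a "done" job with no result.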

Results & learnings (so far)

This is early (Builder Day 30), but a few things are measurable even at small scale.

Performance impact

  • Logging path latency: enqueue insert is typically 1–4ms on my test device (Pixel 7), and the UI remains under my ~100ms interaction budget.
  • Cold start impact: negligible because the worker doesn’t run until after initial render (I schedule it after navigation is ready). The main startup cost remains loading the exercise library (handled via lazy loading).

Reliability improvements

  • No more “spinners that never resolve” when the user goes underground.
  • If the app is killed mid-request, the job lease expires and it retries later.
  • Rate limiting is centralized, so I can enforce “N AI calls per minute” without sprinkling guards across UI components.

Unexpected challenges

  • Duplicate intents: users tap twice, or navigate back/forward quickly. Without dedupe, you pay twice. I’m adding a domain-level idempotency key (e.g., suggestion:{workoutSessionId}:{exerciseId}:{setIndex}) to collapse duplicates.
  • Observability: debugging on-device queues is annoying. I added a hidden “Queue Inspector” screen that lists jobs, attempts, and last_error. Not pretty, but it cuts debugging time.
  • Stale context: queued jobs can run minutes later. If the payload references “current set”, it may no longer be current. I learned to enqueue immutable references (IDs + snapshot fields), not “current state”.
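
The dedupe idea above is small enough to show directly. A sketch; the function name is mine, and the key format is the one described in the bullet:

```typescript
// Domain-level idempotency key for a suggestion request. Two taps for
// the same set collapse to the same key, so an INSERT OR IGNORE against
// a UNIQUE index on this key pays for the AI call only once.
function suggestionKey(
  workoutSessionId: string,
  exerciseId: string,
  setIndex: number
): string {
  return `suggestion:${workoutSessionId}:${exerciseId}:${setIndex}`;
}
```

On the enqueue side this would mean adding an idempotency_key column with a unique index to ai_jobs and switching the insert to INSERT OR IGNORE.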

When this doesn’t work

A durable local queue is not a universal solution.

  • If AI output must be immediate, like real-time coaching, you’ll still need synchronous calls (or at least streaming). Queueing helps reliability but not latency.
  • If you require cross-device consistency, device-local jobs can diverge. You’ll want a server-side queue or a shared sync layer where jobs are replicated and deduped across devices.
  • If you need strong guarantees, SQLite leasing is “good enough” for single-device but not equivalent to a transactional distributed queue. If you later add background tasks, multiple processes, or extensions, you’ll need more robust locking.
  • If your payloads are huge, storing them in SQLite can bloat DB size and slow queries. In that case, store payloads as separate blobs/files and reference them.

Key takeaways

  1. Treat optional network features as background jobs in offline-first apps; don’t couple them to UI interactions.
  2. Persist the queue (SQLite outbox) so app restarts and OS kills don’t lose work.
  3. Design for idempotency early—duplicates are normal, not an edge case.
  4. Centralize backoff and rate limiting to protect battery, UX, and your API bill.
  5. Enqueue immutable snapshots, not “current state”, because queued work executes later.

Closing

I’m happy with the durability and UX improvements, but I’m still unsure about the “right” next step: background execution (TaskManager) vs moving AI orchestration server-side once multi-device sync becomes real.

If you’ve built offline queues on mobile: do you prefer a device outbox like this, or do you push intents to a backend queue as early as possible—and why?