Family App

The AI assistant my own household runs on. No public users, no demo data. Real memory, real failover, real observability, because it has to actually work every day.

[Image: Family App product screenshot]

Why this stack

This had to run on my wife's phone every day, not just in a demo. React Native + Expo was the only realistic path to a real iOS app with native calendar and reminders access (EventKit) and, later, real Siri App Intents, without maintaining a separate native codebase. The backend is a Vercel-deployed Node service using the Vercel AI SDK's tool-calling loop, which meant the same tool definitions worked unchanged across Claude, OpenAI, and xAI when I added provider failover. Supabase covers auth, Postgres, and row-level security in 1 hosted piece instead of 3. None of this was theoretical: it replaced an earlier WhatsApp-bot version of the same assistant that had a habit of dropping operations and going silent during provider outages, and this rebuild exists specifically to close those failure modes.

AI-assist note

Spec-driven across 35 specs, built with Claude Code: I wrote every spec and reviewed every diff; Claude Code wrote most of the implementation. The memory-scoping design, the failover chain, the observability fixes, and the Siri App Intents work below all came out of that spec-plan-implement-review loop.

Stack

  • TypeScript
  • React Native (Expo SDK 52)
  • Swift (App Intents)
  • Node.js
  • Vercel AI SDK
  • Supabase (Postgres + Auth + RLS)
  • Mem0 Platform
  • Claude (+ OpenAI, xAI Grok via failover)

Domains

  • Agentic AI
  • Mobile / Native Integration
  • Systems Reliability
  • Observability

Live: 2 mo
Infra: TypeScript / Node.js on Vercel, React Native + Expo SDK 52, Supabase Postgres (RLS), Mem0 Platform (dual-scope memory)
Auth: Supabase Auth (email/password) + per-provider API keys (Anthropic, OpenAI, xAI, Mem0, YNAB, Resend) via Vercel env
Specs: 35

Why this exists

Every family logistics tool I’d tried treated calendar, reminders, budget, and meal planning as separate apps that didn’t talk to each other. This app is the opposite bet: 1 assistant that reads calendar and reminder data straight off the device, talks to the household budget API, holds a memory layer scoped per person and per household, and reasons over all of it in a single chat interface. It grew out of an earlier WhatsApp-based version of the same idea that worked until it didn’t: a provider outage or a dropped API call would fail silently, and nobody found out until something was visibly missing. This rebuild’s entire engineering story is about closing exactly those silent-failure paths.

Architecture

Figure: Family App reliability pipeline. Device and cloud integrations route through a generic resilience wrapper with retry queue and circuit breaker, into a Vercel chat backend, through a 3-tier AI provider failover chain, and out through structured logging and a health-check cron that either reaches the mobile app normally or fires an admin alert on detected failure.

  • Integrations: EventKit (calendar + reminders), Mem0 (dual-scope memory), YNAB (household budget API), Siri / App Intents (voice commands, deep links)
  • withResilience(): retry queue + circuit breaker, generic per integration
  • Vercel API / Chat Backend: Node.js, Vercel AI SDK tool loop
  • Provider chain: Claude (primary, 25s timeout), on failure OpenAI (fallback, 20s timeout), on failure xAI Grok (fallback, 10s timeout)
  • Structured Log + Health-Check Cron: runs every 15 minutes across all households
  • Mobile App (Expo / React Native): Today · Chat · Planner · Activity
  • Admin Push Alert (on failure detected): cooldown-gated, no duplicate spam
  • Native iOS Layer: 6 Siri App Intents · 4 quick actions · deep links

Every external call, device-native or cloud, goes through 1 generic resilience wrapper before it reaches anything else: a retry queue with exponential backoff and a per-integration circuit breaker, so a failing integration degrades instead of cascading. The chat backend’s own AI calls get the same treatment through a 3-tier provider chain: Claude first, OpenAI and then xAI Grok only on retriable failures, never on a 400-class bug. A structured log line and a 15-minute health-check cron sit downstream of all of it, watching for the specific failure shape (a technically-successful, functionally-empty response) that used to go unnoticed for days. Most runs end quietly at the mobile app. A detected failure instead produces a cooldown-gated push alert, so the household never has to be the one to notice something broke.

What shipped

35 specs took this from a WhatsApp bot that quietly dropped operations to a mobile app with a memory layer that keeps personal and household facts properly scoped, a 3-provider failover chain with a standing test path to verify it on demand, a generic retry-queue-and-circuit-breaker wrapper covering every external integration, a health-check cron that turns a silent failure into an alert within 15 minutes, and a native Siri and App Intents layer that finished the deep-link wiring an earlier feature had left half-done. It runs on 1 real household’s calendar, reminders, and budget every day. There’s no public URL for this one; it holds real family data, so the repo and the live instance both stay private. What’s shown here is the engineering, not the household.

Skill stories

Each skill below has a story behind it: the decision, what broke, how it got measured, and how it got fixed.

  1. Dual-Scope Memory Layer (Agentic AI / Memory Systems)
  2. Multi-Provider AI Failover (Systems Reliability)
  3. Silent-Failure Observability (Observability)
  4. Native Siri / App Intents Integration (Mobile / Native Integration)
  5. Generic Retry Queue + Circuit Breaker (Systems Reliability)

Agentic AI / Memory Systems

Dual-Scope Memory Layer

Decision
The assistant's memory needed to tell 2 different kinds of facts apart: things true of 1 person and things true of the whole household. I split storage into a user scope (keyed per family member, personal preferences and communication style) and a family scope (keyed per household, shared routines and facts), searched both in parallel per message, and merged the results into the assistant's context.
What broke
Before this, memory was flat. A preference set by 1 family member had no scope boundary, so nothing stopped it from leaking into another member's suggestions, and there was no way to distinguish 'the assistant learned this from a conversation' from 'someone explicitly told it to remember this.'
How I measured it
Tested with 2 members setting conflicting preferences in the same category and confirmed each member's queries only reflected their own scope. Separately, forced a 5-second delay against the memory service to test the 3-second timeout cutoff, and confirmed chat latency stayed within 500ms of the no-memory baseline.
How I fixed it
Added scope-keyed storage (`user:{id}` / `family:{id}`), a timeout wrapper around every memory call so a slow or dead memory service can never delay a chat response, and structured failure logging so a silent memory outage shows up in logs instead of just quietly returning nothing forever.
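
A minimal sketch of the scoped lookup and timeout wrapper described above. Everything here is illustrative: `Memory`, `MemorySearch`, `withTimeout`, and `recall` are hypothetical names, the injected `search` function stands in for whatever the memory service client actually exposes, and the 3000 ms default only mirrors the 3-second cutoff mentioned in the measurement.

```typescript
// Hypothetical shapes for illustration; not the real codebase or the Mem0 SDK.
type Scope = `user:${string}` | `family:${string}`;
type Memory = { scope: Scope; text: string };
type MemorySearch = (scope: Scope, query: string) => Promise<Memory[]>;

// Resolve to `fallback` if `work` fails or has not settled within `ms`.
// Both cases are logged so a memory outage is visible instead of silent.
function withTimeout<T>(work: Promise<T>, ms: number, fallback: T, label: string): Promise<T> {
  return new Promise<T>((resolve) => {
    const timer = setTimeout(() => {
      console.warn(JSON.stringify({ event: "memory_timeout", label, ms }));
      resolve(fallback);
    }, ms);
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        console.warn(JSON.stringify({ event: "memory_error", label, error: String(err) }));
        resolve(fallback);
      },
    );
  });
}

// Search the caller's personal scope and the household scope in parallel,
// then merge. Another member's `user:` scope is never queried.
async function recall(
  search: MemorySearch,
  userId: string,
  familyId: string,
  query: string,
  timeoutMs = 3000,
): Promise<Memory[]> {
  const scopes: Scope[] = [`user:${userId}`, `family:${familyId}`];
  const results = await Promise.all(
    scopes.map((scope) => withTimeout(search(scope, query), timeoutMs, [] as Memory[], scope)),
  );
  return ([] as Memory[]).concat(...results);
}
```

Because the scope key is built from the caller's own ids, a conflicting preference stored under another member's key cannot appear in the merged result, which is the property the 2-member test checks.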

Systems Reliability

Multi-Provider AI Failover

Decision
Every chat message and every scheduled job's AI call went through a single hardcoded provider. I built an ordered fallback chain instead: Claude primary, then OpenAI, then xAI Grok, each with its own timeout, retrying only on retriable failures (429, 500, 502, 503, timeouts) and passing 400-class errors straight through since those are bugs, not outages.
What broke
The predecessor version of this assistant (a WhatsApp bot) had exactly 1 provider wired in. A provider outage didn't produce an error message; it produced nothing: a scheduled job would silently fail and the household would just never get that day's update.
How I measured it
Simulated a primary-provider outage with a test-mode env flag, sent chat messages through it, and checked structured logs for provider name, which attempts failed, latency per attempt, and a failover flag, across both interactive chat and scheduled jobs.
How I fixed it
Built the 3-tier chain with per-provider timeouts and a standing simulated-outage test path, so failover is something I can verify on demand instead of something I hope works the next time a provider actually goes down.
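
The chain logic can be sketched roughly as follows. This is an assumption-laden illustration, not the production code: `Provider`, `callWithTimeout`, `isRetriable`, and `chatWithFailover` are hypothetical names, each provider's `call` is an injected function rather than a real SDK call, and errors are assumed to carry an HTTP-style `status` field.

```typescript
// Hypothetical types; the real chain wraps the actual provider calls.
type Provider = { name: string; timeoutMs: number; call: (prompt: string) => Promise<string> };
type Attempt = { provider: string; ok: boolean; latencyMs: number; error?: string };
type ChainResult = { text: string; provider: string; failover: boolean; attempts: Attempt[] };

// Statuses treated as outages rather than bugs (the list from the decision above).
const RETRIABLE_STATUS = new Set<number>([429, 500, 502, 503]);

// Reject with a `timedOut`-tagged error if the provider exceeds its own timeout.
function callWithTimeout(p: Provider, prompt: string): Promise<string> {
  return new Promise<string>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(Object.assign(new Error(`${p.name} timed out`), { timedOut: true })),
      p.timeoutMs,
    );
    p.call(prompt).then(
      (text) => { clearTimeout(timer); resolve(text); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

// Assumes errors expose `status` (HTTP-style) or `timedOut`; anything else is not retried.
function isRetriable(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const e = err as { status?: number; timedOut?: boolean };
  return e.timedOut === true || (e.status !== undefined && RETRIABLE_STATUS.has(e.status));
}

async function chatWithFailover(chain: Provider[], prompt: string): Promise<ChainResult> {
  const attempts: Attempt[] = [];
  for (const provider of chain) {
    const started = Date.now();
    try {
      const text = await callWithTimeout(provider, prompt);
      attempts.push({ provider: provider.name, ok: true, latencyMs: Date.now() - started });
      const failover = attempts.length > 1;
      // One structured line per call: who answered, who failed, and how long each took.
      console.log(JSON.stringify({ event: "ai_call", provider: provider.name, failover, attempts }));
      return { text, provider: provider.name, failover, attempts };
    } catch (err) {
      attempts.push({
        provider: provider.name,
        ok: false,
        latencyMs: Date.now() - started,
        error: String(err),
      });
      // 400-class and unknown errors are bugs, not outages: surface them immediately.
      if (!isRetriable(err)) throw err;
    }
  }
  throw new Error(`all providers failed: ${JSON.stringify(attempts)}`);
}
```

With the timeouts from the architecture figure, the chain would be ordered Claude (25000 ms), OpenAI (20000 ms), xAI Grok (10000 ms). A simulated-outage test path could swap the first provider's `call` for one that always rejects with a 503.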

Observability

Silent-Failure Observability

Decision
An AI call can return HTTP 200 with empty text and no usable tool output, which looks like success to everything downstream. I made that specific shape count as an error instead of a stored success, added it to the existing error tracker, and added a 15-minute cron that scans every household for repeated failures, stuck conversations, open circuit breakers, and disabled background jobs.
What broke
That empty-response shape was the root cause of a failure that ran for days before anyone noticed, because nothing distinguished it from a normal successful turn. It just sat in the conversation history looking fine.
How I measured it
Forced an empty-text, no-tool-result response in a test conversation and confirmed the error tracker caught it and the user saw a clear fallback message instead of blank output. Checked the health endpoint separately, confirming it reports pass/warn/fail per component (database, background jobs, circuit breakers) instead of a single up/down flag.
How I fixed it
Converted the empty-response case from a silent success into a tracked, alertable error, added a cooldown-gated push notification so repeat failures for the same household don't spam duplicate alerts, and added structured per-interaction logging (conversation id, provider, latency, tool-call count, error type) so the next incident is a log search instead of a guess.
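
As a rough sketch of the two pieces above, the snippet below classifies a technically-successful but empty turn as an error and gates alerts with a per-household cooldown. The names (`TurnResult`, `classifyTurn`, `createAlertGate`), the field layout, and the fallback wording are all assumptions made for illustration.

```typescript
// Hypothetical shape of one completed AI turn; field names are illustrative.
type TurnResult = { status: number; text: string; toolResults: unknown[] };

type TurnOutcome =
  | { kind: "ok"; text: string }
  | { kind: "error"; errorType: "empty_response" | "http_error"; userMessage: string };

// Assumed wording; the real fallback message is not shown in the source.
const FALLBACK_MESSAGE = "Sorry, I couldn't produce an answer just now. Please try again.";

// A 2xx turn with no text and no tool output is treated as a failure,
// not stored as a normal successful turn.
function classifyTurn(turn: TurnResult): TurnOutcome {
  if (turn.status < 200 || turn.status >= 300) {
    return { kind: "error", errorType: "http_error", userMessage: FALLBACK_MESSAGE };
  }
  const hasText = turn.text.trim().length > 0;
  const hasToolOutput = turn.toolResults.length > 0;
  if (!hasText && !hasToolOutput) {
    return { kind: "error", errorType: "empty_response", userMessage: FALLBACK_MESSAGE };
  }
  return { kind: "ok", text: turn.text };
}

// Returns a function that answers "should an alert be sent for this household now?".
// Repeat failures inside the cooldown window are suppressed per household.
function createAlertGate(cooldownMs: number) {
  const lastSent = new Map<string, number>();
  return function shouldAlert(householdId: string, now: number = Date.now()): boolean {
    const last = lastSent.get(householdId);
    if (last !== undefined && now - last < cooldownMs) return false;
    lastSent.set(householdId, now);
    return true;
  };
}
```

In this sketch the gate keeps its state in memory; a cron that runs as separate invocations would need to persist the last-sent timestamp somewhere durable instead.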

Mobile / Native Integration

Native Siri / App Intents Integration

Decision
The app's voice layer started as URL-scheme shortcuts that had to be manually configured. I replaced that with 6 real iOS App Intents, written in Swift and injected through a custom Expo config plugin, that register themselves with Siri and Spotlight automatically on install, plus 4 home-screen quick actions, without hand-maintaining a parallel native project.
What broke
The deep-link listener those intents needed to target was only half-wired, left over from an earlier feature that never finished the navigation-layer plumbing. Voice commands had nowhere reliable to land.
How I measured it
Tested all 6 registered phrases across cold launch, backgrounded, and foregrounded app states, and separately verified the 1 free-form intent ('ask about X') correctly extracts and sends the spoken text, accounting for normal Siri transcription accuracy rather than expecting it to be perfect.
How I fixed it
Finished the deep-link routing so every recognized path lands on the right screen, added a fallback to the default tab for anything unrecognized instead of an error or crash, and confirmed the app degrades gracefully on iOS versions below the App Intents minimum: it just loses the voice layer, while deep links and quick actions keep working.
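
The routing fallback can be illustrated with a small pure function on the React Native side. This is only a sketch under assumptions: the `familyapp://` scheme, the `ask` parameter, and the choice of `today` as the default tab are hypothetical; the tab names are borrowed from the app's Today / Chat / Planner / Activity layout.

```typescript
// Hypothetical route model; scheme, parameter names, and default tab are assumptions.
type Tab = "today" | "chat" | "planner" | "activity";
type Route = { tab: Tab; params: Record<string, string> };

const DEFAULT_TAB: Tab = "today";
const KNOWN_TABS: readonly Tab[] = ["today", "chat", "planner", "activity"];

function isTab(value: string): value is Tab {
  return (KNOWN_TABS as readonly string[]).indexOf(value) !== -1;
}

// Map a deep link such as familyapp://chat?ask=... to a route.
// Anything unrecognized or malformed lands on the default tab instead of throwing.
function resolveDeepLink(url: string): Route {
  const fallback: Route = { tab: DEFAULT_TAB, params: {} };
  const match = /^[a-z][a-z0-9+.-]*:\/\/([^/?#]+)\/?(?:\?([^#]*))?$/i.exec(url.trim());
  if (!match) return fallback;

  const [, host = "", query = ""] = match;
  const path = host.toLowerCase();
  if (!isTab(path)) return fallback;

  const params: Record<string, string> = {};
  for (const pair of query.split("&")) {
    if (!pair) continue;
    const eq = pair.indexOf("=");
    const key = eq === -1 ? pair : pair.slice(0, eq);
    const raw = eq === -1 ? "" : pair.slice(eq + 1);
    try {
      params[decodeURIComponent(key)] = decodeURIComponent(raw.replace(/\+/g, " "));
    } catch (e) {
      // Malformed percent-encoding: prefer the default tab over a crash.
      return fallback;
    }
  }
  return { tab: path, params };
}
```

Keeping the resolver pure means the cold-launch, backgrounded, and foregrounded cases can all feed the same function, and the fallback behavior can be tested without a device.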

Systems Reliability

Generic Retry Queue + Circuit Breaker

Decision
Every integration (calendar writes, budget updates, memory storage) was handling failure differently, if at all. I built 1 generic `withResilience()` wrapper providing retry-with-exponential-backoff, a durable Supabase-backed retry queue, and a per-integration circuit breaker, so any external call gets the same failure handling without hand-rolling it at each call site.
What broke
The predecessor bot's failure mode for a failed external call was that the operation just disappeared. A dropped calendar write or a dropped budget update had no queue, no retry, and no record that it had ever been attempted.
How I measured it
Forced 5 consecutive failures against 1 integration and confirmed the circuit opened, meaning later calls were skipped immediately instead of each one burning a 25-plus-second timeout. Waited out the cooldown window and confirmed the half-open test call closed the circuit again on success.
How I fixed it
Every transient failure now lands in a durable queue with an idempotency key, retries on a backoff schedule, and moves to a dead-letter state with a push notification if it exhausts its attempts. The failure mode changed from silently lost to queued, retried, or explicitly surfaced.
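
A compressed sketch of how such a wrapper might fit together is below. It is not the real `withResilience()`: the option names, defaults, and `Outcome` shape are assumptions, in-memory structures stand in for the Supabase-backed queue, every thrown error is treated as transient, and the queue worker and dead-letter handling are left out.

```typescript
// Hypothetical sketch; in-memory state stands in for the durable queue.
type BreakerState = { failures: number; openedAt: number | null };
type QueuedOp = { integration: string; idempotencyKey: string };
type Outcome<T> =
  | { status: "ok"; value: T }
  | { status: "queued"; reason: "circuit_open" | "retries_exhausted" };

type ResilienceOptions = {
  integration: string;
  idempotencyKey: string;
  maxAttempts?: number; // in-process attempts before handing off to the queue
  baseDelayMs?: number; // backoff doubles from here: base, 2x, 4x, ...
  failureThreshold?: number; // consecutive failures that open the circuit
  cooldownMs?: number; // how long the circuit stays open before a half-open probe
  now?: () => number; // injectable clock, for tests
  sleep?: (ms: number) => Promise<void>; // injectable delay, for tests
};

const breakers = new Map<string, BreakerState>();
const retryQueue: QueuedOp[] = [];

// The idempotency key keeps a re-queued operation from being stored twice.
function enqueue(op: QueuedOp): void {
  if (!retryQueue.some((q) => q.idempotencyKey === op.idempotencyKey)) retryQueue.push(op);
}

async function withResilience<T>(fn: () => Promise<T>, opts: ResilienceOptions): Promise<Outcome<T>> {
  const {
    integration, idempotencyKey, maxAttempts = 3, baseDelayMs = 200,
    failureThreshold = 5, cooldownMs = 60_000, now = Date.now,
    sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms)),
  } = opts;
  const breaker = breakers.get(integration) ?? { failures: 0, openedAt: null };
  breakers.set(integration, breaker);

  // Open circuit: skip the call until the cooldown elapses, then allow one half-open probe.
  const halfOpen = breaker.openedAt !== null && now() - breaker.openedAt >= cooldownMs;
  if (breaker.openedAt !== null && !halfOpen) {
    enqueue({ integration, idempotencyKey });
    return { status: "queued", reason: "circuit_open" };
  }

  const attemptsAllowed = halfOpen ? 1 : maxAttempts;
  for (let attempt = 1; attempt <= attemptsAllowed; attempt++) {
    try {
      const value = await fn();
      breaker.failures = 0;
      breaker.openedAt = null; // any success closes the circuit
      return { status: "ok", value };
    } catch (err) {
      breaker.failures += 1;
      if (halfOpen || breaker.failures >= failureThreshold) {
        breaker.openedAt = now(); // open (or re-open) and stop retrying
        break;
      }
      if (attempt < attemptsAllowed) await sleep(baseDelayMs * 2 ** (attempt - 1));
    }
  }
  enqueue({ integration, idempotencyKey });
  return { status: "queued", reason: breaker.openedAt !== null ? "circuit_open" : "retries_exhausted" };
}
```

A call site would wrap any external operation the same way, for example `withResilience(() => writeEvent(event), { integration: "calendar", idempotencyKey: event.id })`, where `writeEvent` and `event` are placeholders for whatever the integration exposes.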